© Motorola Inc., 1990; revised 1992, 1993

M68040 User's Manual
Including the MC68040, MC68040V, MC68LC040, MC68EC040, and MC68EC040V

Motorola

Freescale Semiconductor, Inc.
For more information on this product, go to: www.freescale.com
© Motorola Inc., 1992

Motorola reserves the right to make changes without further notice to any products herein to improve reliability, function, or design. Motorola does not assume any liability arising out of the application or use of any product or circuit described herein; neither does it convey any license under its patent rights nor the rights of others. Motorola products are not designed, intended, or authorized for use as components in systems intended for surgical implant into the body, or other applications intended to support or sustain life, or for any other application in which the failure of the Motorola product could create a situation where personal injury or death may occur. Should Buyer purchase or use Motorola products for any such unintended or unauthorized application, Buyer shall indemnify and hold Motorola and its officers, employees, subsidiaries, affiliates, and distributors harmless against all claims, costs, damages, and expenses, and reasonable attorney fees arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or unauthorized use, even if such claim alleges that Motorola was negligent regarding the design or manufacture of the part. Motorola and the Motorola logo are registered trademarks of Motorola, Inc. Motorola, Inc. is an Equal Opportunity/Affirmative Action Employer.
Preface

The complete documentation package for the MC68040, MC68040V, MC68LC040, MC68EC040, and MC68EC040V (collectively called M68040) consists of the M68040UM/AD, M68040 User's Manual, and the M68000PM/AD, M68000 Family Programmer's Reference Manual. The M68040 User's Manual describes the capabilities, operation, and programming of the M68040 32-bit, third-generation microprocessors. The M68000 Family Programmer's Reference Manual contains the complete instruction set for the M68000 family.

The introduction of this manual includes general information concerning the MC68040 and summarizes the differences between the M68040 member devices. Additionally, three appendices describe in detail how these M68040 derivatives operate differently from the MC68040. For detailed information on one of these derivatives, use the following table to determine which appendices to read in conjunction with the rest of this manual.

Device Number    Appendices
MC68040V         Appendix A, MC68LC040; Appendix C, MC68040V and MC68EC040V
MC68LC040        Appendix A, MC68LC040
MC68EC040        Appendix B, MC68EC040
MC68EC040V       Appendix B, MC68EC040; Appendix C, MC68040V and MC68EC040V

When reading this manual, disregard information concerning the floating-point unit in reference to the MC68040V and MC68LC040, and disregard information concerning the floating-point unit and memory management in reference to the MC68EC040 and MC68EC040V.
The organization of this manual is as follows:

Section 1    Introduction
Section 2    Integer Unit
Section 3    Memory Management Unit (Except MC68EC040 and MC68EC040V)
Section 4    Instruction and Data Caches
Section 5    Signal Description
Section 6    IEEE 1149.1 Test Access Port (JTAG)
Section 7    Bus Operation
Section 8    Exception Processing
Section 9    Floating-Point Unit (MC68040)
Section 10   Instruction Timings
Section 11   MC68040 Electrical and Thermal Characteristics
Section 12   Ordering Information and Mechanical Data
Appendix A   MC68LC040
Appendix B   MC68EC040
Appendix C   MC68040V and MC68EC040V
Appendix D   M68000 Family Summary
Appendix E   Floating-Point Emulation (M68040FPSP)
Index
Table of Contents

Paragraph Number   Title ... Page Number

Section 1
Introduction

1.1 Differences ... 1-1
1.1.1 MC68040V and MC68LC040 ... 1-1
1.1.2 MC68EC040 and MC68EC040V ... 1-2
1.2 Features ... 1-3
1.3 Extensions to the M68000 Family ... 1-3
1.4 Functional Blocks ... 1-3
1.5 Processing States ... 1-5
1.6 Programming Model ... 1-5
1.7 Data Format Summary ... 1-9
1.8 Addressing Capabilities Summary ... 1-9
1.9 Notational Conventions ... 1-11
1.10 Instruction Set Overview ... 1-13

Section 2
Integer Unit

2.1 Integer Unit Pipeline ... 2-1
2.2 Integer Unit Register Description ... 2-4
2.2.1 Integer Unit User Programming Model ... 2-4
2.2.1.1 Data Registers (D7–D0) ... 2-4
2.2.1.2 Address Registers (A6–A0) ... 2-4
2.2.1.3 System Stack Pointer (A7) ... 2-5
2.2.1.4 Program Counter ... 2-5
2.2.1.5 Condition Code Register ... 2-5
2.2.2 Integer Unit Supervisor Programming Model ... 2-5
2.2.2.1 Interrupt and Master Stack Pointers ... 2-6
2.2.2.2 Status Register ... 2-7
2.2.2.3 Vector Base Register ... 2-7
2.2.2.4 Alternate Function Code Registers ... 2-7
2.2.2.5 Cache Control Register ... 2-8
Section 3
Memory Management Unit (Except MC68EC040 and MC68EC040V)

3.1 Memory Management Programming Model ... 3-3
3.1.1 User and Supervisor Root Pointer Registers ... 3-3
3.1.2 Translation Control Register ... 3-4
3.1.3 Transparent Translation Registers ... 3-5
3.1.4 MMU Status Register ... 3-6
3.2 Logical Address Translation ... 3-7
3.2.1 Translation Tables ... 3-7
3.2.2 Descriptors ... 3-12
3.2.2.1 Table Descriptors ... 3-12
3.2.2.2 Page Descriptors ... 3-13
3.2.2.3 Descriptor Field Definitions ... 3-13
3.2.3 Translation Table Example ... 3-16
3.2.4 Variations in Translation Table Structure ... 3-16
3.2.4.1 Indirect Action ... 3-16
3.2.4.2 Table Sharing Between Tasks ... 3-18
3.2.4.3 Table Paging ... 3-19
3.2.4.4 Dynamically Allocated Tables ... 3-21
3.2.5 Table Search Accesses ... 3-21
3.2.6 Address Translation Protection ... 3-23
3.2.6.1 Supervisor and User Translation Tables ... 3-23
3.2.6.2 Supervisor Only ... 3-23
3.2.6.3 Write Protect ... 3-24
3.3 Address Translation Caches ... 3-26
3.4 Transparent Translation ... 3-29
3.5 Address Translation Summary ... 3-30
3.6 MMU Effect on RSTI and MDIS ... 3-31
3.6.1 Effect of RSTI on the MMUs ... 3-31
3.6.2 Effect of MDIS on Address Translation ... 3-31
3.7 MMU Instructions ... 3-33
3.7.1 MOVEC ... 3-33
3.7.2 PFLUSH ... 3-33
3.7.3 PTEST ... 3-33
3.7.4 Register Programming Considerations ... 3-34
Section 4
Instruction and Data Caches

4.1 Cache Operation ... 4-2
4.2 Cache Management ... 4-5
4.3 Caching Modes ... 4-6
4.3.1 Cachable Accesses ... 4-6
4.3.1.1 Write-Through Mode ... 4-6
4.3.1.2 Copyback Mode ... 4-6
4.3.2 Cache-Inhibited Accesses ... 4-7
4.3.3 Special Accesses ... 4-7
4.4 Cache Protocol ... 4-7
4.4.1 Read Miss ... 4-8
4.4.2 Write Miss ... 4-8
4.4.3 Read Hit ... 4-8
4.4.4 Write Hit ... 4-8
4.5 Cache Coherency ... 4-9
4.6 Memory Accesses for Cache Maintenance ... 4-11
4.6.1 Cache Filling ... 4-11
4.6.2 Cache Pushes ... 4-13
4.7 Cache Operation Summary ... 4-13
4.7.1 Instruction Cache ... 4-14
4.7.2 Data Cache ... 4-15

Section 5
Signal Description

5.1 Address Bus (A31–A0) ... 5-4
5.2 Data Bus (D31–D0) ... 5-5
5.3 Transfer Attribute Signals ... 5-5
5.3.1 Transfer Type (TT1, TT0) ... 5-5
5.3.2 Transfer Modifier (TM2–TM0) ... 5-6
5.3.3 Transfer Line Number (TLN1, TLN0) ... 5-6
5.3.4 User-Programmable Attributes (UPA1, UPA0) ... 5-7
5.3.5 Read/Write (R/W) ... 5-7
5.3.6 Transfer Size (SIZ1, SIZ0) ... 5-7
5.3.7 Lock (LOCK) ... 5-7
5.3.8 Lock End (LOCKE) ... 5-7
5.3.9 Cache Inhibit Out (CIOUT) ... 5-8
5.4 Bus Transfer Control Signals ... 5-8
5.4.1 Transfer Start (TS) ... 5-8
5.4.2 Transfer in Progress (TIP) ... 5-8
5.4.3 Transfer Acknowledge (TA) ... 5-8
5.4.4 Transfer Error Acknowledge (TEA) ... 5-8
5.4.5 Transfer Cache Inhibit (TCI) ... 5-9
5.4.6 Transfer Burst Inhibit (TBI) ... 5-9
5.5 Snoop Control Signals ... 5-9
5.5.1 Snoop Control (SC1, SC0) ... 5-9
5.5.2 Memory Inhibit (MI) ... 5-9
5.6 Arbitration Signals ... 5-10
5.6.1 Bus Request (BR) ... 5-10
5.6.2 Bus Grant (BG) ... 5-10
5.6.3 Bus Busy (BB) ... 5-10
5.7 Processor Control Signals ... 5-10
5.7.1 Cache Disable (CDIS) ... 5-10
5.7.2 Reset In (RSTI) ... 5-11
5.7.3 Reset Out (RSTO) ... 5-11
5.8 Interrupt Control Signals ... 5-11
5.8.1 Interrupt Priority Level (IPL2–IPL0) ... 5-11
5.8.2 Interrupt Pending Status (IPEND) ... 5-12
5.8.3 Autovector (AVEC) ... 5-12
5.9 Status and Clock Signals ... 5-12
5.9.1 Processor Status (PST3–PST0) ... 5-12
5.9.2 Bus Clock (BCLK) ... 5-14
5.9.3 Processor Clock (PCLK): Not on MC68040V and MC68EC040V ... 5-14
5.10 MMU Disable (MDIS): Not on MC68EC040 ... 5-14
5.11 Data Latch Enable (DLE): Only on MC68040 ... 5-14
5.12 Test Signals ... 5-15
5.12.1 Test Clock (TCK) ... 5-15
5.12.2 Test Mode Select (TMS) ... 5-15
5.12.3 Test Data In (TDI) ... 5-15
5.12.4 Test Data Out (TDO) ... 5-15
5.12.5 Test Reset (TRST): Not on MC68040V and MC68EC040V ... 5-15
5.13 Power Supply Connections ... 5-15
5.14 Signal Summary ... 5-16

Section 6
IEEE 1149.1 Test Access Port (JTAG)

6.1 Overview ... 6-2
6.2 Instruction Shift Register ... 6-3
6.2.1 EXTEST ... 6-3
6.2.2 HIGHZ ... 6-4
6.2.3 SAMPLE/PRELOAD ... 6-4
6.2.4 DRVCTL.T ... 6-4
6.2.5 SHUTDOWN ... 6-5
6.2.6 PRIVATE ... 6-5
6.2.7 DRVCTL.S ... 6-5
6.2.8 BYPASS ... 6-6
6.3 Boundary Scan Register ... 6-6
6.4 Restrictions ... 6-12
6.5 Disabling the IEEE Standard 1149.1A Operation ... 6-13
6.6 Motorola M68040 BSDL Description (Version 2.2) ... 6-15
6.7 MC68040, MC68LC040, MC68EC040 JTAG Electrical Characteristics ... 6-21

Section 7
Bus Operation

7.1 Bus Characteristics ... 7-1
7.2 Data Transfer Mechanism ... 7-3
7.3 Misaligned Operands ... 7-6
7.4 Processor Data Transfers ... 7-9
7.4.1 Byte, Word, and Long-Word Read Transfers ... 7-10
7.4.2 Line Read Transfer ... 7-12
7.4.3 Byte, Word, and Long-Word Write Transfers ... 7-20
7.4.4 Line Write Transfers ... 7-22
7.4.5 Read-Modify-Write Transfers (Locked Transfers) ... 7-26
7.5 Acknowledge Bus Cycles ... 7-29
7.5.1 Interrupt Acknowledge Bus Cycles ... 7-29
7.5.1.1 Interrupt Acknowledge Bus Cycle (Terminated Normally) ... 7-31
7.5.1.2 Autovector Interrupt Acknowledge Bus Cycle ... 7-33
7.5.1.3 Spurious Interrupt Acknowledge Bus Cycle ... 7-34
7.5.2 Breakpoint Interrupt Acknowledge Bus Cycle ... 7-35
7.6 Bus Exception Control Cycles ... 7-36
7.6.1 Bus Errors ... 7-37
7.6.2 Retry Operation ... 7-41
7.6.3 Double Bus Fault ... 7-43
7.7 Bus Synchronization ... 7-43
7.8 Bus Arbitration and Examples ... 7-44
7.8.1 Bus Arbitration ... 7-45
7.8.2 Bus Arbitration Examples ... 7-52
7.8.2.1 Dual M68040 Fairness Arbitration ... 7-52
7.8.2.2 Dual M68040 Prioritized Arbitration ... 7-54
7.8.2.3 M68040 Synchronous DMA Arbitration ... 7-55
7.8.2.4 M68040 Asynchronous DMA Arbitration ... 7-57
7.9 Bus Snooping Operation ... 7-59
7.9.1 Snoop-Inhibited Cycle ... 7-60
7.9.2 Snoop-Enabled Cycle (No Intervention Required) ... 7-61
7.9.3 Snoop Read Cycle (Intervention Required) ... 7-63
7.9.4 Snoop Write Cycle (Intervention Required) ... 7-63
7.10 Reset Operation ... 7-65
7.11 Special Modes of Operation ... 7-68
7.11.1 Output Buffer Impedance Selection ... 7-68
7.11.2 Multiplexed Bus Mode ... 7-68
7.11.3 Data Latch Enable Mode ... 7-69

Section 8
Exception Processing

8.1 Exception Processing Overview ... 8-1
8.2 Integer Unit Exceptions ... 8-5
8.2.1 Access Fault Exception ... 8-6
8.2.2 Address Error Exception ... 8-8
8.2.3 Instruction Trap Exception ... 8-8
8.2.4 Illegal Instruction and Unimplemented Instruction Exceptions ... 8-9
8.2.5 Privilege Violation Exception ... 8-9
8.2.6 Trace Exception ... 8-10
8.2.7 Format Error Exception ... 8-11
8.2.8 Breakpoint Instruction Exception ... 8-12
8.2.9 Interrupt Exception ... 8-12
8.2.10 Reset Exception ... 8-17
8.3 Exception Priorities ... 8-19
8.4 Return from Exceptions ... 8-20
8.4.1 Four-Word Stack Frame (Format $0) ... 8-21
8.4.2 Four-Word Throwaway Stack Frame (Format $1) ... 8-21
8.4.3 Six-Word Stack Frame (Format $2) ... 8-22
8.4.4 Floating-Point Post-Instruction Stack Frame (Format $3) ... 8-23
8.4.5 Eight-Word Stack Frame (Format $4) ... 8-23
8.4.6 Access Error Stack Frame (Format $7) ... 8-24
8.4.6.1 Effective Address ... 8-24
8.4.6.2 Special Status Word (SSW) ... 8-24
8.4.6.3 Write-Back Status ... 8-26
8.4.6.4 Fault Address ... 8-26
8.4.6.5 Write-Back Address and Write-Back Data ... 8-26
8.4.6.6 Push Data ... 8-27
8.4.6.7 Access Error Stack Frame Return from Exception ... 8-27

Section 9
Floating-Point Unit (MC68040 Only)

9.1 Floating-Point Unit Pipeline ... 9-1
9.2 Floating-Point User Programming Model ... 9-2
9.2.1 Floating-Point Data Registers (FP7–FP0) ... 9-2
9.2.2 Floating-Point Control Register (FPCR) ... 9-3
9.2.2.1 Exception Enable Byte ... 9-3
9.2.2.2 Mode Control Byte ... 9-3
9.2.3 Floating-Point Status Register (FPSR) ... 9-4
9.2.3.1 Floating-Point Condition Code Byte ... 9-4
9.2.3.2 Quotient Byte ... 9-5
9.2.3.3 Exception Status Byte ... 9-5
9.2.3.4 Accrued Exception (AEXC) Byte ... 9-5
9.2.4 Floating-Point Instruction Address Register (FPIAR) ... 9-6
9.3 Floating-Point Data Formats and Data Types ... 9-7
9.4 Computational Accuracy ... 9-11
9.4.1 Intermediate Result ... 9-12
9.4.2 Rounding the Result ... 9-13
9.5 Postprocessing Operation ... 9-15
9.5.1 Underflow, Round, Overflow ... 9-16
9.5.2 Conditional Testing ... 9-16
9.6 Floating-Point Exceptions ... 9-20
9.6.1 Unimplemented Floating-Point Instructions ... 9-20
9.6.2 Unsupported Floating-Point Data Types ... 9-22
9.7 Floating-Point Arithmetic Exceptions ... 9-24
9.7.1 Branch/Set on Unordered (BSUN) ... 9-25
9.7.1.1 Maskable Exception Conditions ... 9-26
9.7.1.2 Nonmaskable Exception Conditions ... 9-27
9.7.2 Signaling Not-a-Number (SNAN) ... 9-27
9.7.2.1 Maskable Exception Conditions ... 9-27
9.7.2.2 Nonmaskable Exception Conditions ... 9-27
9.7.3 Operand Error ... 9-28
9.7.3.1 Maskable Exception Conditions ... 9-29
9.7.3.2 Nonmaskable Exception Conditions ... 9-30
9.7.4 Overflow ... 9-31
9.7.4.1 Maskable Exception Conditions ... 9-31
9.7.4.2 Nonmaskable Exception Conditions ... 9-31
9.7.5 Underflow ... 9-33
9.7.5.1 Maskable Exception Conditions ... 9-34
9.7.5.2 Nonmaskable Exception Conditions ... 9-34
9.7.6 Divide by Zero ... 9-36
9.7.7 Inexact Result ... 9-36
9.8 Floating-Point State Frames ... 9-39

Section 10
Instruction Timings

10.1 Overview ... 10-3
10.2 Instruction Timing Examples ... 10-5
10.3 CINV and CPUSH Instruction Timing ... 10-8
10.4 MOVE Instruction Timing ... 10-9
10.5 Miscellaneous Integer Unit Instruction Timings ... 10-11
10.6 Integer Unit Instruction Timings ... 10-13
10.7 Floating-Point Unit Instruction Timings ... 10-29
10.7.1 Miscellaneous Integer Unit Support Timings ... 10-29
10.7.2 Integer Unit Support Timings ... 10-30
10.7.3 Timings in the Floating-Point Unit ... 10-35

Section 11
MC68040 Electrical and Thermal Characteristics

11.1 Maximum Ratings ... 11-1
11.2 Thermal Characteristics ... 11-1
11.3 DC Electrical Specifications ... 11-2
11.4 Power Dissipation ... 11-2
11.5 Clock AC Timing Specifications ... 11-3
11.6 Output AC Timing Specifications ... 11-4
11.7 Input AC Timing Specifications ... 11-5
11.8 MC68040 Thermal Device Characteristics ... 11-12
11.8.1 MC68040 Die and Package ... 11-12
11.8.2 MC68040 Power Considerations ... 11-12
11.9 MC68040 Thermal Management Techniques ... 11-14
11.9.1 Still Air ... 11-17
11.9.2 Forced Air ... 11-18
11.9.3 With Heat Sink ... 11-19
11.9.4 With Heat Sink and Forced Air ... 11-22
Section 12 Ordering Information and Mechanical Data
12.1 Ordering Information ... 12-1
12.2 Pin Assignments ... 12-1
12.2.1 MC68040 Pin Grid Array ... 12-2
12.2.2 MC68LC040 Pin Grid Array ... 12-3
12.2.3 MC68EC040 Pin Grid Array ... 12-4
12.2.4 MC68040V and MC68EC040V Pin Grid Array ... 12-5
12.2.5 MC68LC040 Quad Flat Pack ... 12-6
12.2.6 MC68EC040 Quad Flat Pack ... 12-6
12.2.7 MC68040V and MC68EC040V Quad Flat Pack ... 12-7
12.3 Mechanical Data ... 12-9

Appendix A MC68LC040
A.1 MC68LC040 Differences ... A-5
A.2 Interrupt Priority Level (IPL2–IPL0) ... A-5
A.3 JTAG Scan (JS0) ... A-5
A.4 Data Latch and Multiplexed Bus Modes ... A-5
A.5 Floating-Point Unit (FPU) ... A-5
A.5.1 Unimplemented Floating-Point Instructions and Exceptions ... A-6
A.5.2 MC68LC040 Stack Frames ... A-7
A.6 MC68LC040 Electrical Characteristics ... A-7
A.6.1 Maximum Ratings ... A-8
A.6.2 Thermal Characteristics ... A-8
A.6.3 DC Electrical Specifications ... A-8
A.6.4 Power Dissipation ... A-9
A.6.5 Clock AC Timing Specifications ... A-9
A.6.6 Output AC Timing Specifications ... A-11
A.6.7 Input AC Timing Specifications ... A-12

Appendix B MC68EC040
B.1 MC68EC040 Differences ... B-4
B.2 JTAG Scan (JS1–JS0) ... B-5
B.3 Access Control Units ... B-5
B.3.1 Access Control Registers ... B-5
B.3.2 Address Comparison ... B-7
B.3.3 Effect of RSTI on the ACU ... B-8
B.4 Special Modes of Operation ... B-8
B.5 Exception Processing ... B-10
B.5.1 Unimplemented Floating-Point Instructions and Exceptions ... B-10
B.5.2 MC68EC040 Stack Frames ... B-11
B.6 Software Considerations ... B-12
B.7 MC68EC040 Electrical Characteristics ... B-12
B.7.1 Maximum Ratings ... B-12
B.7.2 Thermal Characteristics ... B-12
B.7.3 DC Electrical Specifications ... B-13
B.7.4 Power Dissipation ... B-13
B.7.5 Clock AC Timing Specifications ... B-14
B.7.6 Output AC Timing Specifications ... B-15
B.7.7 Input AC Timing Specifications ... B-16

Appendix C MC68040V and MC68EC040V
C.1 Additional Signals ... C-1
C.1.1 Low Frequency Operation (LFO) ... C-2
C.1.2 Loss of Clock (LOC) ... C-2
C.1.3 System Clock Disable (SCD) ... C-2
C.2 Low-Power Stop Mode ... C-3
C.2.1 Bus Arbitration and Snooping ... C-5
C.2.2 Low Frequency Operation ... C-5
C.2.3 Changing BCLK Frequency ... C-5
C.2.4 LPSTOP Instruction Summary ... C-6
C.3 Clocking during Normal Operation ... C-7
C.4 Reset Operation ... C-7
C.5 Power Cycling ... C-9
C.6 MC68040V and MC68EC040V JTAG (Preliminary) ... C-10
C.6.1 Instruction Shift Register ... C-11
C.6.1.1 EXTEST ... C-12
C.6.1.2 HIGHZ ... C-12
C.6.1.3 SAMPLE/PRELOAD ... C-12
C.6.1.4 CLAMP ... C-12
C.6.1.5 BYPASS ... C-13
C.6.2 Boundary Scan Register ... C-13
C.6.3 Restrictions ... C-16
C.6.4 Disabling the IEEE Standard 1149.1a Operation ... C-16
C.6.5 MC68040V and MC68EC040V JTAG Electrical Characteristics ... C-17
C.7 MC68040V and MC68EC040V Electrical Characteristics ... C-19
C.7.1 Maximum Ratings ... C-19
C.7.2 Thermal Characteristics ... C-19
C.7.3 DC Electrical Specifications ... C-20
C.7.4 Power Dissipation ... C-20
C.7.5 Clock AC Timing Specifications ... C-21
C.7.6 Output AC Timing Specifications ... C-22
C.7.7 Input AC Timing Specifications ... C-23

Appendix D M68000 Family Summary
Appendix E Floating-Point Emulation (M68040FPSP)
Index
List of Illustrations
Figure Number / Title / Page Number
1-1 Block Diagram ... 1-4
1-2 Programming Model ... 1-7
2-1 Integer Unit Pipeline ... 2-2
2-2 Write-Back Cycle Block Diagram ... 2-3
2-3 Integer Unit User Programming Model ... 2-4
2-4 Integer Unit Supervisor Programming Model ... 2-6
2-5 Status Register ... 2-7
3-1 Memory Management Unit ... 3-2
3-2 Memory Management Programming Model ... 3-3
3-3 URP and SRP Register Formats ... 3-4
3-4 Translation Control Register Format ... 3-4
3-5 Transparent Translation Register Format ... 3-5
3-6 MMU Status Register Format ... 3-6
3-7 Translation Table Structure ... 3-8
3-8 Logical Address Format ... 3-9
3-9 Detailed Flowchart of Table Search Operation ... 3-10
3-10 Detailed Flowchart of Descriptor Fetch Operation ... 3-11
3-11 Table Descriptor Formats ... 3-13
3-12 Page Descriptor Formats ... 3-13
3-13 Example Translation Table ... 3-17
3-14 Translation Table Using Indirect Descriptors ... 3-18
3-15 Translation Table Using Shared Tables ... 3-19
3-16 Translation Table with Nonresident Tables ... 3-20
3-17 Translation Table Structure for Two Tasks ... 3-24
3-18 Logical Address Map with Shared Supervisor and User Address Spaces ... 3-24
3-19 Translation Table Using S-Bit and W-Bit to Set Protection ... 3-25
3-20 ATC Organization ... 3-26
3-21 ATC Entry and Tag Fields ... 3-27
3-22 Address Translation Flowchart ... 3-32
3-23 MMU Status Interpretation ... 3-35
4-1 Overview of Internal Caches ... 4-2
4-2 Cache Line Formats ... 4-3
4-3 Caching Operation ... 4-4
4-4 Cache Control Register ... 4-5
4-5 Instruction-Cache Line State Diagram ... 4-14
4-6 Data-Cache Line State Diagram ... 4-16
5-1 Functional Signal Groups ... 5-4
6-1 M68040 Test Logic Block Diagram ... 6-2
6-2 Bypass Register ... 6-6
6-3 Output Latch Cell (O.Latch) ... 6-7
6-4 Input Pin Cell (I.Pin) ... 6-7
6-5 Output Control Cells (IO.Ctl) ... 6-8
6-6 General Arrangement of Bidirectional Pins ... 6-8
6-7 Circuit Disabling IEEE Standard 1149.1a ... 6-14
6-8 Clock Input Timing Diagram ... 6-22
6-9 TRST Timing Diagram ... 6-22
6-10 Boundary Scan Timing Diagram ... 6-23
6-11 Test Access Port Timing Diagram ... 6-23
7-1 Signal Relationships to Clocks ... 7-2
7-2 Internal Operand Representation ... 7-3
7-3 Data Multiplexing ... 7-4
7-4 Byte Enable Signal Generation and PAL Equation ... 7-5
7-5 Example of a Misaligned Long-Word Transfer ... 7-7
7-6 Example of a Misaligned Word Transfer ... 7-7
7-7 Misaligned Long-Word Read Transfer Timing ... 7-8
7-8 Byte, Word, and Long-Word Read Transfer Flowchart ... 7-10
7-9 Byte, Word, and Long-Word Read Transfer Timing ... 7-11
7-10 Line Read Transfer Flowchart ... 7-14
7-11 Line Read Transfer Timing ... 7-15
7-12 Burst-Inhibited Line Read Transfer Flowchart ... 7-18
7-13 Burst-Inhibited Line Read Transfer Timing ... 7-19
7-14 Byte, Word, and Long-Word Write Transfer Flowchart ... 7-20
7-15 Long-Word Write Transfer Timing ... 7-21
7-16 Line Write Transfer Flowchart ... 7-23
7-17 Line Write Transfer Timing ... 7-24
7-18 Locked Transfer for TAS Instruction Timing ... 7-27
7-19 Interrupt Pending Procedure ... 7-30
7-20 Assertion of IPEND ... 7-30
7-21 Interrupt Acknowledge Bus Cycle Flowchart ... 7-32
7-22 Interrupt Acknowledge Bus Cycle Timing ... 7-33
7-23 Autovector Interrupt Acknowledge Bus Cycle Timing ... 7-34
7-24 Breakpoint Interrupt Acknowledge Bus Cycle Flowchart ... 7-35
7-25 Breakpoint Interrupt Acknowledge Bus Cycle Timing ... 7-36
7-26 Word Write Access Terminated with TEA Timing ... 7-39
7-27 Line Read Access Terminated with TEA Timing ... 7-40
7-28 Retry Read Transfer Timing ... 7-41
7-29 Retry Operation on Line Write ... 7-42
7-30 M68040 Internal Interpretation State Diagram and External Bus Arbiter Circuit ... 7-47
7-31 Lock Violation Example ... 7-49
7-32 Processor Bus Request Timing ... 7-50
7-33 Arbitration during Relinquish and Retry Timing ... 7-51
7-34 Implicit Bus Ownership Arbitration Timing ... 7-52
7-35 Dual M68040 Fairness Arbitration State Diagram ... 7-53
7-36 Dual M68040 Prioritized Arbitration State Diagram ... 7-55
7-37 M68040 Synchronous DMA Arbitration ... 7-56
7-38 Sample Synchronizer Circuit ... 7-57
7-39 M68040 Asynchronous DMA Arbitration ... 7-58
7-40 Snoop-Inhibited Bus Cycle ... 7-61
7-41 Snoop Access with Memory Response ... 7-62
7-42 Snooped Line Read, Memory Inhibited ... 7-64
7-43 Snooped Long-Word Write, Memory Inhibited ... 7-65
7-44 Initial Power-On Reset Timing ... 7-66
7-45 Normal Reset Timing ... 7-67
7-46 Multiplexed Address and Data Bus (Line Write) ... 7-69
7-47 DLE Mode Block Diagram ... 7-70
7-48 DLE versus Normal Data Read Timing ... 7-71
8-1 General Exception Processing Flowchart ... 8-3
8-2 General Form of Exception Stack Frame ... 8-4
8-3 Interrupt Recognition Examples ... 8-14
8-4 Interrupt Exception Processing Flowchart ... 8-16
8-5 Reset Exception Processing Flowchart ... 8-18
8-6 Flowchart of RTE Instruction for Throwaway Four-Word Frame ... 8-22
8-7 Special Status Word Format ... 8-24
8-8 Write-Back Status Format ... 8-26
9-1 Floating-Point User Programming Model ... 9-2
9-2 Floating-Point Control Register ... 9-4
9-3 FPSR Condition Code Byte ... 9-4
9-4 FPSR Quotient Byte ... 9-5
9-5 FPSR Exception Status Byte ... 9-5
9-6 FPSR Accrued Exception Byte ... 9-6
9-7 Intermediate Result Format ... 9-12
9-8 Rounding Algorithm Flowchart ... 9-14
9-9 Format of Denormalized Operand in State Frame ... 9-24
9-10 MC68040 Floating-Point State Frames ... 9-40
9-11 Mapping of Command Bits for CMDREG3B Field ... 9-42
10-1 Simple Instruction Timing Example ... 10-5
10-2 Instruction Overlap with Multiple Clocks ... 10-6
10-3 Interlocked Stages ... 10-7
11-1 Clock Input Timing Diagram ... 11-3
11-2 Drive Levels and Test Points for AC Specifications ... 11-6
11-3 Read/Write Timing ... 11-7
11-4 Bus Arbitration Timing ... 11-8
11-5 Snoop Hit Timing ... 11-9
11-6 Snoop Miss Timing ... 11-10
11-7 Other Signal Timing ... 11-11
11-8 MC68040 Termination Network ... 11-15
11-9 Typical Configuration for RC Termination Network ... 11-15
11-10 Heat Sink with Adhesive ... 11-20
11-11 Heat Sink with Attachment ... 11-21
12-1 PGA Package Dimensions ... 12-9
12-2 QFP Package Dimensions ... 12-10
A-1 MC68LC040 Block Diagram ... A-2
A-2 MC68LC040 Programming Model ... A-3
A-3 MC68LC040 Functional Signal Groups ... A-4
A-4 Clock Input Timing Diagram ... A-10
A-5 Read/Write Timing ... A-13
A-6 Bus Arbitration Timing ... A-14
A-7 Snoop Hit Timing ... A-15
A-8 Snoop Miss Timing ... A-16
A-9 Other Signal Timing ... A-17
B-1 MC68EC040 Block Diagram ... B-2
B-2 MC68EC040 Programming Model ... B-3
B-3 MC68EC040 Functional Signal Groups ... B-4
B-4 MC68EC040 Access Control Register Format ... B-6
B-5 MC68EC040 Initial Power-On Reset Timing ... B-8
B-6 MC68EC040 Normal Reset Timing ... B-9
B-7 Clock Input Timing Diagram ... B-14
B-8 Read/Write Timing ... B-17
B-9 Bus Arbitration Timing ... B-18
B-10 Snoop Hit Timing ... B-19
B-11 Snoop Miss Timing ... B-20
B-12 Other Signal Timing ... B-21
C-1 MC68040V and MC68EC040V Functional Signal Groups ... C-3
C-2 MC68040V and MC68EC040V Initial Power-On Reset Timing ... C-8
C-3 MC68040V and MC68EC040V Normal Reset Timing ... C-9
C-4 MC68040V and MC68EC040V Test Logic Block Diagram ... C-11
C-5 Bypass Register ... C-13
C-6 Output Latch Cell (O.Latch) ... C-14
C-7 Input Pin Cell (I.Pin) ... C-14
C-8 Output Control Cells (IO.Ctl) ... C-15
C-9 General Arrangement of Bidirectional Pins ... C-15
C-10 Circuit Disabling IEEE Standard 1149.1a ... C-17
C-11 Drive Levels and Test Points for AC Specifications ... C-18
C-12 Clock Input Timing Diagram ... C-21
C-13 Read/Write Timing ... C-24
C-14 Bus Arbitration Timing ... C-25
C-15 Snoop Hit Timing ... C-26
C-16 Snoop Miss Timing ... C-27
C-17 Other Signal Timing ... C-28
C-18 Going into LPSTOP with Arbitration ... C-29
C-19 LPSTOP No Arbitration, CPU Is Master ... C-30
C-20 Exiting LPSTOP with Interrupt ... C-31
C-21 Exiting of LPSTOP with Reset ... C-31
List of Tables
Table Number / Title / Page Number
1-1 M68040 Data Formats ... 1-9
1-2 Effective Addressing Modes ... 1-10
1-3 Notational Conventions ... 1-11
1-4 Instruction Set Summary ... 1-14
3-1 Updating U-Bit and M-Bit for Page Descriptors ... 3-22
3-2 SFC and DFC Values ... 3-22
4-1 Snoop Control Encoding ... 4-9
4-2 TLNx Encoding ... 4-11
4-3 Instruction-Cache Line State Transitions ... 4-15
4-4 Data-Cache Line State Transitions ... 4-17
5-1 Signal Index ... 5-2
5-2 Transfer-Type Encoding ... 5-5
5-3 Normal and MOVE16 Access Transfer Modifier Encoding ... 5-6
5-4 Alternate Access Transfer Modifier Encoding ... 5-6
5-5 Output Driver Control Groups ... 5-11
5-6 Processor Status Encoding ... 5-13
5-7 Signal Summary ... 5-16
6-1 IEEE Standard 1149.1a Instructions ... 6-3
6-2 Boundary Scan Bit Definitions ... 6-10
7-1 Data Bus Requirements for Read and Write Cycles ... 7-4
7-2 Summary of Access Types versus Bus Signal Encodings ... 7-6
7-3 Memory Alignment Influence on Noncachable and Write-Through Bus Cycles ... 7-9
7-4 Interrupt Acknowledge Termination Summary ... 7-31
7-5 TA and TEA Assertion Results ... 7-37
7-6 M68040 Bus Arbitration States ... 7-48
8-1 Exception Vector Assignments ... 8-5
8-2 Tracing Control ... 8-11
8-3 Interrupt Levels and Mask Values ... 8-12
8-4 Exception Priority Groups ... 8-19
motorola m68040 user? manual xxiii list of tables (continued) table page number title number 8-5 write-back data alignment .......................................................................... 8-27 8-6 access error stack frame combinations .................................................... 8-31 9-1 floating-point control register encodings .................................................. 9-3 9-2 mc68040 fpu data formats and data types ............................................ 9-7 9-3 single-precision real format summary ...................................................... 9-8 9-4 double-precision real format summary..................................................... 9-9 9-5 extended-precision real format summary ................................................. 9-10 9-6 packed decimal real format summary ...................................................... 9-11 9-7 floating-point condition code encodings.................................................... 9-17 9-8 floating-point conditional tests .................................................................. 9-19 9-9 floating-point exception vectors ................................................................. 9-20 9-10 unimplemented instructions ......................................................................... 9-21 9-11 possible operand errors exceptions ........................................................... 9-29 9-12 overflow rounding mode values................................................................. 9-32 9-13 underflow rounding mode values............................................................... 9-34 9-14 possible divide by zero exceptions ............................................................. 9-36 9-15 divide by zero rounding mode values........................................................ 9-37 9-16 state frame field information ...................................................................... 
9-44 10-1 instruction timing index ............................................................................... 10-1 10-2 number of memory accesses ...................................................................... 10-3 10-3 cinv timing ................................................................................................. 10-8 10-4 cpush best and worst case timing .......................................................... 10-8 11-1 maximum power dissipation for output buffer mode configuration ............ 11-13 11-2 thermal parameters with no heat sink or airflow ....................................... 11-17 11-3 thermal parameters with forced airflow and no heat sink for the mc68040 .................................................................. 11-18 11-4 thermal parameters with forced airflow and no heat sink for the mc68lc040 and mc68ec040 ................................. 11-19 11-5 thermal parameters with heat sink and no airflow .................................... 11-21 11-6 thermal parameters with heat sink and airflow.......................................... 11-22 c-1 additional mc68040v and mc68ec040v signals....................................... c-2 c-2 bus encodings during lpstop broadcast cycle ....................................... c-4 c-3 ieee standard 1149.1a instructions............................................................ c-12 e-1 mc68040 floating-point instructions ........................................................... e-2 e-2 mc68040fpsp floating-point instructions.................................................. e-3 e-3 support for data types and data formats .................................................. e-4 e-4 exception conditions ................................................................................... e-4 f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . 
c o m n c . . .
Section 1 Introduction

The MC68040, MC68040V, MC68LC040, MC68EC040, and MC68EC040V (collectively called M68040) are Motorola's third generation of M68000-compatible, high-performance, 32-bit microprocessors. All five devices are virtual memory microprocessors that employ multiple concurrent execution units and a highly integrated architecture to provide very high performance in a monolithic HCMOS device. They integrate an MC68030-compatible integer unit (IU) and two independent caches. The MC68040, MC68040V, and MC68LC040 contain dual, independent, demand-paged memory management units (MMUs) for instruction and data stream accesses and independent, 4-Kbyte instruction and data caches. The MC68040 contains an MC68881/MC68882-compatible floating-point unit (FPU). The use of multiple independent execution pipelines, multiple internal buses, and a full internal Harvard architecture, including separate physical caches for both instruction and data accesses, achieves a high degree of instruction execution parallelism on all five processors. The on-chip bus snoop logic, which directly supports cache coherency in multimaster applications, enhances cache functionality.

The M68040 family is user object-code compatible with previous M68000 family members and is specifically optimized to reduce the execution time of compiler-generated code. All five processors are implemented in Motorola's latest HCMOS technology, providing an ideal balance between speed, power, and physical device size.

1.1 Differences
Because the individual M68040 family members are functionally similar, this manual describes them together; keep the following differences in mind when reading the rest of the manual. Unless otherwise noted, all references to the M68040 apply to the MC68040, MC68040V, MC68LC040, MC68EC040, and MC68EC040V, with the exception of the differences outlined below.
The following paragraphs describe how the MC68040V, MC68LC040, MC68EC040, and MC68EC040V differ from the MC68040.

1.1.1 MC68040V and MC68LC040
The MC68040V and MC68LC040 are derivatives of the MC68040. They implement the same IU and MMU as the MC68040 but have no FPU. The MC68LC040 is pin compatible with the MC68040. The MC68040V is not pin compatible with the MC68040 and contains some additional features. The following differences exist between the MC68040V, MC68LC040, and MC68040:
• The DLE pin has been renamed JS0 on both the MC68040V and MC68LC040. In addition, the MC68040V contains three new pins: system clock disable (SCD), low frequency operation (LFO), and loss of clock (LOC).
• The MC68040V and MC68LC040 do not implement the data latch enable (DLE), multiplexed, or output buffer impedance selection modes of operation; they implement only the small output buffer mode. All timing and drive capabilities on both devices are equivalent to those of the MC68040 in small output buffer impedance mode.
• The MC68040V has an additional mode of operation, the low-power stop mode.
• The MC68040V and MC68LC040 do not contain an FPU, so unimplemented floating-point exceptions occur using a new stack frame format.
• The MC68040V is a 3.3-volt static microprocessor that operates down to 0 MHz.

For specific details on the MC68LC040, refer to Appendix A MC68LC040. For specific details on the MC68040V, refer to both Appendix A MC68LC040 and Appendix C MC68040V and MC68EC040V. When reading the following sections with these devices in mind, disregard all information concerning the FPU.

1.1.2 MC68EC040 and MC68EC040V
The MC68EC040 and MC68EC040V are derivatives of the MC68040. They implement the same IU as the MC68040 but have no FPU or MMU, which embedded control applications generally do not require. The MC68EC040 is pin compatible with the MC68040. The following differences exist between the MC68EC040, MC68EC040V, and the MC68040:
• The DLE and MDIS pins have been renamed JS0 and JS1, respectively.
• PTEST and PFLUSH instructions cause an undetermined number of bus cycles; do not execute these instructions.
• The access control unit (ACU) replaces the MMU. The MC68EC040 and MC68EC040V ACU has two data and two instruction registers, which are called the data and instruction transparent translation registers in the MC68040.
• The MC68EC040 and MC68EC040V do not implement the DLE, multiplexed, or output buffer impedance selection modes of operation; they implement only the small output buffer mode. All MC68EC040 and MC68EC040V timing and drive capabilities are equivalent to those of the MC68040 in small output buffer mode.
• The MC68EC040 and MC68EC040V do not contain an FPU, so unimplemented floating-point exceptions occur using a new stack frame format.
• The MC68EC040V is a 3.3-volt static microprocessor that operates down to 0 MHz.

Refer to Appendix B MC68EC040 for specific details on the MC68EC040. Refer to Appendix B MC68EC040 and Appendix C MC68040V and MC68EC040V for specific details on the MC68EC040V. When reading the following sections with these devices in mind, disregard information concerning the FPU and MMU.
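The differences in Section 1.1, together with the appendix guide in the preface, can be collected into a small lookup table, which may be handy for software that has to run on more than one M68040 variant. The following is a minimal C sketch; the type, table, and function names are hypothetical (not part of any Motorola software), and the table records only facts stated in this manual: FPU presence, MMU presence, and which appendices to read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of one M68040 family member (Section 1.1). */
typedef struct {
    const char *name;
    bool has_fpu;           /* on-chip floating-point unit */
    bool has_mmu;           /* paged MMUs; the EC parts use an ACU instead */
    const char *appendices; /* appendices to read with the rest of the manual */
} m68040_variant;

static const m68040_variant variants[] = {
    { "MC68040",    true,  true,  ""     },
    { "MC68040V",   false, true,  "A, C" },
    { "MC68LC040",  false, true,  "A"    },
    { "MC68EC040",  false, false, "B"    },
    { "MC68EC040V", false, false, "B, C" },
};

/* Looks up a part number; returns NULL for parts outside the M68040 family. */
static const m68040_variant *find_variant(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof variants / sizeof variants[0]; i++) {
        if (strcmp(variants[i].name, name) == 0)
            return &variants[i];
    }
    return NULL;
}

/* Parts without an FPU take unimplemented floating-point exceptions,
   so floating-point code must be emulated in software or avoided. */
static bool needs_fp_emulation(const char *name)
{
    const m68040_variant *v = find_variant(name);
    return v != NULL && !v->has_fpu;
}
```

Keeping the facts in one table means a boot loader or library only has to decide once, by part number, whether to install floating-point emulation or MMU setup code.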
1.2 Features
The main features of the M68040 are as follows:
• 6-stage pipeline, MC68030-compatible IU
• MC68881/MC68882-compatible FPU
• Independent instruction and data MMUs
• Simultaneously accessible, 4-Kbyte physical instruction cache and 4-Kbyte physical data cache
• Low-latency bus accesses for reduced cache miss penalty
• Multimaster/multiprocessor support via bus snooping
• Concurrent IU, FPU, MMU, and bus controller operation maximizes throughput
• 32-bit, nonmultiplexed external address and data buses with synchronous interface
• User object-code compatible with all earlier M68000 microprocessors
• 4-Gbyte direct addressing range
• Software support including optimizing C compiler and UNIX® System V port

The on-chip FPU and large physical instruction and data caches yield improved system performance and increased functionality. The independent instruction and data MMUs and increased internal parallelism also improve performance.

1.3 Extensions to the M68000 Family
The M68040 is compatible with the ANSI/IEEE Standard 754 for Binary Floating-Point Arithmetic. The MC68040's FPU has been optimized to execute the most commonly used subset of the MC68881/MC68882 instruction set and includes additional instruction formats for rounding results to single and double precision. Software emulates the floating-point instructions that are not directly supported in hardware; refer to Appendix E M68040 Floating-Point Emulation (MC68040FPSP) for details on software emulation.

The MOVE16 user instruction is new to the instruction set; it supports efficient 16-byte memory-to-memory data transfers.

1.4 Functional Blocks
Figure 1-1 illustrates a simplified block diagram of the MC68040. Refer to Appendix A MC68LC040 for information on the functional blocks of the MC68LC040 and MC68040V, and to Appendix B MC68EC040 for information on the functional blocks of the MC68EC040 and MC68EC040V.
The M68040 IU pipeline has been expanded from that of the MC68030 to include effective address calculation (<ea> calculate) and operand fetch (<ea> fetch) stages for commonly used effective addressing modes. Conditional branches are optimized for the more common case of the branch taken, and both execution paths of the branch are fetched and decoded to minimize refilling of the instruction pipeline.

UNIX is a registered trademark of AT&T Bell Laboratories.

Figure 1-1. Block Diagram (integer unit pipeline: instruction fetch, decode, <ea> calculate, <ea> fetch, execute, write-back; floating-point unit: convert, execute, write-back; instruction and data memory units, each with an ATC, an MMU/cache/snoop controller, and a cache; bus controller with address bus, data bus, and bus control signals)

To improve memory management, the M68040 includes separate, independent paged MMUs for instruction and data accesses. Each MMU stores recently used address mappings in a separate 64-entry address translation cache (ATC). Each MMU also has two transparent translation registers that define a one-to-one mapping for address space segments ranging in size from 16 Mbytes to 4 Gbytes each.

Two memory units independently interface with the IU and FPU. Each unit consists of an MMU, an ATC, a main cache, and a snoop controller. The MMUs perform memory management on a demand-page basis. By translating logical addresses to physical addresses using translation tables stored in memory, the MMUs support virtual memory systems. Each MMU stores recently used address mappings in an ATC, reducing the average translation time.

Separate on-chip instruction and data caches operate independently and are accessed in parallel with address translation. The caches improve overall system performance by reducing the number of bus transfers the processor needs to fetch information from memory and by increasing the bus bandwidth available for alternate bus masters in the system. Both caches are four-way set associative, with 64 sets of four lines; each line contains four long words, giving a storage capacity of 4 Kbytes per cache (8 Kbytes total). Each cache and its corresponding MMU are allocated separate internal address and data buses, allowing simultaneous access to both. The data cache provides write-through or copyback write modes that can be configured on a page-by-page basis. The caches are physically mapped, which reduces the software support needed for multitasking operating systems, and they support external bus snooping to maintain cache coherency in multimaster systems.

The bus controller executes bus transfers on the external bus and prioritizes external memory requests from each cache. The M68040 bus controller supports a high-speed, nonmultiplexed, synchronous external bus interface with burst accesses for both reads and writes, providing high data transfer rates to and from the caches. Additional bus signals support bus snooping and external cache tag maintenance.

The MC68040 contains an on-chip FPU, which is user object-code compatible with the MC68881/MC68882 floating-point coprocessors. The FPU has pipelined instruction execution, and floating-point instructions in the FPU execute concurrently with integer instructions in the IU.

1.5 Processing States
The processor is always in one of three states: normal processing, exception processing, or halted. It is in the normal processing state when executing instructions, fetching instructions and operands, and storing instruction results. Exception processing is the transition from program processing to system, interrupt, and exception handling. Exception processing includes fetching the exception vector, stacking operations, and refilling the instruction pipe after an exception.
The processor enters exception processing when an exceptional internal condition arises, such as the tracing of an instruction, an instruction that results in a trap, or the execution of specific instructions. External conditions, such as interrupts and access errors, also cause exceptions. Exception processing ends when the first instruction of the exception handler begins to execute.

The processor halts when it receives an access error or generates an address error while in the exception processing state. For example, if a second access error occurs during exception processing for an access error, the MC68040 cannot complete the transition to normal processing and cannot save the internal state of the machine. The processor assumes that the system is not operational and halts. Only an external reset can restart a halted processor. Note that when the processor executes a STOP instruction, it is in a special type of normal processing state, one without bus cycles: the processor stops, but it does not halt.

1.6 Programming Model
The MC68040 programming model is separated into two privilege modes: supervisor and user. The S-bit in the status register (SR) indicates the privilege mode that the processor uses. The IU identifies a logical address by accessing either the supervisor or user address space, maintaining the differentiation between supervisor and user modes. The MMUs use the indicated privilege mode to control and translate memory accesses, protecting supervisor code, data, and resources from user program accesses. Refer to Appendix B MC68EC040 for details concerning MC68EC040 address translation.

Programs access registers based on the indicated mode. User programs can only access registers specific to the user mode, whereas system software executing in the supervisor mode can access all registers and uses the control registers to perform supervisory functions. User programs are thus restricted from accessing privileged information, and the operating system performs management and service tasks for the user programs by coordinating their activities. This difference allows the supervisor mode to protect system resources from uncontrolled accesses.

Most instructions execute in either mode, but some instructions that have important system effects are privileged and can only execute in the supervisor mode. For instance, user programs cannot execute the STOP or RESET instructions. To prevent a user program from entering the supervisor mode except in a controlled manner, instructions that can alter the S-bit in the SR are privileged. The TRAP instructions provide controlled access to operating system services for user programs.

If the S-bit in the SR is set, the processor executes instructions in the supervisor mode. Because the processor performs all exception processing in the supervisor mode, all bus cycles generated during exception processing are supervisor references, and all stack accesses use the active supervisor stack pointer. If the S-bit of the SR is clear, the processor executes instructions in the user mode. The bus cycles for an instruction executed in the user mode are user references.
The values on the transfer modifier pins indicate either supervisor or user accesses.

The processor uses the user mode and the user programming model when it is in normal processing. During exception processing, the processor changes from user to supervisor mode: exception processing saves the current value of the SR on the active supervisor stack and then sets the S-bit, forcing the processor into the supervisor mode. To return to the user mode, a system routine must execute one of the following instructions, each of which executes in the supervisor mode and modifies the S-bit of the SR: MOVE to SR, ANDI to SR, EORI to SR, ORI to SR, or RTE. After these instructions execute, the instruction pipeline is flushed and refilled from the appropriate address space.

The MC68040 integrates the functions of the IU, FPU, and MMU. The registers depicted in the programming model (see Figure 1-2) provide operand storage and control for these three units. The registers are partitioned into two levels of privilege: user and supervisor. The user programming model is the same as that of the MC68030, which consists of 16 general-purpose, 32-bit registers and two control registers. The MC68040 user programming model also incorporates the MC68881/MC68882 programming model, which consists of eight 80-bit floating-point data registers, a floating-point control register, a floating-point status register, and a floating-point instruction address register.
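The S-bit handling described above (save the SR, set the S-bit, later restore the saved SR) can be modeled in a few lines of C. This is a simplified sketch, not a description of the hardware: it assumes the M68000-family SR layout (S-bit at bit 13, CCR in the low byte), keeps the saved SR in a struct field instead of pushing it on the supervisor stack, and ignores the trace and interrupt mask bits. The names `cpu_state`, `enter_exception`, and `return_from_exception` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed M68000-family SR layout: S-bit at bit 13, CCR in bits 7-0. */
#define SR_S   0x2000u
#define SR_CCR 0x00FFu

typedef struct {
    uint16_t sr;        /* live status register */
    uint16_t saved_sr;  /* stands in for the copy pushed on the supervisor stack */
} cpu_state;

static bool is_supervisor(const cpu_state *cpu)
{
    return (cpu->sr & SR_S) != 0;
}

/* Exception entry: save the current SR, then set the S-bit. */
static void enter_exception(cpu_state *cpu)
{
    cpu->saved_sr = cpu->sr;
    cpu->sr = (uint16_t)(cpu->sr | SR_S);
}

/* RTE-style return: reload the saved SR, which clears S if it was clear before. */
static void return_from_exception(cpu_state *cpu)
{
    assert(is_supervisor(cpu)); /* RTE is privileged */
    cpu->sr = cpu->saved_sr;
}

/* The CCR is simply the low byte of the SR. */
static uint8_t read_ccr(const cpu_state *cpu)
{
    return (uint8_t)(cpu->sr & SR_CCR);
}
```

The round trip shows why user code cannot keep supervisor privilege: the only way S becomes set is through exception entry, and RTE restores whatever SR value was saved at that point.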
Only system programmers can use the supervisor programming model to implement operating system functions, I/O control, and memory management subsystems. This supervisor/user distinction in the M68000 family architecture allows application software written to execute in the user mode to migrate to the MC68040 from any M68000 family platform without modification. The supervisor programming model contains the control features that system designers need to modify system software when porting to a new design. For example, only supervisor software can read or write the transparent translation registers of the MC68040. The existence of the transparent translation registers does not affect the programming resources of user application programs.

Figure 1-2. Programming Model (user programming model: data registers D0-D7, address registers A0-A6, user stack pointer A7/USP, program counter PC, condition code register CCR, floating-point data registers FP0-FP7, FPCR, FPSR, FPIAR; supervisor programming model: interrupt stack pointer A7'/ISP, master stack pointer A7"/MSP, SR, VBR, SFC, DFC, CACR, URP, SRP, TC, DTT0, DTT1, ITT0, ITT1, MMUSR)
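Figure 1-2 shows three registers that answer to the name A7: the user stack pointer and, in the supervisor model, the interrupt and master stack pointers. The following C sketch shows how the active A7 could be selected. It assumes the M68000-family SR layout, in which bit 13 is the S-bit and bit 12 is the M-bit (master/interrupt state); the `stack_pointers` type and `active_a7` function are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SR_S 0x2000u  /* supervisor state bit (assumed M68000-family layout) */
#define SR_M 0x1000u  /* master/interrupt state bit */

typedef struct {
    uint32_t usp;  /* A7 in the user programming model */
    uint32_t isp;  /* A7', interrupt stack pointer */
    uint32_t msp;  /* A7", master stack pointer */
    uint16_t sr;
} stack_pointers;

/* Returns the register that an instruction naming A7 would actually use. */
static uint32_t *active_a7(stack_pointers *r)
{
    if ((r->sr & SR_S) == 0)
        return &r->usp;                            /* user mode */
    return (r->sr & SR_M) ? &r->msp : &r->isp;     /* supervisor mode */
}
```

Returning a pointer lets the same helper serve both reads and writes, for example when modeling a push that predecrements A7.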
The user programming model includes eight data registers, seven address registers, and a stack pointer register. The address registers and stack pointer can be used as base address registers or software stack pointers, and any of the 16 registers can be used as an index register. Two control registers are available in the user mode: the program counter (PC), which usually contains the address of the instruction that the MC68040 is executing, and the lower byte of the SR, which is accessible as the condition code register (CCR). The CCR contains the condition codes that reflect the results of a previous operation and can be used for conditional instruction execution in a program.

The supervisor programming model includes the upper byte of the SR, which contains operation control information. The vector base register (VBR) contains the base address of the exception vector table, which is used in exception processing. The source function code (SFC) and destination function code (DFC) registers contain 3-bit function codes. These function codes can be considered extensions to the 32-bit logical address. The processor automatically generates function codes to select address spaces for data and program accesses in the user and supervisor modes. Some instructions use the alternate function code registers to specify the function codes for various operations. The cache control register (CACR) controls enabling of the on-chip instruction and data caches of the MC68040. The supervisor root pointer (SRP) and user root pointer (URP) registers point to the root of the address translation table tree used for supervisor and user mode accesses, respectively. The translation control register (TC) enables logical-to-physical address translation and selects either a 4- or 8-Kbyte page size. There are four transparent translation registers, two for instruction accesses and two for data accesses.
These registers allow portions of the logical address space to be transparently mapped and accessed without the use of resident descriptors in an ATC. The MMU status register (MMUSR) contains status information derived from the execution of a PTEST instruction. The PTEST instruction searches the translation tables for the logical address specified by the instruction's effective address field and the DFC, and returns status information corresponding to the translation.

The user programming model can also access the entire floating-point programming model. The eight 80-bit floating-point data registers are analogous to the integer data registers. A 32-bit floating-point control register (FPCR) contains an exception enable byte that enables and disables traps for each class of floating-point exceptions and a mode byte that sets the user-selectable rounding and precision modes. A floating-point status register (FPSR) contains a condition code byte, quotient byte, exception status byte, and accrued exception byte. A floating-point exception handler can use the address in the 32-bit floating-point instruction address register (FPIAR) to locate the floating-point instruction that caused an exception. Instructions that do not modify the FPIAR can be used to read the FPIAR in the exception handler without changing its previous value.
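Because the FPSR and FPCR are organized as bytes, software such as a floating-point exception handler can take them apart with shifts and masks. The C sketch below assumes the MC68881/MC68882 register layout: FPSR bytes from most to least significant are condition code, quotient, exception status, and accrued exception; the FPCR exception enable byte occupies bits 15-8, the precision field bits 7-6, and the rounding field bits 5-4. Check these positions against Section 9 before relying on them; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* FPSR byte lanes, most to least significant (assumed MC68881/MC68882 layout). */
typedef struct {
    uint8_t condition_code;   /* bits 31-24 */
    uint8_t quotient;         /* bits 23-16 */
    uint8_t exception_status; /* bits 15-8 */
    uint8_t accrued;          /* bits 7-0 */
} fpsr_bytes;

static fpsr_bytes split_fpsr(uint32_t fpsr)
{
    fpsr_bytes b = {
        .condition_code   = (uint8_t)(fpsr >> 24),
        .quotient         = (uint8_t)(fpsr >> 16),
        .exception_status = (uint8_t)(fpsr >> 8),
        .accrued          = (uint8_t)fpsr,
    };
    return b;
}

/* FPCR mode byte fields; precision encoding 3 is undefined. */
typedef enum { RND_NEAREST, RND_ZERO, RND_MINUS_INF, RND_PLUS_INF } fp_rounding;
typedef enum { PREC_EXTENDED, PREC_SINGLE, PREC_DOUBLE } fp_precision;

static fp_rounding fpcr_rounding(uint32_t fpcr)
{
    return (fp_rounding)((fpcr >> 4) & 0x3u);   /* rounding: bits 5-4 */
}

static fp_precision fpcr_precision(uint32_t fpcr)
{
    uint32_t prec = (fpcr >> 6) & 0x3u;          /* precision: bits 7-6 */
    assert(prec != 3u);
    return (fp_precision)prec;
}

/* Exception enable byte: one trap-enable bit per floating-point exception class. */
static uint8_t fpcr_enable_byte(uint32_t fpcr)
{
    return (uint8_t)(fpcr >> 8);
}
```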
1.7 Data Format Summary
The M68040 supports the basic data formats of the M68000 family. Some data formats apply only to the IU, some only to the FPU, and some to both. In addition, the instruction set supports operations on other data formats such as memory addresses.

The operand data formats supported by the IU are the standard twos-complement data formats defined in the M68000 family architecture plus a new data format (16-byte block) for the MOVE16 instruction. Registers, memory, or instructions themselves can contain IU operands. The operand size for each instruction is either explicitly encoded in the instruction or implicitly defined by the instruction operation.

Whenever an integer is used in a floating-point operation, the FPU automatically converts it to an extended-precision floating-point number before using it. The FPU implements the single- and double-precision floating-point data formats as defined by the IEEE 754 standard. The FPU does not directly support the packed decimal real format; however, because such operations trap as an unimplemented data format rather than as an illegal instruction, software emulation supports the packed decimal format. Additionally, each data format has a special encoding that represents one of five data types: normalized numbers, denormalized numbers, zeros, infinities, and not-a-numbers (NaNs).

Table 1-1 lists the data formats for both the IU and the FPU. Refer to M68000PM/AD, M68000 Family Programmer's Reference Manual, for details on data format organization in registers and memory.

Table 1-1. M68040 Data Formats

Operand Data Format         Size       Supported In   Notes
Bit                         1 bit      IU
Bit field                   1-32 bits  IU             Field of consecutive bits
Binary-coded decimal (BCD)  8 bits     IU             Packed: 2 digits/byte; unpacked: 1 digit/byte
Byte integer                8 bits     IU, FPU
Word integer                16 bits    IU, FPU
Long-word integer           32 bits    IU, FPU
Quad-word integer           64 bits    IU             Any two data registers
16-byte                     128 bits   IU             Memory only, aligned to 16-byte boundary
Single-precision real       32 bits    FPU            1-bit sign, 8-bit exponent, 23-bit fraction
Double-precision real       64 bits    FPU            1-bit sign, 11-bit exponent, 52-bit fraction
Extended-precision real     80 bits    FPU            1-bit sign, 15-bit exponent, 64-bit mantissa

1.8 Addressing Capabilities Summary
The M68040 supports the basic addressing modes of the M68000 family. The register indirect addressing modes support postincrement, predecrement, offset, and indexing, which are particularly useful for handling data structures common to sophisticated applications and high-level languages. The program counter indirect mode also has indexing and offset capabilities; this addressing mode is typically required to support position-independent software. Besides these addressing modes, the M68040 provides index sizing and scaling features.

An instruction's addressing mode can specify the value of an operand, a register containing the operand, or how to derive the effective address of an operand in memory. Each addressing mode has an assembler syntax. Some instructions imply the addressing mode for an operand; these instructions include the appropriate fields for operands that use only one addressing mode. Table 1-2 summarizes the effective addressing modes for the M68040. Refer to M68000PM/AD, M68000 Family Programmer's Reference Manual, for details on instruction format and addressing modes.

Table 1-2. Effective Addressing Modes

Addressing Mode                          Syntax
Register direct
  Data                                   Dn
  Address                                An
Register indirect
  Address                                (An)
  Address with postincrement             (An)+
  Address with predecrement              -(An)
  Address with displacement              (d16,An)
Address register indirect with index
  8-bit displacement                     (d8,An,Xn)
  Base displacement                      (bd,An,Xn)
Memory indirect
  Postindexed                            ([bd,An],Xn,od)
  Preindexed                             ([bd,An,Xn],od)
Program counter indirect
  With displacement                      (d16,PC)
Program counter indirect with index
  8-bit displacement                     (d8,PC,Xn)
  Base displacement                      (bd,PC,Xn)
Program counter memory indirect
  Postindexed                            ([bd,PC],Xn,od)
  Preindexed                             ([bd,PC,Xn],od)
Absolute data addressing
  Short                                  (xxx).W
  Long                                   (xxx).L
Immediate                                #<data>
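To make the index sizing and scaling features concrete, the following C sketch computes effective addresses for (d8,An,Xn) and for the two memory indirect modes in Table 1-2. It assumes the usual M68000-family rules: a word-size index is sign-extended to 32 bits, the scale factor is 1, 2, 4, or 8, and address arithmetic wraps modulo 2^32. Each memory indirect mode is split into a fetch-address step and a final step so that no memory model is needed; all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Xn.SIZE*SCALE: a word index is sign-extended; scale must be 1, 2, 4, or 8. */
static uint32_t scaled_index(uint32_t xn, bool word_size, unsigned scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    uint32_t index = xn;
    if (word_size) {
        uint32_t w = xn & 0xFFFFu;
        index = (w ^ 0x8000u) - 0x8000u;   /* sign-extend bit 15 */
    }
    return index * scale;                  /* wraps modulo 2^32 */
}

/* (d8,An,Xn): EA = An + sign-extended d8 + scaled index. */
static uint32_t ea_index_d8(uint32_t an, int8_t d8, uint32_t index)
{
    return an + (uint32_t)(int32_t)d8 + index;
}

/* ([bd,An],Xn,od) postindexed: first read a long word at An + bd ... */
static uint32_t postindexed_fetch_address(uint32_t an, uint32_t bd)
{
    return an + bd;
}

/* ... then add the scaled index and outer displacement to the value read. */
static uint32_t postindexed_ea(uint32_t fetched, uint32_t index, uint32_t od)
{
    return fetched + index + od;
}

/* ([bd,An,Xn],od) preindexed: the index is applied before the memory read ... */
static uint32_t preindexed_fetch_address(uint32_t an, uint32_t bd, uint32_t index)
{
    return an + bd + index;
}

/* ... and only the outer displacement is added afterward. */
static uint32_t preindexed_ea(uint32_t fetched, uint32_t od)
{
    return fetched + od;
}
```

The only difference between the two memory indirect modes is whether the scaled index takes part in computing the pointer's address (preindexed) or is added to the pointer after it is read (postindexed).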
1.9 Notational Conventions

Table 1-3 lists the notation conventions used throughout this manual unless otherwise specified.

Table 1-3. Notational Conventions

Single- and Double-Operand Operations
+ | Arithmetic addition or postincrement indicator.
- | Arithmetic subtraction or predecrement indicator.
× | Arithmetic multiplication.
÷ | Arithmetic division or conjunction symbol.
~ | Invert; operand is logically complemented.
Λ | Logical AND
V | Logical OR
⊕ | Logical exclusive OR
→ | Source operand is moved to destination operand.
←→ | Two operands are exchanged.
<op> | Any double-operand operation.
<operand> tested | Operand is compared to zero and the condition codes are set appropriately.
sign-extended | All bits of the upper portion are made equal to the high-order bit of the lower portion.

Other Operations
TRAP | Equivalent to SSP - 2 → SSP; Format ÷ Offset Word → (SSP); SSP - 4 → SSP; PC → (SSP); SSP - 2 → SSP; SR → (SSP); (Vector) → PC
STOP | Enter the stopped state, waiting for interrupts.
<operand>10 | The operand is BCD; operations are performed in decimal.
If <condition> then <operations> else <operations> | Test the condition. If true, the operations after "then" are performed. If the condition is false and the optional "else" clause is present, the operations after "else" are performed. If the condition is false and else is omitted, the instruction performs no operation. Refer to the Bcc instruction description as an example.

Register Specification
An | Any address register n (example: A3 is address register 3)
Ax, Ay | Source and destination address registers, respectively.
BR | Base register: An, PC, or suppressed.
Dc | Data register D7–D0, used during compare.
Dh, Dl | Data registers, high- or low-order 32 bits of product.
Dn | Any data register n (example: D5 is data register 5)
Dr, Dq | Data registers, remainder or quotient of divide.
Du | Data register D7–D0, used during update.
Dx, Dy | Source and destination data registers, respectively.
MRn | Any memory register n.
Rn | Any address or data register
Rx, Ry | Any source and destination registers, respectively.
Xn | Index register: An, Dn, or suppressed.
Table 1-3. Notational Conventions (Continued)

Data Format and Type
+inf | Positive infinity
<fmt> | Operand data format: byte (B), word (W), long (L), single (S), double (D), extended (X), or packed (P).
B, W, L | Specifies a signed integer data type (twos complement) of byte, word, or long word.
D | Double-precision real data format (64 bits).
k | A twos complement signed integer (-64 to +17) specifying a number's format to be stored in the packed decimal format.
P | Packed BCD real data format (96 bits, 12 bytes).
S | Single-precision real data format (32 bits).
X | Extended-precision real data format (96 bits, 16 bits unused).
-inf | Negative infinity

Subfields and Qualifiers
#<xxx> or #<data> | Immediate data following the instruction word(s).
( ) | Identifies an indirect address in a register.
[ ] | Identifies an indirect address in memory.
bd | Base displacement
ccc | Index into the MC68881/MC68882 constant ROM
dn | Displacement value, n bits wide (example: d16 is a 16-bit displacement).
LSB | Least significant bit
LSW | Least significant word
MSB | Most significant bit
MSW | Most significant word
od | Outer displacement
SCALE | A scale factor (1, 2, 4, or 8, for no-word, word, long-word, or quad-word scaling, respectively).
SIZE | The index register's size (W for word, L for long word).
{offset:width} | Bit field selection.

Register Names
CCR | Condition code register (lower byte of status register)
DFC | Destination function code register
FPcr | Any floating-point system control register (FPCR, FPSR, or FPIAR)
FPm, FPn | Any floating-point data register specified as the source or destination, respectively.
IC, DC, IC/DC | Instruction, data, or both caches
MMUSR | MMU status register
PC | Program counter
Rc | Any non-floating-point control register
SFC | Source function code register
SR | Status register
Table 1-3. Notational Conventions (Concluded)

Register Codes
* | General case.
C | Carry bit in CCR
cc | Condition codes from CCR
FC | Function code
N | Negative bit in CCR
U | Undefined, reserved for Motorola use.
V | Overflow bit in CCR
X | Extend bit in CCR
Z | Zero bit in CCR
– | Not affected or applicable.

Stack Pointers
ISP | Supervisor/interrupt stack pointer
MSP | Supervisor/master stack pointer
SP | Active stack pointer
SSP | Supervisor (master or interrupt) stack pointer
USP | User stack pointer

Miscellaneous
<ea> | Effective address
<label> | Assemble program label
<list> | List of registers, for example D3–D0.
LB | Lower bound
m | Bit m of an operand
m–n | Bits m through n of operand
UB | Upper bound

1.10 Instruction Set Overview

The instruction set is tailored to support high-level languages and is optimized for those instructions most commonly executed. The floating-point instructions for the M68040 are a commonly used subset of the MC68881/MC68882 instruction set, with new arithmetic instructions to explicitly select single- or double-precision rounding. The remaining unimplemented instructions are less frequently used and are efficiently emulated in the M68040FPSP, maintaining compatibility with the MC68881/MC68882 floating-point coprocessors. The M68040 instruction set includes MOVE16, a new user instruction that allows high-speed transfers of 16-byte blocks between external devices, such as memory to memory or coprocessor to memory.

Table 1-4 provides an alphabetized listing of the M68040 instruction set's opcode, operation, and syntax. Refer to Table 1-3 for notations used in Table 1-4. The left operand in the syntax is always the source operand, and the right operand is the destination operand. Refer to M68000PM/AD, M68000 Family Programmer's Reference Manual, for details on instructions used by the M68040.
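The C sketch below models MOVE16 (Ax)+,(Ay)+ on a toy byte array to show the line-sized transfer and the single increment described in note 7 of Table 1-4. It assumes that the low four address bits are ignored, so that whole 16-byte-aligned lines are moved (consistent with the 16-byte data format being memory only and aligned to a 16-byte boundary), and that each address register advances by 16. The memory array and the function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { LINE_BYTES = 16 };
static uint8_t mem[256];            /* hypothetical toy memory */

/* Sketch of MOVE16 (Ax)+,(Ay)+; ax and ay point at the two address registers. */
static void move16_postinc(uint32_t *ax, uint32_t *ay)
{
    /* Assumption: the low four address bits are ignored (16-byte alignment). */
    uint32_t src = *ax & ~(uint32_t)(LINE_BYTES - 1);
    uint32_t dst = *ay & ~(uint32_t)(LINE_BYTES - 1);
    assert(src + LINE_BYTES <= sizeof mem && dst + LINE_BYTES <= sizeof mem);
    memmove(&mem[dst], &mem[src], LINE_BYTES);  /* memmove is safe when src == dst */
    *ax += LINE_BYTES;
    if (ay != ax)                    /* note 7: when Ax = Ay, increment only once */
        *ay += LINE_BYTES;
}
```

Passing the same register for both operands copies the line over itself and advances the register by 16 only once, which is the behavior note 7 describes.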
Table 1-4. Instruction Set Summary

Opcode | Operation | Syntax
ABCD | BCD Source + BCD Destination + X → Destination | ABCD Dy,Dx; ABCD -(Ay),-(Ax)
ADD | Source + Destination → Destination | ADD <ea>,Dn; ADD Dn,<ea>
ADDA | Source + Destination → Destination | ADDA <ea>,An
ADDI | Immediate Data + Destination → Destination | ADDI #<data>,<ea>
ADDQ | Immediate Data + Destination → Destination | ADDQ #<data>,<ea>
ADDX | Source + Destination + X → Destination | ADDX Dy,Dx; ADDX -(Ay),-(Ax)
AND | Source Λ Destination → Destination | AND <ea>,Dn; AND Dn,<ea>
ANDI | Immediate Data Λ Destination → Destination | ANDI #<data>,<ea>
ANDI to CCR | Source Λ CCR → CCR | ANDI #<data>,CCR
ANDI to SR | If supervisor state then Source Λ SR → SR else TRAP | ANDI #<data>,SR
ASL, ASR | Destination Shifted by Count → Destination | ASd Dx,Dy (note 1); ASd #<data>,Dy (note 1); ASd <ea> (note 1)
Bcc | If Condition True then PC + dn → PC | Bcc <label>
BCHG | ~(bit number of Destination) → Z; ~(bit number of Destination) → bit number of Destination | BCHG Dn,<ea>; BCHG #<data>,<ea>
BCLR | ~(bit number of Destination) → Z; 0 → bit number of Destination | BCLR Dn,<ea>; BCLR #<data>,<ea>
BFCHG | ~(bit field of Destination) → bit field of Destination | BFCHG <ea>{offset:width}
BFCLR | 0 → bit field of Destination | BFCLR <ea>{offset:width}
BFEXTS | Bit field of Source → Dn | BFEXTS <ea>{offset:width},Dn
BFEXTU | Bit field of Source → Dn | BFEXTU <ea>{offset:width},Dn
BFFFO | Bit offset of Source bit scan → Dn | BFFFO <ea>{offset:width},Dn
BFINS | Dn → bit field of Destination | BFINS Dn,<ea>{offset:width}
BFSET | 1s → bit field of Destination | BFSET <ea>{offset:width}
BFTST | Bit field of Destination | BFTST <ea>{offset:width}
BKPT | Run breakpoint acknowledge cycle; TRAP as illegal instruction | BKPT #<data>
BRA | PC + dn → PC | BRA <label>
BSET | ~(bit number of Destination) → Z; 1 → bit number of Destination | BSET Dn,<ea>; BSET #<data>,<ea>
BSR | SP - 4 → SP; PC → (SP); PC + dn → PC | BSR <label>
Table 1-4. Instruction Set Summary (Continued)

Opcode | Operation | Syntax
BTST | ~(bit number of Destination) → Z | BTST Dn,<ea>; BTST #<data>,<ea>
CAS | CAS Destination - Compare Operand → cc; if Z, Update Operand → Destination else Destination → Compare Operand | CAS Dc,Du,<ea>
CAS2 | CAS2 Destination 1 - Compare 1 → cc; if Z, Destination 2 - Compare 2 → cc; if Z, Update 1 → Destination 1; Update 2 → Destination 2 else Destination 1 → Compare 1; Destination 2 → Compare 2 | CAS2 Dc1:Dc2,Du1:Du2,(Rn1):(Rn2)
CHK | If Dn < 0 or Dn > Source then TRAP | CHK <ea>,Dn
CHK2 | If Rn < LB or if Rn > UB then TRAP | CHK2 <ea>,Rn
CINV | If supervisor state then invalidate selected cache lines else TRAP | CINVL <caches>,(An); CINVP <caches>,(An); CINVA <caches>
CLR | 0 → Destination | CLR <ea>
CMP | Destination - Source → cc | CMP <ea>,Dn
CMPA | Destination - Source → cc | CMPA <ea>,An
CMPI | Destination - Immediate Data → cc | CMPI #<data>,<ea>
CMPM | Destination - Source → cc | CMPM (Ay)+,(Ax)+
CMP2 | Compare Rn < LB or Rn > UB and set condition codes | CMP2 <ea>,Rn
CPUSH | If supervisor state then, if data cache, push selected dirty data cache lines; invalidate selected cache lines else TRAP | CPUSHL <caches>,(An); CPUSHP <caches>,(An); CPUSHA <caches>
DBcc | If Condition False then (Dn - 1 → Dn; if Dn ≠ -1 then PC + dn → PC) | DBcc Dn,<label>
DIVS, DIVSL | Destination ÷ Source → Destination | DIVS.W <ea>,Dn (32 ÷ 16 → 16r:16q); DIVS.L <ea>,Dq (32 ÷ 32 → 32q); DIVS.L <ea>,Dr:Dq (64 ÷ 32 → 32r:32q); DIVSL.L <ea>,Dr:Dq (32 ÷ 32 → 32r:32q)
DIVU, DIVUL | Destination ÷ Source → Destination | DIVU.W <ea>,Dn (32 ÷ 16 → 16r:16q); DIVU.L <ea>,Dq (32 ÷ 32 → 32q); DIVU.L <ea>,Dr:Dq (64 ÷ 32 → 32r:32q); DIVUL.L <ea>,Dr:Dq (32 ÷ 32 → 32r:32q)
EOR | Source ⊕ Destination → Destination | EOR Dn,<ea>
EORI | Immediate Data ⊕ Destination → Destination | EORI #<data>,<ea>
EORI to CCR | Source ⊕ CCR → CCR | EORI #<data>,CCR
EORI to SR | If supervisor state then Source ⊕ SR → SR else TRAP | EORI #<data>,SR
Table 1-4. Instruction Set Summary (Continued)

Opcode | Operation | Syntax
EXG | Rx ←→ Ry | EXG Dx,Dy; EXG Ax,Ay; EXG Dx,Ay; EXG Ay,Dx
EXT, EXTB | Destination Sign-Extended → Destination | EXT.W Dn (extend byte to word); EXT.L Dn (extend word to long word); EXTB.L Dn (extend byte to long word)
FABS (note 2) | Absolute Value of Source → FPn | FABS.<fmt> <ea>,FPn; FABS.X FPm,FPn; FABS.X FPn; FrABS.<fmt> <ea>,FPn (note 3); FrABS.X FPm,FPn (note 3); FrABS.X FPn (note 3)
FADD (note 2) | Source + FPn → FPn | FADD.<fmt> <ea>,FPn; FADD.X FPm,FPn; FrADD.<fmt> <ea>,FPn (note 3); FrADD.X FPm,FPn (note 3)
FBcc (note 2) | If Condition True then PC + dn → PC | FBcc.SIZE <label>
FCMP (note 2) | FPn - Source | FCMP.<fmt> <ea>,FPn; FCMP.X FPm,FPn
FDBcc (note 2) | If Condition True then no operation else Dn - 1 → Dn; if Dn ≠ -1 then PC + dn → PC else execute next instruction | FDBcc Dn,<label>
FDIV (note 2) | FPn ÷ Source → FPn | FDIV.<fmt> <ea>,FPn; FDIV.X FPm,FPn; FrDIV.<fmt> <ea>,FPn (note 3); FrDIV.X FPm,FPn (note 3)
FMOVE (note 2) | Source → Destination | FMOVE.<fmt> <ea>,FPn; FMOVE.<fmt> FPm,<ea>; FMOVE.P FPm,<ea>{Dn}; FMOVE.P FPm,<ea>{#k}; FrMOVE.<fmt> <ea>,FPn (note 3)
FMOVE (note 2) | Source → Destination | FMOVE.L <ea>,FPcr; FMOVE.L FPcr,<ea>
FMOVEM (note 2) | Register List → Destination; Source → Register List | FMOVEM.X <list>,<ea> (note 4); FMOVEM.X Dn,<ea>; FMOVEM.X <ea>,<list> (note 4); FMOVEM.X <ea>,Dn
FMOVEM (note 2) | Register List → Destination; Source → Register List | FMOVEM.L <list>,<ea> (note 5); FMOVEM.L <ea>,<list> (note 5)
FMUL (note 2) | Source × FPn → FPn | FMUL.<fmt> <ea>,FPn; FMUL.X FPm,FPn; FrMUL.<fmt> <ea>,FPn (note 3); FrMUL.X FPm,FPn (note 3)
Table 1-4. Instruction Set Summary (Continued)

Opcode | Operation | Syntax
FNEG (note 2) | -(Source) → FPn | FNEG.<fmt> <ea>,FPn; FNEG.X FPm,FPn; FNEG.X FPn; FrNEG.<fmt> <ea>,FPn (note 3); FrNEG.X FPm,FPn (note 3); FrNEG.X FPn (note 3)
FNOP (note 2) | None | FNOP
FRESTORE (note 2) | If in supervisor state then FPU State Frame → Internal State else TRAP | FRESTORE <ea>
FSAVE (note 2) | If in supervisor state then FPU Internal State → State Frame else TRAP | FSAVE <ea>
FScc (note 2) | If Condition True then 1s → Destination else 0s → Destination | FScc.SIZE <ea>
FSGLDIV | FPn ÷ Source → FPn | FSGLDIV.<fmt> <ea>,FPn; FSGLDIV.X FPm,FPn
FSGLMUL | Source × FPn → FPn | FSGLMUL.<fmt> <ea>,FPn; FSGLMUL.X FPm,FPn
FSQRT (note 2) | Square Root of Source → FPn | FSQRT.<fmt> <ea>,FPn; FSQRT.X FPm,FPn; FSQRT.X FPn; FrSQRT.<fmt> <ea>,FPn (note 3); FrSQRT.X FPm,FPn (note 3); FrSQRT.X FPn (note 3)
FSUB (note 2) | FPn - Source → FPn | FSUB.<fmt> <ea>,FPn; FSUB.X FPm,FPn; FrSUB.<fmt> <ea>,FPn (note 3); FrSUB.X FPm,FPn (note 3)
FTRAPcc (note 2) | If Condition True then TRAP | FTRAPcc; FTRAPcc.W #<data>; FTRAPcc.L #<data>
FTST (note 2) | Condition Codes for Operand → FPCC | FTST.<fmt> <ea>; FTST.X FPm
ILLEGAL | SSP - 2 → SSP; Vector Offset → (SSP); SSP - 4 → SSP; PC → (SSP); SSP - 2 → SSP; SR → (SSP); Illegal Instruction Vector Address → PC | ILLEGAL
JMP | Destination Address → PC | JMP <ea>
JSR | SP - 4 → SP; PC → (SP); Destination Address → PC | JSR <ea>
LEA | <ea> → An | LEA <ea>,An
LINK | SP - 4 → SP; An → (SP); SP → An; SP + dn → SP | LINK An,dn
Table 1-4. Instruction Set Summary (Continued)

Opcode | Operation | Syntax
LPSTOP (note 6) | If supervisor state then Immediate Data → SR; SR → Broadcast Cycle; STOP else TRAP | LPSTOP #<data>
LSL, LSR | Destination Shifted by Count → Destination | LSd Dx,Dy (note 1); LSd #<data>,Dy (note 1); LSd <ea> (note 1)
MOVE | Source → Destination | MOVE <ea>,<ea>
MOVEA | Source → Destination | MOVEA <ea>,An
MOVE from CCR | CCR → Destination | MOVE CCR,<ea>
MOVE to CCR | Source → CCR | MOVE <ea>,CCR
MOVE from SR | If supervisor state then SR → Destination else TRAP | MOVE SR,<ea>
MOVE to SR | If supervisor state then Source → SR else TRAP | MOVE <ea>,SR
MOVE USP | If supervisor state then USP → An or An → USP else TRAP | MOVE USP,An; MOVE An,USP
MOVE16 | Source Block → Destination Block | MOVE16 (Ax)+,(Ay)+ (note 7); MOVE16 (xxx).L,(An); MOVE16 (An),(xxx).L; MOVE16 (An)+,(xxx).L
MOVEC | If supervisor state then Rc → Rn or Rn → Rc else TRAP | MOVEC Rc,Rn; MOVEC Rn,Rc
MOVEM | Registers → Destination; Source → Registers | MOVEM <list>,<ea> (note 4); MOVEM <ea>,<list> (note 4)
MOVEP | Source → Destination | MOVEP Dx,(dn,Ay); MOVEP (dn,Ay),Dx
MOVEQ | Immediate Data → Destination | MOVEQ #<data>,Dn
MOVES | If supervisor state then Rn → Destination [DFC] or Source [SFC] → Rn else TRAP | MOVES Rn,<ea>; MOVES <ea>,Rn
MULS | Source × Destination → Destination | MULS.W <ea>,Dn (16 × 16 → 32); MULS.L <ea>,Dl (32 × 32 → 32); MULS.L <ea>,Dh–Dl (32 × 32 → 64)
MULU | Source × Destination → Destination | MULU.W <ea>,Dn (16 × 16 → 32); MULU.L <ea>,Dl (32 × 32 → 32); MULU.L <ea>,Dh–Dl (32 × 32 → 64)
NBCD | 0 - (Destination10) - X → Destination | NBCD <ea>
NEG | 0 - (Destination) → Destination | NEG <ea>
NEGX | 0 - (Destination) - X → Destination | NEGX <ea>
Table 1-4. Instruction Set Summary (Continued)

Opcode | Operation | Syntax
NOP | None | NOP
NOT | ~Destination → Destination | NOT <ea>
OR | Source V Destination → Destination | OR <ea>,Dn; OR Dn,<ea>
ORI | Immediate Data V Destination → Destination | ORI #<data>,<ea>
ORI to CCR | Source V CCR → CCR | ORI #<data>,CCR
ORI to SR | If supervisor state then Source V SR → SR else TRAP | ORI #<data>,SR
PACK | Source (Unpacked BCD) + Adjustment → Destination (Packed BCD) | PACK -(Ax),-(Ay),#(adjustment); PACK Dx,Dy,#(adjustment)
PEA | SP - 4 → SP; <ea> → (SP) | PEA <ea>
PFLUSH (note 8) | If supervisor state then invalidate instruction and data ATC entries for destination address else TRAP | PFLUSH (An); PFLUSHN (An); PFLUSHA; PFLUSHAN
PTEST (note 8) | If supervisor state then Logical Address Status → MMUSR; Entry → ATC else TRAP | PTESTR (An); PTESTW (An)
RESET | If supervisor state then assert RSTO line else TRAP | RESET
ROL, ROR | Destination Rotated by Count → Destination | ROd Dx,Dy (note 1); ROd #<data>,Dy (note 1); ROd <ea> (note 1)
ROXL, ROXR | Destination Rotated with X by Count → Destination | ROXd Dx,Dy (note 1); ROXd #<data>,Dy (note 1); ROXd <ea> (note 1)
RTD | (SP) → PC; SP + 4 + dn → SP | RTD #(dn)
RTE | If supervisor state then (SP) → SR; SP + 2 → SP; (SP) → PC; SP + 4 → SP; restore state and deallocate stack according to (SP) else TRAP | RTE
RTR | (SP) → CCR; SP + 2 → SP; (SP) → PC; SP + 4 → SP | RTR
RTS | (SP) → PC; SP + 4 → SP | RTS
SBCD | Destination10 - Source10 - X → Destination | SBCD Dx,Dy; SBCD -(Ax),-(Ay)
Scc | If Condition True then 1s → Destination else 0s → Destination | Scc <ea>
STOP | If supervisor state then Immediate Data → SR; STOP else TRAP | STOP #<data>
SUB | Destination - Source → Destination | SUB <ea>,Dn; SUB Dn,<ea>
Table 1-4. Instruction Set Summary (Concluded)

Opcode | Operation | Syntax
SUBA | Destination - Source → Destination | SUBA <ea>,An
SUBI | Destination - Immediate Data → Destination | SUBI #<data>,<ea>
SUBQ | Destination - Immediate Data → Destination | SUBQ #<data>,<ea>
SUBX | Destination - Source - X → Destination | SUBX Dx,Dy; SUBX -(Ax),-(Ay)
SWAP | Register 31–16 ←→ Register 15–0 | SWAP Dn
TAS | Destination Tested → Condition Codes; 1 → Bit 7 of Destination | TAS <ea>
TRAP | SSP - 2 → SSP; Format ÷ Offset → (SSP); SSP - 4 → SSP; PC → (SSP); SSP - 2 → SSP; SR → (SSP); Vector Address → PC | TRAP #<vector>
TRAPcc | If cc then TRAP | TRAPcc; TRAPcc.W #<data>; TRAPcc.L #<data>
TRAPV | If V then TRAP | TRAPV
TST | Destination Tested → Condition Codes | TST <ea>
UNLK | An → SP; (SP) → An; SP + 4 → SP | UNLK An
UNPK | Source (Packed BCD) + Adjustment → Destination (Unpacked BCD) | UNPK -(Ax),-(Ay),#(adjustment); UNPK Dx,Dy,#(adjustment)

NOTES:
1. Where d is direction, left or right.
2. Available only on the MC68040.
3. Where r is rounding precision, single or double precision.
4. List refers to register.
5. List refers to control registers only.
6. Available only on the MC68040V and MC68EC040V.
7. MOVE16 (Ax)+,(Ay)+ is functionally the same as MOVE16 (Ax),(Ay)+ when Ax = Ay. The address register is only incremented once, and the line is copied over itself rather than to the next line.
8. Not available for the MC68EC040 or MC68EC040V.
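To make a few of the less familiar Table 1-4 entries concrete, the C sketch below models the register forms of BFEXTU and BFEXTS, a long-word CAS, MULU.L with a 64-bit product in Dh–Dl, and the register forms of PACK and UNPK. These are simplified, hypothetical models written for illustration, not Motorola reference code. Assumptions: bit-field offsets count from bit 31 (offset 0 is the MSB) and a width of 0 encodes 32; the sketch requires offset + width ≤ 32 and does not model the wrap-around of a data-register field; CAS reports only the Z flag; and, unlike the real CAS, which performs an indivisible read-modify-write, the C function is not atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BFEXTU Dn{offset:width}: zero-extended field (offset 0 = bit 31). */
static uint32_t bfextu(uint32_t reg, unsigned offset, unsigned width)
{
    if (width == 0)
        width = 32;                              /* width 0 encodes 32 */
    assert(offset + width <= 32);                /* wrap-around not modelled */
    uint32_t field = reg << offset;              /* left-justify the field */
    return width == 32 ? field : field >> (32 - width);
}

/* BFEXTS: the same field, sign-extended; returned as the 32-bit register image. */
static uint32_t bfexts(uint32_t reg, unsigned offset, unsigned width)
{
    if (width == 0)
        width = 32;
    uint32_t v = bfextu(reg, offset, width);
    if (width < 32) {
        uint32_t sign = 1u << (width - 1);
        v = (v ^ sign) - sign;                   /* two's complement sign extension */
    }
    return v;
}

/* CAS Dc,Du,<ea> (long): returns Z. Not atomic, unlike the real instruction. */
static bool cas_l(uint32_t *dest, uint32_t *dc, uint32_t du)
{
    if (*dest == *dc) {
        *dest = du;              /* Z set: Update Operand -> Destination */
        return true;
    }
    *dc = *dest;                 /* Z clear: Destination -> Compare Operand */
    return false;
}

/* MULU.L <ea>,Dh-Dl: Source x Dl -> Dh:Dl (32 x 32 -> 64). */
static void mulu_l64(uint32_t src, uint32_t *dh, uint32_t *dl)
{
    uint64_t p = (uint64_t)src * *dl;
    *dh = (uint32_t)(p >> 32);   /* high-order 32 bits of the product */
    *dl = (uint32_t)p;           /* low-order 32 bits of the product */
}

/* PACK Dx,Dy,#adj: add adj to the low word, then pack bits 11-8 and 3-0 into a byte. */
static uint8_t pack_reg(uint16_t src, uint16_t adj)
{
    uint16_t t = (uint16_t)(src + adj);
    return (uint8_t)(((t >> 4) & 0xF0u) | (t & 0x0Fu));
}

/* UNPK Dx,Dy,#adj: spread the two BCD digits into bits 11-8 and 3-0, then add adj. */
static uint16_t unpk_reg(uint8_t src, uint16_t adj)
{
    uint16_t t = (uint16_t)(((src & 0xF0u) << 4) | (src & 0x0Fu));
    return (uint16_t)(t + adj);
}
```

The C11 function atomic_compare_exchange_strong follows the same failure convention as CAS, writing the value it found back into the expected (compare) operand. With these models, UNPK with an adjustment of 0x3030 turns the packed byte 0x12 into the ASCII digits "12" (0x3132), and PACK with an adjustment of 0 reverses the conversion, because the high nibble of each ASCII digit is discarded when bits 11–8 and 3–0 are packed.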
Section 2
Integer Unit

This section describes the organization of the M68040 integer unit (IU) and presents a brief description of the associated registers. Refer to Section 3 Memory Management Unit (Except MC68EC040 and MC68EC040V) for details concerning the memory management unit (MMU) programming model, and to Section 9 Floating-Point Unit (MC68040 Only) for details concerning the floating-point unit (FPU) programming model.

2.1 Integer Unit Pipeline

The IU carries out logical and arithmetic operations using six separate subunits. Each unit is dedicated to a different stage of the IU pipeline, so the IU can handle six separate instructions simultaneously. Pipelining is a technique that overlaps the processing of different parts of several instructions; like an assembly line, the IU holds a number of instructions in different phases of processing. The IU pipeline consists of six stages:

1. Instruction fetch: fetching an instruction from memory.
2. Decode: converting an instruction into micro-instructions.
3. Calculate: if the instruction calls for data from memory, calculating the memory address of that data.
4. Fetch: fetching data from memory.
5. Execute: manipulating the data during execution.
6. Write-back: writing the result of the computation back to the on-chip caches or external memory.

The pipeline contains special shadow registers that can begin processing future instructions for conditional branches while the main pipeline is processing current instructions. The calculate stage eliminates pipeline blockage for instructions with postincrement, predecrement, or immediate add and load to address register, because those address register updates occur in the calculate stage. The write-back stage can write data over the system bus to store a result in external memory or directly to the on-chip caches. Because of the M68040 bus interface, these write-backs to memory can be deferred until the most opportune moment.
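As a rough illustration of how six stages let six instructions be in flight at once, the C sketch below models an idealized pipeline in which each instruction enters the instruction fetch stage one cycle after its predecessor and advances one stage per cycle. This timing is an assumption made for illustration only: the real IU stalls (for example, when a write-back yields to an operand fetch) and uses shadow stages for branches, and neither effect is modeled. The function names are hypothetical.

```c
#include <assert.h>

/* Stages 0-5: instruction fetch, decode, calculate, fetch, execute, write-back. */
enum { STAGES = 6 };

/* Idealized, stall-free model: instruction i enters stage 0 in cycle i.
   Returns the instruction occupying `stage` in `cycle`, or -1 if empty. */
static int instr_in_stage(int cycle, int stage, int n_instr)
{
    assert(stage >= 0 && stage < STAGES);
    int i = cycle - stage;
    return (i >= 0 && i < n_instr) ? i : -1;
}

/* Number of stages holding an instruction in a given cycle. */
static int busy_stages(int cycle, int n_instr)
{
    int busy = 0;
    for (int s = 0; s < STAGES; s++)
        if (instr_in_stage(cycle, s, n_instr) >= 0)
            busy++;
    return busy;
}
```

For a run of ten instructions, the model fills one stage in cycle 0, all six stages from cycle 5 through cycle 9, and drains completely after cycle 14.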
Figure 2-1 illustrates the IU pipeline.
[Figure 2-1. Integer Unit Pipeline: instruction fetch, decode, calculate, fetch, execute, and write-back stages, plus shadow instruction fetch and decode stages; instructions and data arrive from the cache or bus controller, results go to the cache or bus controller, and a path leads to the FPU.]

An instruction stream is fetched from the instruction memory unit and decoded on an instruction-by-instruction basis in the decode stage. Multiple instructions are fetched to keep the pipeline stages full so that the pipeline does not stall. The decoded instruction is then passed to the calculate stage to calculate the effective addresses that the instruction requires. The calculate stage initiates additional fetches from the instruction stream to obtain the effective address extension words and performs the effective address calculation. If the calculation requires data registers, an initial execution step in the execute stage handles them and passes the register values back to the calculate stage. The resulting effective address is passed to the fetch stage, which initiates an operand fetch from the data memory controller if the effective address is for a source operand. The fetched operand is returned to the execute stage, which completes execution of the instruction and writes any result to a data register, to memory, or back to the calculate stage for storage in an address register. For a memory destination, the fetch stage passes the address to the execute stage.

The previously described sequence of effective address calculation and fetch can occur multiple times for an instruction, depending on the source and/or destination addressing modes. For memory indirect addressing modes, the calculate stage initiates an operand fetch from the intermediate indirect memory address, then calculates the final
effective address. Also, some instructions access multiple memory operands and initiate fetches for each operand.

The instruction finishes execution in the execute stage. Instructions with write-back operands to memory generate pending write accesses that are passed to the write-back stage. The write occurs to the data memory unit if it is not busy. If the following instruction, which is in the fetch stage, requires an operand fetch, the write-back stalls in the write-back stage, since the write-back has the lower priority. The write-back can stall indefinitely until either the data memory unit is free or another write is pending from the execute stage.

Figure 2-2 illustrates a write cycle, which begins in the IU pipeline. The IU stores the logical address and data for a write operation in a temporary holding register (WB3). Write operation control passes from the IU to the data memory unit once the data memory unit is idle. When the data memory unit receives the logical address and data from the IU, it stores them in a second temporary holding register (WB2). The data memory unit then translates the logical address into a physical address. If the address translation is successful, the data memory unit either stores an address translation in the data cache (write hit) or passes it to the bus controller (write-through with write miss). Once the bus controller is ready to execute the external write operation, it multiplexes the data to the correct data byte lanes and stores the multiplexed data and physical address in a third holding register (WB1). WB1 is used in the actual write operation seen on the address and data buses. Appendix B MC68EC040 contains details on address translation in the MC68EC040.

[Figure 2-2. Write-Back Cycle Block Diagram: the integer unit (decode, calculate, fetch, execute, and write-back stages, with WB3), the instruction memory unit, the data memory unit (data ATC, data cache, MMU/cache/snoop controller, and WB2), and the bus controller (data mux, push buffer, and WB1), connected by logical address, physical address, data, and bus control signal paths.]
2.2 Integer Unit Register Description

The following paragraphs describe the IU registers in the user and supervisor programming models. Refer to Section 3 Memory Management Unit (Except MC68EC040 and MC68EC040V) for details on the MMU programming model and Section 9 Floating-Point Unit (MC68040 Only) for details on the FPU programming model.

2.2.1 Integer Unit User Programming Model

Figure 2-3 illustrates the IU portion of the user programming model. The model is the same as for previous M68000 family microprocessors, consisting of the following registers:

- 16 general-purpose 32-bit registers (D7–D0, A7–A0)
- 32-bit program counter (PC)
- 8-bit condition code register (CCR)

2.2.1.1 Data Registers (D7–D0). These registers are used as data registers for bit and bit field (1 to 32 bits), byte (8 bit), word (16 bit), long-word (32 bit), and quad-word (64 bit) operations. These registers may also be used as index registers.

2.2.1.2 Address Registers (A6–A0). These registers can be used as software stack pointers, index registers, or base address registers. The address registers may be used for word and long-word operations.

[Figure 2-3. Integer Unit User Programming Model: data registers D0–D7 (bits 31–0), address registers A0–A6 (bits 31–0), A7 as the user stack pointer (USP), the program counter (PC, bits 31–0), and the condition code register (CCR, bits 7–0).]
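The C sketch below shows how operand size affects a register write. It follows general M68000 family behavior rather than anything stated in this section, so treat it as an assumption to check against M68000PM/AD: a byte or word result replaces only the low 8 or 16 bits of a data register, while a word operand written to an address register is sign-extended to the full 32 bits. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte and word operations on Dn replace only the low-order bits. */
static uint32_t write_dn_byte(uint32_t dn, uint8_t v)
{
    return (dn & 0xFFFFFF00u) | v;       /* bits 31-8 unchanged */
}

static uint32_t write_dn_word(uint32_t dn, uint16_t v)
{
    return (dn & 0xFFFF0000u) | v;       /* bits 31-16 unchanged */
}

/* A word operand written to An is sign-extended to 32 bits (as MOVEA.W does). */
static uint32_t write_an_word(uint16_t v)
{
    return ((uint32_t)v ^ 0x8000u) - 0x8000u;
}
```

The distinction matters when a value computed in a data register is later used as an address: a word-sized address register load never leaves stale upper bits behind.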
2.2.1.3 System Stack Pointer (A7). A7 is used as a hardware stack pointer during stacking for subroutine calls and exception handling. The register designation A7 refers to three different uses of the register: the user stack pointer (USP) (A7) in the user programming model and either the interrupt stack pointer (ISP) or master stack pointer (MSP) (A7' or A7'', respectively) in the supervisor programming model. When the S-bit in the status register (SR) is clear, the USP is the active stack pointer. Explicit references to the stack pointer (SP) refer to the USP while the processor is operating in user mode. A subroutine call saves the program counter (PC) on the active system stack, and the return restores it from the active system stack. Both the PC and the SR are saved on the supervisor stack (either ISP or MSP) during the processing of exceptions and interrupts. Thus, the execution of supervisor-level code is independent of user code and the condition of the user stack. Conversely, user programs use the USP independently of supervisor stack requirements.

2.2.1.4 Program Counter. The PC contains the address of the currently executing instruction. During instruction execution and exception processing, the processor automatically increments the contents of the PC or places a new value in the PC, as appropriate. For some addressing modes, the PC can be used as a pointer for PC-relative addressing.

2.2.1.5 Condition Code Register. The CCR consists of five bits of the SR least significant byte. The first four bits (N, Z, V, and C) represent a condition of the result generated by a processor operation. The fifth bit, the extend bit (X-bit), is an operand for multiprecision computations. The carry bit (C-bit) and the X-bit are separate in the M68000 family to simplify programming techniques that use them.
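The role of the X-bit in multiprecision arithmetic can be sketched in C. The model below adds two 64-bit values held as 32-bit halves, the way an ADD.L on the low long words followed by an ADDX.L on the high long words would, using the ADDX operation Source + Destination + X → Destination from Table 1-4. Only X and C, which receive the same carry out in this model, are updated; N, Z, and V are ignored. The bit positions come from Figure 2-5, and the function and constant names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CCR bit positions (low byte of the SR, per Figure 2-5). */
enum { CCR_C = 1u << 0, CCR_V = 1u << 1, CCR_Z = 1u << 2, CCR_N = 1u << 3, CCR_X = 1u << 4 };

/* Source + Destination + x_in; the carry out is copied to both X and C. */
static uint32_t add_with_x(uint32_t src, uint32_t dst, unsigned x_in, uint8_t *ccr)
{
    uint64_t sum = (uint64_t)src + dst + x_in;
    if (sum >> 32)
        *ccr |= CCR_X | CCR_C;
    else
        *ccr &= (uint8_t)~(CCR_X | CCR_C);
    return (uint32_t)sum;
}

/* ADD.L: no carry in. */
static uint32_t add_l(uint32_t src, uint32_t dst, uint8_t *ccr)
{
    return add_with_x(src, dst, 0u, ccr);
}

/* ADDX.L: the X-bit from the previous operation is the carry in. */
static uint32_t addx_l(uint32_t src, uint32_t dst, uint8_t *ccr)
{
    return add_with_x(src, dst, (*ccr & CCR_X) ? 1u : 0u, ccr);
}
```

Because X, unlike C, is not disturbed by instructions such as MOVE that may sit between the two additions in real code, keeping the two bits separate lets the carry survive from the low word to the high word.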
2.2.2 Integer Unit Supervisor Programming Model

Only system programmers use the supervisor programming model (see Figure 2-4) to implement sensitive operating system functions, I/O control, and MMU subsystems. All accesses that affect the control features of the M68040 are in the supervisor programming model. Thus, all application software is written to run in the user mode and migrates to the M68040 from any M68000 platform without modification.
[Figure 2-4. Integer Unit Supervisor Programming Model: interrupt stack pointer A7' (ISP) and master stack pointer A7'' (MSP); status register (SR), with the CCR in its low byte; vector base register (VBR); alternate source and destination function code registers (SFC, DFC); and cache control register (CACR).]

The supervisor programming model consists of the registers available to the user as well as the following control registers:

- Two 32-bit supervisor stack pointers (ISP, MSP)
- 16-bit status register (SR)
- 32-bit vector base register (VBR)
- Two 32-bit alternate function code registers: source function code (SFC) and destination function code (DFC)
- 32-bit cache control register (CACR)

The following paragraphs describe the supervisor programming model registers. Additional information on the ISP, MSP, SR, and VBR registers can be found in Section 8 Exception Processing.

2.2.2.1 Interrupt and Master Stack Pointers. In a multitasking operating system, it is more efficient to have a supervisor stack pointer associated with each user task and a separate stack pointer for interrupt-associated tasks. The M68040 provides two supervisor stack pointers, master and interrupt. Explicit references to the SSP refer to either the MSP or ISP while the processor is operating in the supervisor mode. All instructions that use the SSP implicitly reference the active stack pointer. The ISP and MSP are general-purpose registers and can be used as software stack pointers, index registers, or base address registers. The ISP and MSP can be used for word and long-word operations.

The M-bit of the SR selects whether the ISP or MSP is active. SSP references access the ISP when the M-bit is clear, putting the processor into the interrupt mode. If an exception being processed is an interrupt and the M-bit is set, the M-bit is cleared, putting the processor into the interrupt mode. The interrupt mode is the default condition after reset, and all SSP references access the ISP. The ISP can be used for interrupt control information and for workspace area as interrupt exception handling requires.
SSP references access the MSP when the M-bit is set. The operating system uses the MSP for each task, pointing to a task-related area of supervisor data space. This
procedure separates task-related supervisor activity from asynchronous, i/o-related supervisor tasks that can only be coincidental to the currently executing task. the msp can separately maintain task control information for each user task, and the software updates the msp when a task switch is performed, providing an efficient means for transferring task-related stack items. the value of the m-bit does not affect execution of privileged instructions. the instructions that affect the m-bit are move to sr, andi to sr, eori to sr, ori to sr, and rte. the processor automatically saves the m-bit value and clears it in the sr as part of the exception processing for interrupts.

2.2.2.2 status register. the sr (see figure 2-5) stores the processor status. in the supervisor mode, software can access the full sr: the ccr available in user mode (see 2.2.1.5 condition code register) as well as the interrupt priority mask and the additional control bits available only in the supervisor mode. these control bits indicate the following processor states: one of two trace modes (t1, t0), supervisor or user mode (s), and master or interrupt mode (m). the term ssp refers to both the isp and the msp, and the m and s bits of the sr determine which ssp is used: when s = 1 and m = 0, the isp is the active stack pointer; when s = 1 and m = 1, the msp is the active stack pointer. the isp is the default after reset and corresponds to the mc68000, mc68008, mc68010, and cpu32 supervisor mode.

figure 2-5. status register (system byte: bit 15 t1 and bit 14 t0, trace enable; bit 13 s, supervisor/user state; bit 12 m, master/interrupt state; bit 11 0; bits 10–8 i2–i0, interrupt priority mask. user byte, the condition code register: bits 7–5 0; bit 4 x, extend; bit 3 n, negative; bit 2 z, zero; bit 1 v, overflow; bit 0 c, carry)

2.2.2.3 vector base register. the vbr contains the base address of the exception vector table in memory. the displacement of an exception vector is added to the value in this register to access the vector table. refer to section 8 exception processing for information on exception vectors.

2.2.2.4 alternate function code registers. the alternate function code registers contain 3-bit function codes. function codes can be considered extensions of the 32-bit logical address that optionally provide as many as eight 4-gbyte address spaces. the processor automatically generates function codes to select address spaces for data and programs in the user and supervisor modes. certain instructions, such as moves, use the sfc and dfc registers to specify the function codes for their operations.
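a minimal c sketch of the vector lookup described in 2.2.2.3, assuming the usual m68000 family convention that each vector is a 4-byte long word and that the table holds 256 vectors. vector_address is a hypothetical helper, not a motorola-defined routine; it only shows the "vbr plus displacement" arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* each exception vector is a 4-byte long word, so a vector's displacement is
   its vector number times 4; the vbr supplies the table's base address */
uint32_t vector_address(uint32_t vbr, unsigned vector_number)
{
    assert(vector_number < 256u);   /* the table holds 256 vectors */
    return vbr + (uint32_t)vector_number * 4u;
}
```

with vbr = $00010000, vector 64 is fetched from $00010100; relocating the table only requires changing the vbr.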
2.2.2.5 cache control register. the cacr contains two enable bits that allow the instruction and data caches to be independently enabled or disabled. setting an enable bit enables the associated cache without affecting the state of any lines within the cache. a hardware reset clears the cacr, disabling both caches.
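the following c sketch builds a cacr value with the two enable bits chosen independently. the bit positions (data cache enable at bit 31, instruction cache enable at bit 15) come from the mc68040 cacr layout, which this section does not show, so treat them as an assumption to check against the cacr figure; cacr_value is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* assumed cacr enable bit positions (mc68040 cacr layout, not shown in this
   section): de = bit 31, ie = bit 15 */
#define CACR_DE 0x80000000u   /* data cache enable */
#define CACR_IE 0x00008000u   /* instruction cache enable */

/* value to write to the cacr so that each cache is enabled or disabled
   independently; setting an enable bit does not change existing cache lines */
uint32_t cacr_value(bool data_cache, bool insn_cache)
{
    return (data_cache ? CACR_DE : 0u) | (insn_cache ? CACR_IE : 0u);
}
```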
section 3 memory management unit (except mc68ec040 and mc68ec040v)

note: this section does not apply to the mc68ec040 and mc68ec040v; refer to appendix b mc68ec040 for details. in this section only, all references to the m68040 refer to the mc68040, mc68040v, and mc68lc040.

the m68040 supports a demand-paged virtual memory environment. demand means that programs request memory accesses through logical addresses, and paged means that physical memory is divided into blocks of equal size, called page frames, while the logical address space is divided into pages of the same size. the operating system assigns pages to page frames as they are required to meet the needs of the program. the m68040 memory management includes the following features: independent instruction and data memory management units (mmus); 32-bit logical address translation to 32-bit physical addresses; a user-defined 2-bit physical address extension; addresses translated in parallel with indexing into the data or instruction cache; a 64-entry, four-way set-associative address translation cache (atc) for each mmu (128 total entries); a global bit allowing all nonglobal entries to be flushed from the atcs; a selectable 4-kbyte or 8-kbyte page size; separate supervisor and user translation tables; two independent blocks for each mmu that can be defined as transparent (untranslated); three-level translation tables with optional indirection; supervisor and write protection; history bits automatically maintained in descriptors; an external translation disable input signal (mdis) for emulator support; and a caching mode selected on a page basis.

the mmus completely overlap address translation time with other processing activities when the translation is resident in one of the atcs. atc accesses operate in parallel with
indexing into the on-chip instruction and data caches. the mdis signal dynamically disables address translation for emulation and diagnostic support.

figure 3-1 illustrates the mmus contained in the two memory units, one for instructions (supporting instruction prefetches) and one for data (supporting all other accesses). each unit contains an mmu, a main cache, and a snoop controller. each mmu contains two transparent translation registers, which identify blocks of memory that can be accessed without translation. the mmus also contain control logic and address translation caches (atcs), which store recently used logical-to-physical address translations. the data memory unit contains a data write buffer and a data read buffer, and the instruction memory unit contains an instruction line read buffer. these buffers temporarily hold data until an opportune moment arises to write the data to external memory or to read the operand or instruction into the integer unit.

figure 3-1. memory management unit (block diagram: the integer unit pipeline stages, instruction fetch, decode, ea calculate, ea fetch, execute, and write-back, and the floating-point unit stages, convert, execute, and write-back, connect over the instruction data bus and operand data bus to the instruction memory unit and data memory unit; each memory unit contains an atc, an mmu/cache/snoop controller, and a cache, and both pass instruction and data addresses to the bus controller, which drives the address bus, data bus, and bus control signals)

the principal mmu function is to translate logical addresses to physical addresses using translation tables stored in memory. as the mmu receives a logical address from the integer unit, it searches its atc for the corresponding physical address using the upper
logical address bits. if the translation is resident, the mmu provides the physical address to the cache controller, which determines whether the instruction or data being accessed is cached. the cache controller uses the lower address bits to index into the cache. an external bus cycle is performed only when explicitly requested by the cache controller. when the translation is not in the atc, the mmu searches the translation tables in memory for the translation information. microcode and dedicated logic perform the address calculations and bus cycles required for this search.

3.1 memory management programming model
the memory management programming model is part of the supervisor programming model for the m68040. eight registers control and provide status information for address translation in the m68040: the user root pointer register (urp), the supervisor root pointer register (srp), the translation control register (tcr), four independent transparent translation registers (ittr0, ittr1, dttr0, and dttr1), and the mmu status register (mmusr). only programs that execute in the supervisor mode can directly access these registers. figure 3-2 illustrates the memory management programming model.

figure 3-2. memory management programming model (32-bit registers: urp, srp, dttr0, dttr1, ittr0, ittr1, and mmusr; 16-bit register: tcr)

3.1.1 user and supervisor root pointer registers
the srp and urp registers each contain the physical address of the translation table's root, which the mmu uses for supervisor and user accesses, respectively. the urp points to the translation table for the current user task. when a new task begins execution, the operating system typically writes a new root pointer to the urp. a new translation table address implies that the contents of the atcs may no longer be valid, so a pflush instruction should be executed to flush the atcs before loading a new root pointer value, if necessary. figure 3-3 illustrates the format of the 32-bit urp and srp registers. bits 8–0 of an address loaded into the urp or the srp must be zero. transfers of data to and from these 32-bit registers are long-word transfers.

figure 3-3. urp and srp register formats (bits 31–9: user or supervisor root pointer; bits 8–0: 000000000)

3.1.2 translation control register
the 16-bit tcr contains two control bits: one enables paged address translation, and the other selects the page size. the operating system must flush the atcs before enabling address translation, since neither tcr accesses nor reset flush the atcs. all unimplemented bits of this register read as zeros and must always be written as zeros. the m68040 always uses word transfers to access this 16-bit register. the fields of the tcr are defined following figure 3-4, which illustrates the tcr.

figure 3-4. translation control register format (bit 15: e; bit 14: p; bits 13–0: 0. note: bits 13–0 are undefined (reserved).)

e (enable)
this bit enables and disables paged address translation: 0 = disable, 1 = enable. a reset operation clears this bit. when translation is disabled, logical addresses are used as physical addresses. the mmu instruction pflush can be executed successfully regardless of the state of the e-bit. ptest results are undefined if the mmu is disabled and no table search occurs. if translation is disabled and an access does not match a transparent translation register (ttr), the access has the following default attributes: the caching mode is cachable/write-through, write protection is disabled, and the user attribute signals (upa1 and upa0) are zero.

p (page size)
this bit selects the memory page size: 0 = 4 kbytes, 1 = 8 kbytes. a reset operation does not affect this bit, so the bit must be initialized after a reset.
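a small c sketch of the two register constraints above: a root pointer must have bits 8–0 clear, and the tcr carries only e (bit 15) and p (bit 14), with every other bit written as zero. the helper names are hypothetical; in a real start-up sequence the atcs would be flushed with a pflush variant before a tcr value with e set is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TCR_E 0x8000u   /* bit 15: enable paged address translation */
#define TCR_P 0x4000u   /* bit 14: page size, 0 = 4 kbytes, 1 = 8 kbytes */

/* bits 8-0 of a value loaded into the urp or srp must be zero, so the root
   table starts on a 512-byte boundary (128 four-byte root-level descriptors) */
bool root_pointer_ok(uint32_t rp)
{
    return (rp & 0x1FFu) == 0;
}

/* 16-bit tcr value; all other bits are written as zero */
uint16_t tcr_value(bool enable, bool page_8k)
{
    return (uint16_t)((enable ? TCR_E : 0u) | (page_8k ? TCR_P : 0u));
}
```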
3.1.3 transparent translation registers
the data transparent translation registers (dttr0 and dttr1) and instruction transparent translation registers (ittr0 and ittr1) are 32-bit registers that define blocks of logical address space. the ttrs operate independently of the e-bit in the tcr and of the state of the mdis signal. data transfers to and from these registers are long-word transfers. the ttr fields are defined following figure 3-5, which illustrates the ttr format. bits 12–10, 7, 4, 3, 1, and 0 always read as zero.

figure 3-5. transparent translation register format (bits 31–24: logical address base; bits 23–16: logical address mask; bit 15: e; bits 14–13: s-field; bits 12–10: 0; bit 9: u1; bit 8: u0; bit 7: 0; bits 6–5: cm; bits 4–3: 0; bit 2: w; bits 1–0: 0)

logical address base
this 8-bit field is compared with address bits a31–a24. addresses that match in this comparison (and are otherwise eligible) are transparently translated.

logical address mask
this 8-bit field contains a mask for the logical address base field: setting a bit in this field causes the corresponding bit in the logical address base field to be ignored. blocks of memory larger than 16 mbytes can be transparently translated by setting some of the logical address mask bits to ones; setting the low-order bits of this field defines contiguous blocks larger than 16 mbytes.

e (enable)
this bit enables or disables transparent translation of the block defined by this register: 0 = transparent translation disabled, 1 = transparent translation enabled.

s (supervisor mode)
this field specifies how fc2 is used in matching an address: 00 = match only if fc2 = 0 (user mode access); 01 = match only if fc2 = 1 (supervisor mode access); 1x = ignore fc2 when matching.

u0, u1 (user page attributes)
the user defines these bits, and the m68040 does not interpret them. u0 and u1 are echoed to the upa0 and upa1 signals, respectively, if an external bus transfer results from an access.
these bits can be programmed by the user to support external addressing, bus snooping, or other applications.
cm (cache mode)
this field selects the cache mode and access serialization as follows: 00 = cachable, write-through; 01 = cachable, copyback; 10 = noncachable, serialized; 11 = noncachable. section 4 instruction and data caches provides detailed information on caching modes, and section 7 bus operation provides information on serialization.

w (write protect)
this bit indicates whether the transparent block is write protected. if set, write and read-modify-write accesses are aborted as if the resident bit in a table descriptor were clear: 0 = read and write accesses permitted, 1 = write accesses not permitted.

3.1.4 mmu status register
the mmusr is a 32-bit register that contains the status information returned by execution of the ptest instruction. the ptest instruction searches the translation tables to determine status information about the translation of a specified logical address. transfers to and from the mmusr are long-word transfers. the fields of the mmusr are defined following figure 3-6, which illustrates the mmusr.

figure 3-6. mmu status register format (bits 31–12: physical address; bit 11: b; bit 10: g; bit 9: u1; bit 8: u0; bit 7: s; bits 6–5: cm; bit 4: m; bit 3: 0; bit 2: w; bit 1: t; bit 0: r)

physical address
this 20-bit field contains the upper bits of the translated physical address. merging these bits with the lower bits of the logical address forms the actual physical address. bit 12 is undefined if a ptest is executed with 8-kbyte pages selected.

b (bus error)
the b-bit is set if a transfer error is encountered during the table search for the ptest instruction. if the b-bit is set, all other bits are zero.

g (global)
this bit is set if the g-bit is set in the page descriptor.

u1, u0 (user page attributes)
these bits are set if the corresponding bits in the page descriptor are set.
s (supervisor protection)
this bit is set if the s-bit in the page descriptor is set. setting this bit does not indicate that a violation has occurred.

cm (cache mode)
this 2-bit field is copied from the cm bits in the page descriptor.

m (modified)
this bit is set if the m-bit is set in the page descriptor associated with the address.

w (write protect)
this bit is set if the w-bit is set in any of the descriptors encountered during the table search. setting this bit does not indicate that a violation has occurred.

t (transparent translation register hit)
if the t-bit is set, the ptest address matched an instruction or data ttr; in that case the r-bit is also set, and all other bits are zero.

r (resident)
the r-bit is set if the ptest address matches an instruction or data ttr or if the table search completes by obtaining a valid page descriptor.

3.2 logical address translation
the function of the mmus is to translate logical addresses to physical addresses. the mmus perform translations according to control information in translation tables. the operating system creates these translation tables and stores them in memory. the processor then searches the tables as needed and stores the resulting translations in the atcs.

3.2.1 translation tables
the m68040 uses the atcs in the instruction and data memory units, together with translation tables stored in memory, to translate logical addresses to physical addresses. the operating system loads the translation tables for a program into memory. instruction accesses and data accesses are translated identically, because the instruction and data mmus access the same translation table for a given privilege mode (user or supervisor); the result is a merged instruction and data address space. figure 3-7 illustrates the three-level tree structure of a general translation table supported by the m68040.
the root- and pointer-level tables contain the base addresses of the tables at the next level. the page-level tables contain either the physical address for the translation or a pointer to the memory location containing the physical address. only a portion of the translation table for the entire logical address space is required to be resident in memory at any time; specifically, only the portion of the table that translates
the logical addresses of the currently executing process. portions of translation tables can be dynamically allocated as the process requires additional memory.

figure 3-7. translation table structure (a root pointer selects a first-level root table; root tables point to second-level pointer tables; pointer tables point to third-level page tables)

the current privilege mode determines whether the urp or the srp is used to translate the access. the root pointer contains the base address of the translation table's root-level table. the translation table consists of tables of descriptors. the table descriptors of the root and pointer levels can be either resident or invalid. the page descriptors of the page-level table can be resident, indirect, or invalid. a page descriptor defines the physical address of a page frame in memory that corresponds to the logical address of a page. an indirect descriptor, which contains a pointer to the actual page descriptor, can be used when two or more logical addresses access a single page descriptor.

the table search uses logical addresses to access the translation tables. figure 3-8 illustrates a logical address format, which is segmented into four fields: root index (ri), pointer index (pi), page index (pgi), and page offset. the first three fields extracted from the logical address index into the table at each level. the seven-bit ri field of the logical address is multiplied by 4 (shifted left by two bits), and the result is concatenated with the upper 23 bits of the appropriate root pointer (urp or srp) to yield the physical address of a root-level table descriptor. each of the 128 root-level table descriptors corresponds to a 32-mbyte block of memory and points to the base of a pointer-level table.
figure 3-8. logical address format (bits 31–25: root index field (ri), 7 bits; bits 24–18: pointer index field (pi), 7 bits; page index field (pgi): bits 17–13 for 8-kbyte pages or bits 17–12 for 4-kbyte pages; page offset: bits 12–0 for 8-kbyte pages or bits 11–0 for 4-kbyte pages)

the seven-bit pi field of a logical address is multiplied by 4 (shifted left by two bits) and concatenated with the upper 23 bits of the fetched root-level descriptor to produce the physical address of the pointer-level table descriptor. each of the 128 pointer-level table descriptors corresponds to a 256-kbyte block of memory.

for 8-kbyte pages, the five-bit pgi field is multiplied by 4 (shifted left by two bits) and concatenated with the upper 25 bits of the fetched pointer-level descriptor to produce the physical address of the 8-kbyte page descriptor. the upper 19 bits of the page descriptor are the page frame's physical address. there are 32 8-kbyte page descriptors in a page-level table.

similarly, for 4-kbyte pages, the six-bit pgi field is multiplied by 4 (shifted left by two bits) and concatenated with the upper 24 bits of the fetched pointer-level descriptor to produce the physical address of the 4-kbyte page descriptor. the upper 20 bits of the page descriptor are the page frame's physical address. there are 64 4-kbyte page descriptors in a page-level table.

write-protect status is accumulated from each level's descriptor and combined with the status from the page descriptor to form the atc entry status. the m68040 creates the atc entry from the page frame address and the associated status bits and retries the original bus access. refer to 3.3 address translation caches for details on atc entries.

if the descriptor from a page table is an indirect descriptor, the page descriptor pointed to by this descriptor is fetched. invalid descriptors can be used at any level of the tree except the root pointer itself. when a table search for a normal translation encounters an invalid descriptor, the processor takes an access fault exception.
an invalid descriptor can identify either a page or branch of the tree that has been stored on an external device and is not resident in memory, or a portion of the translation table that has not yet been defined. in these two cases, the exception routine can either restore the page from disk or add to the translation table. figures 3-9 and 3-10 illustrate detailed flowcharts of the table search and descriptor fetch operations. a table search terminates successfully when a resident page descriptor is encountered. the occurrence of an invalid descriptor or a transfer error acknowledge also terminates a table search, and because of these conditions the m68040 takes an exception on the retry of the cycle. the exception handler should distinguish between anticipated conditions and true error conditions. the exception handler can correct an invalid descriptor that indicates a nonresident page or one that identifies a portion of the translation table yet to be allocated. an access error due to a system malfunction can require the exception handler to write an error message and terminate the task.
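the index-and-concatenate steps of the table search can be sketched in c as follows. split_la and the *_desc_addr helpers are hypothetical names; the field boundaries follow figure 3-8, and the masks keep the upper 23, 25, or 24 bits of the previous level's address, as the text states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct la_fields { uint32_t ri, pi, pgi, offset; };

/* split a logical address into the fields of figure 3-8 */
struct la_fields split_la(uint32_t la, bool page_8k)
{
    struct la_fields f;
    f.ri = la >> 25;                    /* bits 31-25 */
    f.pi = (la >> 18) & 0x7Fu;          /* bits 24-18 */
    if (page_8k) {
        f.pgi = (la >> 13) & 0x1Fu;     /* bits 17-13 */
        f.offset = la & 0x1FFFu;        /* bits 12-0 */
    } else {
        f.pgi = (la >> 12) & 0x3Fu;     /* bits 17-12 */
        f.offset = la & 0x0FFFu;        /* bits 11-0 */
    }
    return f;
}

/* index * 4 concatenated with the upper bits of the previous level's address:
   23 bits (31-9) of the root pointer or root descriptor, and 25 bits (31-7) or
   24 bits (31-8) of a pointer descriptor for 8-kbyte or 4-kbyte pages */
uint32_t root_desc_addr(uint32_t root_ptr, uint32_t ri)
{
    assert(ri < 128u);
    return (root_ptr & 0xFFFFFE00u) | (ri << 2);
}

uint32_t ptr_desc_addr(uint32_t root_desc, uint32_t pi)
{
    assert(pi < 128u);
    return (root_desc & 0xFFFFFE00u) | (pi << 2);
}

uint32_t page_desc_addr(uint32_t ptr_desc, uint32_t pgi, bool page_8k)
{
    assert(pgi < (page_8k ? 32u : 64u));
    return (ptr_desc & (page_8k ? 0xFFFFFF80u : 0xFFFFFF00u)) | (pgi << 2);
}
```

for example, split_la(0x76543210, true) gives ri = $3b, pi = $15, pgi = $01, and offset = $1210, the same fields used in the 3.2.3 translation table example; with 4-kbyte pages the same address gives pgi = $03 and offset = $210.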
figure 3-9. detailed flowchart of table search operation (recoverable flow: on entry, select the root pointer, urp if fc2 = 0 or srp if fc2 = 1, and initialize the accrued status, wp = 0; fetch the root descriptor, then the pointer descriptor, then the page descriptor, checking the descriptor type at each step; if the page-level descriptor is indirect, fetch the descriptor it points to; an invalid descriptor at any level exits the table search by creating an atc entry with the r-bit clear, and a resident page descriptor exits by creating an atc entry with the r-bit set. the atc entry holds pfa, df[u1, u0, s, cm, m], and wp; the atc tag holds fc2, la, and df[g]. abbreviations: pfa = page frame address, the physical address field of the descriptor; df[ ] = descriptor field; wp = accumulated write-protection status)
figure 3-10. detailed flowchart of descriptor fetch operation (recoverable flow: a table or page descriptor is fetched at pa = ta + (index × 4), where index = ri, pi, or pgi, and the target of an indirect descriptor is fetched at pa = descriptor address; each fetched descriptor's w-bit is accumulated as wp = wp or w; an invalid descriptor returns so that an atc entry with the r-bit clear is created. for a pointer descriptor with u = 0, a write access that sets u is scheduled; for a page descriptor, a write access with wp = 0 and m = 0 sets both u and m, while the other cases with u = 0 use a locked read-modify-write access that sets u. these paths assume normal termination of all bus transfers. note: due to access pipelining, a pointer descriptor write access to update the u-bit occurs after the read of the next-level descriptor. abbreviations: wp = accumulated write-protection status; v = logical or operator)
motorola highly recommends that the translation tables be placed in cache-inhibited memory space. motorola also highly recommends that table descriptors not be left in states that are incoherent to the processor. future processors may treat these recommendations as mandatory. the following paragraphs apply only to m68040 systems that cannot meet these recommendations.

the processor never allocates table descriptors in the data cache when it performs a table search; only normal accesses to the translation tables cause descriptors to be allocated in the data cache. if table descriptors are allocated in the data cache and the cache is then disabled, the processor locks up trying to access a cached descriptor during a table search. to prevent this lockup, ensure that the data cache is invalidated before the mmu is enabled or the data cache is disabled, and that the pages containing table descriptors are pushed and invalidated.

table and page descriptors must not be left in a state that is incoherent to the processor; violating this restriction can result in undefined operation. page descriptors must not have an encoding of u-bit = 0, m-bit = 1, and pdt field = 01 or 11. this encoding indicates that the page descriptor is resident, not used, and modified. the processor's table search algorithm never leaves a descriptor in this state; the state can arise only when the operating system manipulates the descriptor directly. a table search for a move16 write can corrupt the cache line being written if the table descriptors are marked copyback.

3.2.2 descriptors
there are two types of descriptors used in the translation tables: table and page. table- and page-level descriptors can be further divided into types of descriptors. root table descriptors are used in root-level tables, and pointer table descriptors are used in pointer-level tables.
descriptors in the page-level tables are either page descriptors for the translation or indirect descriptors that point to a memory location containing the page descriptor. the p-bit in the tcr selects the page size as either 4 or 8 kbytes.

3.2.2.1 table descriptors. figure 3-11 illustrates the formats of the root and pointer table descriptors. two descriptor formats are possible at the pointer-level tables to support 4-kbyte and 8-kbyte page sizes.
figure 3-11. table descriptor formats (root table descriptor, root level: bits 31–9 pointer table address, bits 8–4 x, bit 3 u, bit 2 w, bits 1–0 udt. 4k pointer table descriptor, pointer level: bits 31–8 page table address, bits 7–4 x, bit 3 u, bit 2 w, bits 1–0 udt. 8k pointer table descriptor, pointer level: bits 31–7 page table address, bits 6–4 x, bit 3 u, bit 2 w, bits 1–0 udt)

3.2.2.2 page descriptors. figure 3-12 illustrates the page descriptors for both 4-kbyte and 8-kbyte page sizes. refer to section 4 instruction and data caches for details concerning caching page descriptors.

figure 3-12. page descriptor formats (4k page descriptor, page level: bits 31–12 physical address, bit 11 ur, bit 10 g, bit 9 u1, bit 8 u0, bit 7 s, bits 6–5 cm, bit 4 m, bit 3 u, bit 2 w, bits 1–0 pdt. 8k page descriptor, page level: bits 31–13 physical address, bits 12–11 ur, bit 10 g, bit 9 u1, bit 8 u0, bit 7 s, bits 6–5 cm, bit 4 m, bit 3 u, bit 2 w, bits 1–0 pdt. indirect page descriptor, page level: bits 31–2 descriptor address, bits 1–0 pdt)

3.2.2.3 descriptor field definitions. the field definitions for the table- and page-level descriptors are listed in alphabetical order:

cm (cache mode)
this field selects the cache mode and access serialization as follows: 00 = cachable, write-through; 01 = cachable, copyback; 10 = noncachable, serialized; 11 = noncachable. section 4 instruction and data caches provides detailed information on caching modes, and section 7 bus operation provides information on serialization.
descriptor address
this 30-bit field, which contains the physical address of a page descriptor, is used only in indirect descriptors.

g (global)
when this bit is set, it indicates the entry is global. pflush instruction variants that specify nonglobal entries do not invalidate global entries, even when all other selection criteria are satisfied. if these pflush variants are not used, system software can use this bit for other purposes.

m (modified)
this bit identifies a modified page. the m68040 sets the m-bit in the corresponding page descriptor before a write operation to a page for which the m-bit is clear, except for write-protect or supervisor violations. the read portion of a read-modify-write access is considered a write for updating purposes. the m68040 never clears this bit.

pdt (page descriptor type)
this field identifies the descriptor as an invalid descriptor, a page descriptor for a resident page, or an indirect pointer to another page descriptor.
00 = invalid. this code indicates that the descriptor is invalid. an invalid descriptor can represent a nonresident page or a logical address range that is out of bounds. all other bits in the descriptor are ignored. when an invalid descriptor is encountered, an atc entry is created for the logical address with the resident (r) bit clear.
01 or 11 = resident. these codes indicate that the page is resident.
10 = indirect. this code indicates that the descriptor is an indirect descriptor. bits 31–2 contain the physical address of the page descriptor. this encoding is invalid for a page descriptor pointed to by an indirect descriptor.

physical address
this 20-bit field contains the physical base address of a page in memory. the logical address supplies the low-order bits of the address required to index into the page. when the page size is 8 kbytes, the least significant bit of this field is not used.

s (supervisor protected)
this bit identifies a page as supervisor only.
only programs operating in the supervisor mode are allowed to access the portion of the logical address space mapped by this descriptor when the s-bit is set. if the bit is clear, both supervisor and user accesses are allowed.
page table address
this field contains the physical base address of a table of page descriptors. the logical address supplies the low-order bits of the address required to index into the page table.

u (used)
the processor automatically sets this bit when a descriptor is accessed in which the u-bit is clear. in a page descriptor table, this bit is set to indicate that the page corresponding to the descriptor has been accessed. in a pointer table, this bit is set to indicate that the pointer has been accessed by the m68040 as part of a table search. the u-bit is updated before the m68040 allows a page to be accessed. the processor never clears this bit.

u0, u1 (user page attributes)
these bits are user defined, and the processor does not interpret them. u0 and u1 are echoed to the upa0 and upa1 signals, respectively, if an external bus transfer results from the access. applications for these bits include extended addressing and snoop protocol selection.

udt (upper level descriptor type)
these bits indicate whether the next-level table is resident.
00 or 01 = invalid. these codes indicate that the table at the next level is not resident or that the logical address is out of bounds. all other bits in the descriptor are ignored. when an invalid descriptor is encountered, an atc entry is created for the logical address with the resident (r) bit clear.
10 or 11 = resident. these codes indicate that the table at the next level is resident.

ur (user reserved)
these single-bit fields are reserved for use by the user.

w (write protected)
setting the w-bit in a table descriptor write protects all pages accessed with that descriptor. when the w-bit is set, a write access or a read-modify-write access to the logical address corresponding to this entry causes an access error exception to be taken.

x (motorola reserved)
these bit fields are reserved for future use by motorola.
3.2.3 translation table example
figure 3-13 illustrates an access to the logical address $76543210 in supervisor mode with an 8-kbyte page size. the ri field of the logical address, $3b, is mapped into bits 8-2 of the srp value to select a 32-bit root table descriptor in the root-level table. the selected root table descriptor points to the base of a pointer-level table, and the pi field of the logical address, $15, is mapped into bits 8-2 of this base address to select a pointer descriptor within the table. this pointer table descriptor points to the base of a page-level table, and the pgi field of the logical address, $01, is mapped into bits 6-2 of this base address to select a page descriptor within the table.

3.2.4 variations in translation table structure
several aspects of the mmu translation table structure are software configurable, allowing the system designer flexibility to optimize the performance of the mmus for a particular system. the following paragraphs discuss these variations.

3.2.4.1 indirect action. the m68040 provides the ability to replace an entry in a page table with a pointer to an alternate entry. the indirection capability allows multiple tasks to share a physical page while maintaining only a single set of history information for the page (i.e., the modified indication is maintained only in the single descriptor). the indirection capability also allows the page frame to appear at arbitrarily different addresses in the logical address spaces of each task.
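the index arithmetic in the 3.2.3 example can be checked with a short c sketch. it assumes the 8-kbyte page layout described there (7-bit root index, 7-bit pointer index, 5-bit page index, 13-bit page offset); `split_la_8k` and `desc_offset` are illustrative names, not part of any motorola software interface.

```c
#include <stdint.h>

/* Logical address fields for an 8-Kbyte page size. */
struct la_fields {
    uint32_t ri;      /* root index, bits 31-25 */
    uint32_t pi;      /* pointer index, bits 24-18 */
    uint32_t pgi;     /* page index, bits 17-13 */
    uint32_t offset;  /* page offset, bits 12-0 */
};

static struct la_fields split_la_8k(uint32_t la)
{
    struct la_fields f;
    f.ri     = (la >> 25) & 0x7Fu;
    f.pi     = (la >> 18) & 0x7Fu;
    f.pgi    = (la >> 13) & 0x1Fu;
    f.offset = la & 0x1FFFu;
    return f;
}

/* Descriptors are 4 bytes, so an index occupies bits 8-2 (or 6-2) of the
 * table address: shifting left by 2 gives the byte offset into the table. */
static uint32_t desc_offset(uint32_t index)
{
    return index << 2;
}
```

for $76543210 the sketch yields ri = $3b, pi = $15, pgi = $01, and a page offset of $1210; shifting each index left by two gives the descriptor offsets $ec, $54, and $04 shown in figure 3-13.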
[figure 3-13. example translation table: supervisor-mode translation of logical address $76543210 starting from the srp (root index $3b, pointer index $15, page index $01; descriptor address offsets $ec, $54, and $04)]

using the indirection capability, single entries or entire tables can be shared between multiple tasks. figure 3-14 illustrates two tasks sharing a page using indirect descriptors.

when the m68040 has completed a normal table search, it examines the pdt field of the last entry fetched from the page tables. if the pdt field contains an indirect ($2) encoding, the highest-order 30 bits of the descriptor hold a pointer to the page descriptor that is to be used to map the logical address. the processor then fetches the page descriptor from this address and uses its physical address field as the physical mapping for the logical address. the page descriptor located at the address given by the indirect descriptor must not itself have an indirect pdt encoding (it must be either a resident descriptor or invalid). otherwise, the descriptor is treated as invalid, and the m68040 creates an atc entry with a signaled error condition (r-bit in mmusr clear).
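the indirect-descriptor rule above can be sketched in c as follows. the sketch assumes the page descriptor type encodings 00 = invalid, 01 or 11 = resident, and 10 = indirect, and it models physical memory as a hypothetical word array; `resolve_page_desc` returns 0 wherever the processor would create an atc entry with the r-bit clear.

```c
#include <stdint.h>

/* Assumed PDT encodings: 00 invalid, 01 or 11 resident, 10 indirect. */
enum pdt_kind { PDT_INVALID, PDT_RESIDENT, PDT_INDIRECT };

static enum pdt_kind classify_pdt(uint32_t desc)
{
    switch (desc & 0x3u) {
    case 0x0u: return PDT_INVALID;
    case 0x2u: return PDT_INDIRECT;
    default:   return PDT_RESIDENT;   /* 01 or 11 */
    }
}

/* Hypothetical physical memory: a word array indexed by byte address / 4. */
static uint32_t phys_read32(const uint32_t *mem, uint32_t addr)
{
    return mem[addr >> 2];
}

/* Resolves the last descriptor fetched by a table search. An indirect
 * descriptor's upper 30 bits hold the address of the real page descriptor,
 * which must be resident; a second indirect (or an invalid) target makes the
 * result invalid. Returns the resident page descriptor, or 0 for invalid. */
static uint32_t resolve_page_desc(const uint32_t *mem, uint32_t desc)
{
    switch (classify_pdt(desc)) {
    case PDT_RESIDENT:
        return desc;
    case PDT_INDIRECT: {
        uint32_t target = phys_read32(mem, desc & ~0x3u);
        return classify_pdt(target) == PDT_RESIDENT ? target : 0;
    }
    default:
        return 0;
    }
}
```

because several indirect descriptors can point at the same resident page descriptor, every task that maps the page shares one copy of its history bits.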
[figure 3-14. translation table using indirect descriptors: the root pointers for task a and task b lead through separate tables for logical address $76543210, and an indirect descriptor lets both tasks use a single shared page descriptor]

3.2.4.2 table sharing between tasks. more than one task can share a pointer- or page-level table by placing a pointer to the shared table in each task's address translation tables. the upper (nonshared) tables can contain different write-protect settings, allowing different tasks to use the memory areas with different write permissions. figure 3-15 illustrates shared tables in a translation table structure: two tasks share the memory translated by the table at the pointer table level. task a cannot write to the shared area; task b, however, has the w-bit clear in its pointer to the shared table, so it can read and write the shared area. also, the shared area appears at different logical addresses for each task.
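because each descriptor on the search path can contribute a w-bit, the write-protect status of a shared page depends on which path a task takes to reach it. the c sketch below assumes the w-bit is bit 2 of every descriptor and ors the bits along a path; `path_write_protected` is an illustrative name, and the descriptor values in the usage example are made up to mimic figure 3-15.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_W_BIT (1u << 2)   /* assumed W-bit position in every descriptor */

/* The accumulated write-protect status for a page is the OR of the W-bits of
 * all descriptors encountered on the search path (root, pointer, page). */
static bool path_write_protected(const uint32_t *path, size_t n)
{
    bool wp = false;
    for (size_t i = 0; i < n; i++)
        wp = wp || (path[i] & DESC_W_BIT) != 0;
    return wp;
}
```

in the usage example, both tasks reach the same pointer- and page-level descriptors, but only task a's root descriptor has w set, so only task a sees the shared page as write protected.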
[figure 3-15. translation table using shared tables: the root pointers for task a and task b reach the same pointer-level table; task a's descriptor for the shared table has the w-bit set and task b's has it clear, so the shared page frame is write protected from task a only]

3.2.4.3 table paging. the entire translation table for an active task need not be resident in main memory. in the same way that only the working set of pages must be allocated in main memory, only the tables that describe the resident set of pages need be available. this paging of tables is implemented by placing the invalid code ($0 or $1) in the udt field of the table descriptor that points to the absent table(s). when a task attempts to use an address that an absent table would translate, the m68040 is unable to locate a translation and takes an access error exception when the execution unit retries the bus access that caused the table search to be initiated.

the operating system determines that the invalid code in the descriptor corresponds to nonresident tables. this determination can be facilitated by using the unused bits in the descriptor to store status information concerning the invalid encoding. the m68040 does not interpret or modify an invalid descriptor's fields except for the udt field, which allows the operating system to store system-defined information in the remaining bits. information typically stored includes the reason for the invalid encoding (tables paged out, region unallocated, etc.) and possibly the disk address for nonresident tables. figure 3-16 illustrates an address translation table in which only a single page table (table $15) is resident; all other page tables are not resident.

[figure 3-16. translation table with nonresident tables: on the path for logical address $76543210 from the srp, only the descriptors leading to table $15 have udt = resident; every other descriptor has udt = invalid, marking its table as nonresident (paged or unallocated)]
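the following c sketch shows one way an operating system might use the bits of an invalid descriptor that the m68040 ignores. the encoding is entirely hypothetical (udt = 00 for an unallocated region, udt = 01 for a paged-out table, with a disk block number in bits 31-2); the only processor-defined rule it relies on is that udt values 00 and 01 are both invalid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical OS convention, not defined by the M68040: the low UDT bit
 * records why the descriptor is invalid; bits 31-2 hold a disk block. */
enum invalid_reason { REGION_UNALLOCATED = 0, TABLE_PAGED_OUT = 1 };

static uint32_t make_invalid_desc(enum invalid_reason why, uint32_t disk_block)
{
    return ((disk_block & 0x3FFFFFFFu) << 2) | (uint32_t)why;  /* UDT 00 or 01 */
}

/* The processor treats UDT 00 and 01 alike: both are invalid. */
static bool udt_invalid(uint32_t desc)
{
    return (desc & 0x2u) == 0;
}

static enum invalid_reason invalid_reason_of(uint32_t desc)
{
    return (enum invalid_reason)(desc & 0x1u);
}

static uint32_t disk_block_of(uint32_t desc)
{
    return desc >> 2;
}
```

an access error handler following this convention would decode the descriptor, read the table back from `disk_block_of(desc)`, and replace the descriptor with a resident one before returning.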
3.2.4.4 dynamically allocated tables. similar to paged tables, a complete translation table need not exist for an active task. the operating system can dynamically allocate the translation table based on requests for access to particular areas. as in demand paging, it is difficult, if not impossible, to predict the areas of memory that a task uses over any extended period. instead of attempting to predict the requirements of the task, the operating system performs no action for a task until a demand is made requesting access to a previously unused area or an area that is no longer resident in memory. this technique can be used to efficiently create a translation table for a task.

for example, consider an operating system that is preparing to execute a previously unexecuted task that has no translation table. rather than guessing the memory-usage requirements of the task, the operating system creates a translation table that maps one page corresponding to the initial value of the program counter (pc) for that task and one page corresponding to the task's initial stack pointer. all other branches of the translation table for this task remain unallocated until the task requests access to the areas mapped by these branches. this technique allows the operating system to construct a minimal translation table for each task, conserving physical memory and minimizing operating system overhead.

3.2.5 table search accesses
the cache treats table search accesses that are not read-modify-write accesses as cachable/write-through, but these accesses do not allocate in the cache on misses. read-modify-write table search accesses (required to update some descriptor u-bit and m-bit combinations) are treated as noncachable and force a matching cache line to be pushed and invalidated. table search bus accesses are locked only for the specific portions of the table search that require a read-modify-write access.
during a table search, the u-bit in each encountered descriptor is checked and set if not already set. similarly, when the table search is for a write access and the m-bit of the page descriptor is clear, the processor sets the m-bit if the table search does not encounter a set w-bit or a supervisor violation. specific combinations of the u-bit and m-bit are updated by repeating the descriptor access as part of a read-modify-write access, which allows the external arbiter to prevent the update operation from being interrupted. the m68040 asserts the lock signal during certain portions of the table search to ensure proper maintenance of the u-bit and m-bit. the u-bit and m-bit are updated before the m68040 allows a page to be accessed or written. as descriptors are fetched, the u-bit and m-bit are monitored, and write cycles modify these bits when required. for a table descriptor, a write cycle that sets the u-bit occurs only if the u-bit was clear. table 3-1 lists the page descriptor update operations for each combination of u-bit, m-bit, write-protect status, and read or write access type.
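the update rules described above, which table 3-1 tabulates, reduce to a small decision function. this c sketch takes the previous u and m bits, the accumulated write-protect status, and the access type, and returns the update operation and the new bit values; the names are illustrative, and supervisor violations, which also suppress setting m, are not modeled.

```c
#include <stdbool.h>

enum desc_update {
    UPD_NONE,
    UPD_LOCKED_RMW_SET_U,   /* locked read-modify-write access to set U */
    UPD_WRITE_SET_M,        /* write cycle to set M */
    UPD_WRITE_SET_U_AND_M   /* write cycle to set U and M */
};

/* wp is the accumulated write-protect status for the page. */
static enum desc_update page_desc_update(bool u, bool m, bool wp, bool is_write,
                                         bool *new_u, bool *new_m)
{
    bool set_m = is_write && !wp && !m;  /* only an unprotected write sets M */
    *new_u = true;                       /* U always ends up set */
    *new_m = m || set_m;
    if (!u && set_m)
        return UPD_WRITE_SET_U_AND_M;
    if (!u)
        return UPD_LOCKED_RMW_SET_U;
    if (set_m)
        return UPD_WRITE_SET_M;
    return UPD_NONE;
}
```

walking all twelve combinations of u, m, wp, and access type through this function reproduces each row of table 3-1.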
table 3-1. updating u-bit and m-bit for page descriptors

prev u  prev m  wp   access  page descriptor update operation  new u  new m
0       0       x    read    locked rmw access to set u        1      0
0       1       x    read    locked rmw access to set u        1      1
1       0       x    read    none                              1      0
1       1       x    read    none                              1      1
0       0       0    write   write to set u and m              1      1
0       1       0    write   locked rmw access to set u        1      1
1       0       0    write   write to set m                    1      1
1       1       0    write   none                              1      1
0       0       1    write   locked rmw access to set u        1      0
0       1       1    write   locked rmw access to set u        1      1
1       0       1    write   none                              1      0
1       1       1    write   none                              1      1

note: wp indicates the accumulated write-protect status; x = don't care.

an alternate address space access is a special case that is immediately used as a physical address without translation. because the m68040 implements a merged instruction and data space, the integer unit converts moves accesses to instruction address spaces (sfc/dfc = $6 or $2) into data references (sfc/dfc = $5 or $1). the data memory unit handles these converted accesses as normal data accesses. if the access fails due to an atc fault or a physical bus error, the resulting access error stack frame contains the converted function code in the tm field for the faulted access. to maintain cache coherency, moves accesses that write the instruction address space must be preceded by invalidation of the instruction cache line containing the referenced location. the sfc and dfc values and results are listed in table 3-2.

table 3-2. sfc and dfc values and results

sfc/dfc value  tt  tm
000            10  000
001            00  001
010            00  001
011            10  011
100            10  100
101            00  101
110            00  101
111            10  111
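the moves function code conversion in table 3-2 can be written as a lookup. in this c sketch, `moves_tt_tm` is an illustrative name; it returns tt = 10 (binary) for function codes 0, 3, 4, and 7 and redirects the program-space codes 2 and 6 to the data-space codes 1 and 5, exactly as tabulated.

```c
#include <stdint.h>

struct tt_tm {
    uint8_t tt;   /* transfer type, 2 bits */
    uint8_t tm;   /* transfer modifier, 3 bits */
};

/* Converts an SFC/DFC value used by MOVES into TT and TM per table 3-2. */
static struct tt_tm moves_tt_tm(uint8_t fc)
{
    struct tt_tm r;
    fc &= 0x7u;
    switch (fc) {
    case 0x2u: r.tt = 0x0u; r.tm = 0x1u; break;  /* user program -> user data */
    case 0x6u: r.tt = 0x0u; r.tm = 0x5u; break;  /* supervisor program -> supervisor data */
    case 0x1u:
    case 0x5u: r.tt = 0x0u; r.tm = fc;   break;  /* data spaces pass through */
    default:   r.tt = 0x2u; r.tm = fc;   break;  /* 0, 3, 4, 7: alternate access */
    }
    return r;
}
```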
3.2.6 address translation protection
the m68040 mmus provide separate translation tables for supervisor and user address spaces. the translation tables contain both mapping and protection information. each table and page descriptor includes a write-protect (w) bit that can be set to provide write protection at any level. page descriptors also contain a supervisor-only (s) bit that can limit access to programs operating at the supervisor privilege level. the protection mechanisms can be used individually or in any combination to protect:
• supervisor address space from accesses by user programs.
• user address space from accesses by other user programs.
• supervisor and user program spaces from write accesses (implicitly supported by designating all memory pages used for program storage as write protected).
• one or more pages of memory from write accesses.

3.2.6.1 supervisor and user translation tables. one way of protecting supervisor and user address spaces from unauthorized accesses is to use separate supervisor and user translation tables. separate trees protect supervisor programs and data from accesses by user programs, and user programs and data from accesses by supervisor programs. supervisor programs can nevertheless access any area of memory by using moves. the translation table pointed to by the srp is selected for all other supervisor mode accesses, and this table can be common to all tasks. figure 3-17 illustrates separate translation tables for supervisor accesses and for two user tasks that share the common supervisor space. each user task has a translation table with unique mappings for the logical addresses in its user address space.

3.2.6.2 supervisor only. a second mechanism protects supervisor programs and data without requiring segmentation of the logical address space into supervisor and user address spaces. page descriptors contain s-bits to protect areas of memory from access by user programs.
when a table search for a user access encounters an s-bit set in a page descriptor, the table search ends, and an atc descriptor corresponding to the logical address is created with the s-bit set. a subsequent retry of the user access results in an access error exception being taken. the s-bit can be used to protect one or more pages from user program access. supervisor and user mode accesses can share descriptors by using indirect descriptors or by sharing tables. the entire user and supervisor address spaces can be mapped together by loading the same root pointer address into both the srp and urp registers.
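the supervisor-only and write-protect checks applied when an access is retried against an atc entry can be sketched in c as follows. `struct atc_prot` is a hypothetical view holding just the r, s, and w status bits; note that w blocks supervisor writes as well as user writes.

```c
#include <stdbool.h>

/* Hypothetical subset of an ATC entry's status bits used for protection. */
struct atc_prot {
    bool r;   /* resident: the table search completed without error */
    bool s;   /* supervisor-only page */
    bool w;   /* accumulated write protection */
};

/* Returns true if the access may proceed; false means the retried access
 * takes an access error exception. */
static bool access_allowed(struct atc_prot e, bool supervisor, bool is_write_or_rmw)
{
    if (!e.r)
        return false;
    if (e.s && !supervisor)
        return false;              /* user access to a supervisor-only page */
    if (e.w && is_write_or_rmw)
        return false;              /* applies to supervisor writes too */
    return true;
}
```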
[figure 3-17. translation table structure for two tasks: the urp for task 'a' and the urp for task 'b' each point to that task's own user a-level table, while a common srp points to the supervisor a-level table used for all supervisor accesses]

3.2.6.3 write protect. the m68040 provides write protection independent of other protection mechanisms. all table and page descriptors contain w-bits to protect areas of memory from write accesses of any kind, including supervisor writes. when a table search encounters a w-bit set in any table or page descriptor, an atc descriptor corresponding to the logical address is created with the w-bit set after the table search completes. the subsequent retry of the write access results in an access error exception being taken. the w-bit can be used to protect the entire area of memory defined by a branch of the translation table or to protect only one or more pages from write accesses. figure 3-18 illustrates a memory map of the logical address space organized to use supervisor-only and write-protect bits for protection. figure 3-19 illustrates an example translation table for this technique.

[figure 3-18. logical address map with shared supervisor and user address spaces: a single supervisor and user space divided into areas that are supervisor only, read-only; supervisor only, read/write; supervisor or user, read-only; and supervisor or user, read/write]
[figure 3-19. translation table using s-bit and w-bit to set protection: the urp and srp point to the same a-level table; w-bits in the root- and pointer-level descriptors and s and w bits in the page descriptors produce pages that are supervisor only, read-only; supervisor only, read/write; supervisor/user, read-only; and supervisor/user, read/write (x = don't care)]
3.3 address translation caches
the atcs in the mmus are four-way set-associative caches that each store 64 logical-to-physical address translations and associated page information similar in form to the corresponding page descriptors in memory. the atc provides fast address translation by avoiding the overhead of a table search for the logical-to-physical mapping of recently used logical addresses. figure 3-20 illustrates the organization of the atc.

[figure 3-20. atc organization: four ways (0-3) of 16 sets each; the set-select bits of the logical address choose a set, four comparators match the tag entries against the logical address and fc2, and the hit signal drives multiplexers that output pa31-pa13, pa12 (selected according to the page size), and the status bits, with pa11-pa0 taken from the page offset]
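the set-associative lookup can be modeled in c to make the indexing concrete. this is a simplified sketch under stated assumptions: it keeps only the valid, fc2, tag, and page frame fields of each entry, uses the four logical address bits just above the page offset as the set index, and picks replacement victims with a 2-bit counter that advances on every lookup, following the replacement description later in this section. the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ATC_SETS 16   /* 64 entries / 4 ways */
#define ATC_WAYS 4

struct atc_entry {
    bool valid;       /* V */
    bool fc2;         /* 1 = supervisor, 0 = user */
    uint32_t tag;     /* logical address bits above the set index */
    uint32_t pframe;  /* physical page frame number */
};

struct atc {
    struct atc_entry set[ATC_SETS][ATC_WAYS];
    unsigned counter; /* 2-bit pseudo-random replacement counter */
};

/* page_shift is 12 for 4-Kbyte pages or 13 for 8-Kbyte pages. */
static unsigned atc_set_index(uint32_t la, unsigned page_shift)
{
    return (la >> page_shift) & 0xFu;   /* LA15-LA12 or LA16-LA13 */
}

static uint32_t atc_tag(uint32_t la, unsigned page_shift)
{
    return la >> (page_shift + 4);      /* remaining upper bits */
}

/* On a hit, forms the physical address from the page frame and page offset. */
static bool atc_lookup(struct atc *a, uint32_t la, bool fc2,
                       unsigned page_shift, uint32_t *pa)
{
    struct atc_entry *s = a->set[atc_set_index(la, page_shift)];
    a->counter = (a->counter + 1u) & 0x3u;   /* advances on every access */
    for (int way = 0; way < ATC_WAYS; way++) {
        if (s[way].valid && s[way].fc2 == fc2 &&
            s[way].tag == atc_tag(la, page_shift)) {
            *pa = (s[way].pframe << page_shift) | (la & ((1u << page_shift) - 1u));
            return true;
        }
    }
    return false;
}

/* Fills an invalid way if one exists; otherwise the counter picks the victim. */
static void atc_insert(struct atc *a, uint32_t la, bool fc2,
                       unsigned page_shift, uint32_t pframe)
{
    struct atc_entry *s = a->set[atc_set_index(la, page_shift)];
    int victim = (int)a->counter;
    for (int way = 0; way < ATC_WAYS; way++) {
        if (!s[way].valid) {
            victim = way;
            break;
        }
    }
    s[victim] = (struct atc_entry){ true, fc2, atc_tag(la, page_shift), pframe };
}
```

with 8-kbyte pages, logical address $76543210 selects set 1 (la16-la13 = 0001); after a mapping to page frame 3 is inserted, a supervisor lookup returns physical address $00007210, while a user lookup of the same address misses because fc2 is part of the tag comparison.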
each atc entry consists of a physical address, attribute information from a corresponding page descriptor, and a tag that contains a logical address and status information. figure 3-21, which illustrates the entry and tag fields, is followed by field definitions listed in alphabetical order.

[figure 3-21. atc entry and tag fields: entry = u1, u0, s, cm, m, w, r, physical address*; tag = v, g, fc2, logical address*. * for 4-kbyte page sizes this field uses address bits 31-12; for 8-kbyte page sizes, bits 31-13.]

cm (cache mode)
this field selects the cache mode and access serialization as follows:
00 = cachable, write-through
01 = cachable, copyback
10 = noncachable, serialized
11 = noncachable
section 4 instruction and data caches provides detailed information on caching modes, and section 7 bus operation provides information on serialization.

fc2 (function code bit 2, supervisor/user)
this bit contains the function code corresponding to the logical address in this entry. fc2 is set for supervisor mode accesses and cleared for user mode accesses.

g (global)
when set, this bit indicates the entry is global. global entries are not invalidated by the pflush instruction variants that specify nonglobal entries, even when all other selection criteria are satisfied.

logical address
this 16-bit field contains the most significant logical address bits for this entry. all 16 bits of this field are used in the comparison of this entry to an incoming logical address when the page size is 4 kbytes. for 8-kbyte pages, the least significant bit of this field is ignored.

m (modified)
the modified bit is set when a valid write access to the logical address corresponding to the entry occurs. if the m-bit is clear and a write access to this logical address is attempted, the m68040 suspends the access, initiates a table search to set the m-bit in the page descriptor, and writes over the old atc entry with the current page descriptor information.
the mmu then allows the original write access to be performed. this procedure ensures that the first write operation to a page sets the m-bit in both the atc and the page descriptor in the translation tables, even when a previous read operation to the page had created an entry for that page in the atc with the m-bit clear.

physical address
the upper bits of the translated physical address are contained in this field.

r (resident)
this bit is set if the table search successfully completes without encountering either a nonresident page or a transfer error acknowledge during the search.

s (supervisor protected)
this bit identifies a pointer table or a page as a supervisor-only table or page. only programs operating in the supervisor privilege mode are allowed to access the portion of the logical address space mapped by this descriptor when the s-bit is set. if the bit is clear, both supervisor and user accesses are allowed.

u0, u1 (user page attributes)
these user-defined bits are not interpreted by the m68040. u0 and u1 are echoed to the upa0 and upa1 signals, respectively, if an external bus transfer results from the access.

v (valid)
when set, this bit indicates the validity of the entry. this bit is set when the m68040 loads an entry. a flush operation by a pflush or pflusha instruction that selects this entry clears the bit.

w (write protected)
this write-protect bit is set when a w-bit is set in any of the descriptors encountered during the table search for this entry. setting a w-bit in a table descriptor write protects all pages accessed with that descriptor. when the w-bit is set, a write access or a read-modify-write access to the logical address corresponding to this entry causes an access error exception to be taken immediately.

for each access to a memory unit, the mmu uses the four bits of the logical address located just above the page offset (la16-la13 for 8-kbyte pages, la15-la12 for 4-kbyte pages) to index into the atc. the tags are compared with the remaining upper bits of the logical address and fc2.
if one of the tags matches and is valid, the multiplexer chooses the corresponding entry to produce the physical address and status information. the atc outputs the corresponding physical address to the cache controller, which accesses the data within the cache and/or requests an external bus cycle.

each atc entry contains a logical address, a physical address, and status bits. when the atc does not contain the translation for a logical address, a miss occurs. the mmu aborts the current access and searches the translation tables in memory for the correct translation. if the table search completes without any errors, the mmu stores the translation in the atc and provides the physical address for the access, allowing the memory unit to retry the original access.

there are some variations in the logical-to-physical mapping because of the two page sizes. if the page size is 4 kbytes, logical address bit 12 is used to access the atc's memory, the tag comparators use bit 16, and physical address bit 12 is an atc output. if the page size is 8 kbytes, logical address bit 16 is used to access the atc's memory, and physical address bit 12 is driven by logical address bit 12. translation should always be disabled before the page size is changed, and the atcs should be flushed before translation is enabled again.

the m68040 is organized so that other operations always completely overlap the translation time of the atcs; thus, no performance penalty is associated with atc searches. the address translation occurs in parallel with indexing into the on-chip instruction and data caches.

the mmu replaces an invalid entry when the atc stores a new address translation. when all entries in an atc set are valid, the atc selects a valid entry to be replaced using a pseudo-random replacement algorithm: a 2-bit counter, which is incremented for each atc access, points to the entry to replace when an access misses in the atc. atc hit rates are application and page-size dependent, but hit rates ranging from 98% to greater than 99% can be expected. these high rates are achieved because the atcs are relatively large (64 entries) and utilization efficiency is high with 8-kbyte and 4-kbyte page sizes.

3.4 transparent translation
four independent ttrs (dtt0 and dtt1 in the data mmu, itt0 and itt1 in the instruction mmu) define four blocks of logical address space to be transparently translated (used directly as physical addresses). these logical address spaces must be at least 16 mbytes and can overlap or be separate. each ttr can be disabled and completely ignored.
the following description assumes that the ttrs are enabled. when an mmu receives an address to be translated, the privilege mode and the eight high-order bits of the address are compared to the logical address spaces defined by the two ttrs for the corresponding mmu. the logical address space for each ttr is defined by an s-field, a logical base address field, and a logical address mask field. the s-field allows matching either user or supervisor accesses or both. when a bit in the logical address mask field is set, the corresponding bit of the logical base address is ignored in the address comparison. setting successively more mask bits, starting from the low-order end of the field, increases the size of the transparently translated block; each additional bit doubles it, from 16 mbytes up to the entire 4-gbyte address space. the address for the current bus cycle and a ttr match when the privilege mode is selected by the s-field and the unmasked logical base address bits equal the corresponding address bits. each ttr can specify write protection for the block. when write protection is enabled for a block, write or read-modify-write accesses to the block are aborted.
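the ttr comparison just described can be sketched in c. `struct ttr` is a hypothetical decoded form of the register rather than its bit layout, and the s-field meanings (00 = user only, 01 = supervisor only, 1x = either mode) are an assumption to be confirmed against 3.1.3 transparent translation registers. the usage example reuses the two configurations given in the next paragraph.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of one TTR (not its register bit layout). */
struct ttr {
    bool enabled;    /* E-bit */
    uint8_t base;    /* logical address base, compared with LA31-LA24 */
    uint8_t mask;    /* 1 bits: ignore the corresponding base bit */
    uint8_t sfield;  /* assumed: 00 user only, 01 supervisor only, 1x both */
    bool w;          /* write protect the block */
};

static bool ttr_matches(const struct ttr *t, uint32_t la, bool supervisor)
{
    if (!t->enabled)
        return false;
    bool mode_ok = (t->sfield & 0x2u) != 0 ||
                   (t->sfield & 0x1u) == (supervisor ? 1u : 0u);
    uint8_t hi = (uint8_t)(la >> 24);
    uint8_t care = (uint8_t)~t->mask;        /* bits that must compare equal */
    bool addr_ok = ((hi ^ t->base) & care) == 0;
    return mode_ok && addr_ok;
}

/* Write or RMW accesses to a matching, write-protected block are aborted. */
static bool ttr_blocks_write(const struct ttr *t, uint32_t la, bool supervisor)
{
    return ttr_matches(t, la, supervisor) && t->w;
}
```

the user-space configuration (s-field $0, mask $ff) matches every user address, while the supervisor configuration (base $0x, mask $0f, s-field $1, w set) matches only supervisor accesses to $00000000-$0fffffff and blocks writes to them.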
by appropriately configuring a ttr, flexible transparent mappings can be specified (refer to 3.1.3 transparent translation registers for field identification). for instance, to transparently translate the user address space, the s-field is set to $0, and the logical address mask is set to $ff in both an instruction and a data ttr. to transparently translate supervisor accesses of addresses $00000000-$0fffffff with write protection, the logical base address field is set to $0x, the logical address mask is set to $0f, the w-bit is set to one, and the s-field is set to $1.

the inclusion of independent ttrs in both the instruction and data mmus provides an exception to the merged instruction and data address space, allowing different translations for instruction and operand accesses. also, since the instruction memory unit is only used for instruction prefetches, different instruction and data ttrs can cause pc-relative operand fetches to be translated differently from instruction prefetches. if either of the ttrs matches during an access to a memory unit (either instruction or data), the access is transparently translated. if both registers match, the tt0 status bits are used for the access. transparent translation can also be implemented by the translation tables if the physical addresses of pages are set equal to their logical addresses.

3.5 address translation summary
the instruction and data mmus process translations by first comparing the logical address and privilege mode with the parameters of the ttrs. if there is a match, the mmu uses the logical address as a physical address for the access. if there is no match, the mmu compares the logical address and privilege mode with the tag portions of the entries in the atc and uses the corresponding physical address for the access when a match occurs.
when neither a ttr nor a valid atc entry matches, the mmu initiates a table search operation to obtain the corresponding physical address from the translation table. when a table search is required, the processor suspends instruction execution activity; at the end of a successful table search, the mmu stores the address mapping in the appropriate atc as a valid entry, and the access is retried.

if an access hits in the atc but an access error or invalid page descriptor was detected during the table search that created the atc entry, the access is aborted, and an access error exception is taken. if a write or read-modify-write access results in an atc hit but the page is write protected, the access is aborted, and an access error exception is taken. if the page is not write protected and the modified bit of the atc entry is clear, a table search proceeds to set the modified bit in both the page descriptor in memory and the atc, and the access is retried. the atc provides the address translation for the access if the resident bit is set (indicating the table search for the entry completed successfully), if the modified bit of the atc entry is set for a write or read-modify-write access to an unprotected page, and if none of the ttrs (instruction or data, as appropriate) match.

if the last word (16 bits) of a page contains an a-line, illegal, chk, or unimplemented instruction and the next page is nonresident, the atc access error is not reported immediately. instead, the m68040 attempts to prefetch the next instruction from the missing page and then reports the atc access error exception. the stacked pc points to the exceptional instruction, and the stacked fa points to the first longword in the missing page. when an atc access error occurs while the next instruction is prefetched from the nonresident page after a change-of-flow instruction, the exception should be cleared by execution of the new instruction flow. either avoid this scenario or place a dummy resident page after the page containing the exceptional instruction.

figure 3-22 illustrates a general flowchart for address translation. the top branch of the flowchart applies to transparent translation; the bottom three branches apply to atc translation.

3.6 mmu effect on rsti and mdis
the following paragraphs describe mmu effects on the rsti and mdis pins.

3.6.1 effect of rsti on the mmus
when the m68040 is reset by the assertion of the reset input signal (rsti), the e-bits of the tcr and ttrs are cleared, disabling address translation. logical addresses are then passed through as physical addresses, allowing an operating system to set up the translation tables and mmu registers as required. after the translation tables and registers are initialized, the e-bit of the tcr can be set, enabling paged address translation. while address translation is disabled, the attribute bits that an atc entry or a ttr normally supplies for an access are zero, selecting cachable write-through mode, no write protection, and cleared user page attribute bits. rsti does not affect the p-bit of the tcr. a reset of the processor does not invalidate any entries in the atcs or alter the page size. a pflush instruction must be executed to flush all existing valid entries from the atcs after a reset operation and before translation is enabled. pflush can be executed even if the e-bit is cleared.

3.6.2 effect of mdis on address translation
the assertion of mdis prevents the mmus from performing atc searches and the execution unit from performing table searches.
With address translation disabled, logical addresses are used as physical addresses. MDIS disables the MMUs on the next internal access boundary when asserted and enables the MMUs on the next boundary after the signal is negated. The assertion of this signal does not affect the operation of the transparent translation registers or the execution of the PFLUSH or PTEST instructions.
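The reset and MDIS rules in 3.6.1 and 3.6.2 can be captured in a small C model. This is a minimal sketch, assuming hypothetical names (mmu_ctl, mmu_reset, atc_translation_active, untranslated_attributes); it tracks only the enable bits and default attributes described above, not the real register bit layouts.

```c
#include <stdbool.h>

/* Hypothetical model of the MMU enable state (names are illustrative only). */
typedef struct {
    bool tcr_e;     /* TCR E-bit: paged address translation enabled */
    bool tcr_p;     /* TCR P-bit: page size select, untouched by RSTI */
    bool ttr_e[4];  /* E-bits of DTTR0, DTTR1, ITTR0, ITTR1 */
} mmu_ctl;

/* Effect of asserting RSTI: every E-bit is cleared. The P-bit is left alone,
   and (not modeled here) the ATC entries are not invalidated. */
void mmu_reset(mmu_ctl *m) {
    m->tcr_e = false;
    for (int i = 0; i < 4; i++)
        m->ttr_e[i] = false;
}

/* ATC searches and table searches take place only while TCR[E] is set and
   MDIS is negated; MDIS does not change how the TTRs operate. */
bool atc_translation_active(const mmu_ctl *m, bool mdis_asserted) {
    return m->tcr_e && !mdis_asserted;
}

/* Attributes used while translation is disabled: all zero, i.e. cachable
   write-through (CM = 0), no write protection, U1 and U0 cleared. */
typedef struct {
    unsigned cm;
    unsigned u1, u0;
    bool write_protected;
} page_attr;

page_attr untranslated_attributes(void) {
    page_attr a = { 0, 0, 0, false };
    return a;
}
```

In this model a boot sequence would call mmu_reset, flush the ATCs (PFLUSH on the real part), build the translation tables, and only then set tcr_e.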
Figure 3-22. Address Translation Flowchart
* Refers to either the instruction or data transparent translation register.
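The decision order of Figure 3-22 and the rules stated at the start of this section can be sketched in C. This is a simplified model, not the hardware logic: it assumes TTR0 is tested before TTR1, represents the TTR and ATC state with hypothetical structures (ttr_check, atc_check), and leaves the table search itself out.

```c
#include <stdbool.h>

typedef enum { ACC_READ, ACC_WRITE, ACC_RMW } access_kind;

/* Outcome of one pass through the flowchart. */
typedef enum {
    XLATE_TTR0,          /* PA = logical address, UPA/CM from TTR0 */
    XLATE_TTR1,          /* PA = logical address, UPA/CM from TTR1 */
    XLATE_ATC,           /* PA, UPA, and CM from the ATC entry */
    XLATE_TABLE_SEARCH,  /* ATC miss, or M = 0 on a write: search, then retry */
    XLATE_ACCESS_ERROR   /* abort cycle, take access error exception */
} xlate_result;

typedef struct { bool match; bool w; } ttr_check;              /* TTR match + W-bit */
typedef struct { bool hit; bool r; bool w; bool m; } atc_check; /* ATC hit + R/W/M bits */

static bool is_write(access_kind k) { return k == ACC_WRITE || k == ACC_RMW; }

xlate_result translate(access_kind k, ttr_check ttr0, ttr_check ttr1, atc_check atc) {
    /* Transparent translation: write-protected TTRs abort writes. */
    if (ttr0.match)
        return (ttr0.w && is_write(k)) ? XLATE_ACCESS_ERROR : XLATE_TTR0;
    if (ttr1.match)
        return (ttr1.w && is_write(k)) ? XLATE_ACCESS_ERROR : XLATE_TTR1;
    if (!atc.hit)
        return XLATE_TABLE_SEARCH;
    /* Entry from a failed search (R = 0) or a write to a protected page. */
    if (!atc.r || (atc.w && is_write(k)))
        return XLATE_ACCESS_ERROR;
    /* First write to a clean page: search sets M in descriptor and ATC. */
    if (!atc.m && is_write(k))
        return XLATE_TABLE_SEARCH;
    return XLATE_ATC;
}
```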
3.7 MMU Instructions

The M68040 instruction set includes three privileged instructions that perform MMU operations. The following paragraphs briefly describe each of these instructions. For detailed descriptions, refer to M68000PM/AD, M68000 Family Programmer's Reference Manual.

3.7.1 MOVEC

The MOVEC instruction transfers data between a general-purpose (data or address) register and any of the M68040 control and status registers. The operating system uses MOVEC to control and monitor MMU operation by writing and reading the eight MMU registers.

3.7.2 PFLUSH

The PFLUSH instruction flushes (invalidates) address translation descriptors in the ATCs. PFLUSHA, a version of the PFLUSH instruction, flushes all entries. PFLUSH flushes a user or supervisor entry with a specified logical address. The PFLUSHAN and PFLUSHN variants further qualify entry selection by flushing only nonglobal entries, indicated by a cleared G-bit in the entry.

3.7.3 PTEST

The PTEST instruction performs a table search operation for a specified function code and logical address and sets the appropriate bit fields in the MMUSR to indicate conditions encountered during the search. PTEST automatically flushes the corresponding entry from the ATC before searching the tables and loads the latest information from the translation tables into the ATC. The exception routines of the operating system can use this instruction to identify MMU faults; PTEST is primarily used in access error exception handlers.
For example, if an access error has occurred, the handler can execute an instruction sequence such as the following (offset1 and offset2 stand for the stack frame offsets of the transfer modifier field and the fault address):

    MOVE.B   (offset1,A7),D0   ; copy transfer modifier field from stack frame
    MOVEC    D0,DFC            ;   into the DFC register
    MOVEA.L  (offset2,A7),A0   ; copy fault address from stack frame into A0
    PTESTW   (A0)              ; test address in A0 with function code in DFC

The transfer modifier field copied into the destination function code (DFC) register indicates whether the faulted access was a supervisor or user mode access and whether it was an instruction prefetch or data access. The PTEST instruction uses the DFC value to determine which translation table (supervisor or user) to search and in which ATC (data or instruction) to create the entry. After executing this code sequence, the handler can examine the MMUSR for the source of the fault.

The M68040 MMU instructions use opcodes that are different from those for the corresponding instructions in the MC68030 and MC68851. All MMU opcodes for the
MC68030 and MC68851 cause F-line unimplemented instruction exceptions if executed by the M68040 in either supervisor or user mode.

3.7.4 Register Programming Considerations

If the entries in the ATCs are no longer valid when a reset operation occurs (as is normally expected), the system software must explicitly flush them. The assertion of RSTI disables translation by clearing the E-bits of the TCR, DTTRx, and ITTRx, but it does not flush the ATCs. Reading or writing any of the MMU registers (URP, SRP, TCR, MMUSR, DTTR0, DTTR1, ITTR0, ITTR1) does not flush the ATCs. Since a write to these registers can change some or all of the address translations, the write should be followed by a PFLUSH operation to flush the ATCs if necessary.

The status bits in the MMUSR indicate conditions to which the operating system should respond. In a typical access error exception handler, the flowchart illustrated in Figure 3-23 can be used to determine the cause of an MMU fault. The PTEST instruction sets the bits in the MMUSR appropriately, and the program can branch to the appropriate code segment for the condition.
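Because neither reset nor register writes flush the ATCs, software relies on the PFLUSH variants from 3.7.2. The following sketch models their selection rules over a hypothetical, simplified ATC array; the atc_entry fields and the pflush function are illustrative assumptions, and the real instructions take the logical address from an address register rather than a page-number argument.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ATC entry: only the fields PFLUSH selection uses. */
typedef struct {
    bool valid;
    bool supervisor;   /* supervisor (true) or user (false) entry */
    bool global;       /* G-bit */
    uint32_t page;     /* logical page number */
} atc_entry;

typedef enum { PFLUSH, PFLUSHN, PFLUSHA, PFLUSHAN } pflush_op;

/* Invalidate the entries selected by the variant and return how many were
   flushed. 'supervisor' and 'page' are used only by PFLUSH and PFLUSHN. */
size_t pflush(atc_entry *atc, size_t n, pflush_op op, bool supervisor, uint32_t page) {
    bool all = (op == PFLUSHA || op == PFLUSHAN);
    bool nonglobal_only = (op == PFLUSHN || op == PFLUSHAN);
    size_t flushed = 0;
    for (size_t i = 0; i < n; i++) {
        atc_entry *e = &atc[i];
        if (!e->valid)
            continue;
        if (nonglobal_only && e->global)
            continue;                  /* N variants keep global entries */
        if (!all && (e->supervisor != supervisor || e->page != page))
            continue;                  /* single-entry variants match space and address */
        e->valid = false;
        flushed++;
    }
    return flushed;
}
```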
Figure 3-23. MMU Status Interpretation
After PTEST (An), the flowchart tests the MMUSR R, B, T, W, and S bits, the W-bit of a matching TTR*, and the access type recorded in the stack frame, and branches to "bus error during table search," "page fault" or "invalid descriptor," "write violation," "supervisor violation," or "not MMU" code.
* Refers to either the instruction or data transparent translation register.
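A handler following Figure 3-23 might classify a fault as shown here. This sketch is reconstructed from the figure's surviving labels and the surrounding text; the decoded-flag structure, the enum names, and the order of the write-protect (W) and supervisor (S) checks are assumptions, and the caller supplies the stack-frame information and whether the fault address matches TTR0 or TTR1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Decoded MMUSR bits used by the flowchart (hypothetical struct, not the bit layout). */
typedef struct {
    bool b;  /* bus error during table search */
    bool s;  /* supervisor protected */
    bool w;  /* write protected */
    bool t;  /* transparent translation register hit */
    bool r;  /* resident */
} mmusr_flags;

/* Does the fault address match a TTR, and is that TTR's W-bit set? */
typedef struct {
    bool match;
    bool w;
} ttr_state;

typedef enum {
    FAULT_NOT_MMU,
    FAULT_BUS_ERROR_IN_TABLE_SEARCH,
    FAULT_PAGE_OR_INVALID_DESCRIPTOR,
    FAULT_WRITE_VIOLATION,
    FAULT_SUPERVISOR_VIOLATION
} mmu_fault;

/* frame_write: write or RMW access indicated in the stack frame.
   frame_user:  user access indicated in the stack frame. */
mmu_fault interpret_mmusr(mmusr_flags sr, bool frame_write, bool frame_user,
                          ttr_state ttr0, ttr_state ttr1) {
    if (!sr.r)
        return sr.b ? FAULT_BUS_ERROR_IN_TABLE_SEARCH
                    : FAULT_PAGE_OR_INVALID_DESCRIPTOR;
    if (sr.t) {
        /* TTR0 is checked first, then TTR1. */
        const ttr_state *hit = ttr0.match ? &ttr0 : (ttr1.match ? &ttr1 : NULL);
        if (hit != NULL && hit->w && frame_write)
            return FAULT_WRITE_VIOLATION;
        return FAULT_NOT_MMU;
    }
    if (sr.w && frame_write)
        return FAULT_WRITE_VIOLATION;
    if (sr.s && frame_user)
        return FAULT_SUPERVISOR_VIOLATION;
    return FAULT_NOT_MMU;
}
```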
Section 4 Instruction and Data Caches

NOTE: Ignore all references to the memory management unit (MMU) when reading for the MC68EC040 and MC68EC040V. The functionality of the MC68040 transparent translation registers has been changed in the MC68EC040 and MC68EC040V to the access control registers. Refer to Appendix B MC68EC040 for details.

The M68040 contains two independent, 4-Kbyte, on-chip caches located in the physical address space. Accessing instruction words and data simultaneously through separate caches increases instruction throughput. The M68040 caches improve system performance by providing cached data to the on-chip execution unit with very low latency. Systems with an alternate bus master receive increased bus availability.

Figure 4-1 illustrates the instruction and data caches contained in the instruction and data memory units. The appropriate memory unit independently services instruction prefetch and data requests from the integer unit (IU). The memory units translate the logical address in parallel with indexing into the cache. If the translated address matches one of the cache entries, the access hits in the cache. For a read operation, the memory unit supplies the data to the IU, and for a write operation, the memory unit updates the cache. If the access does not match one of the cache entries (misses in the cache) or a write access must be written through to memory, the memory unit sends an external bus request to the bus controller, which then reads or writes the required data.

Cache coherency in the M68040 is optimized for multimaster applications in which the M68040 is the caching master sharing memory with one or more noncaching masters (such as DMA controllers). The M68040 implements a bus snooper that maintains cache coherency by monitoring an alternate bus master's access and performing cache maintenance operations as requested by the alternate bus master.
Matching cache entries can be invalidated during the alternate bus master's access to memory, or memory can be inhibited to allow the M68040 to respond to the access as a slave. For an external write operation, the processor can intervene in the access and update its internal caches (sink data). For an external read operation, the processor supplies cached data to the alternate bus master (source data). This prevents the M68040 caches from accumulating old or invalid copies of data (stale data). Alternate bus masters are allowed access to locally modified data within the caches that is no longer consistent with external memory (dirty data). Allowing memory pages to be specified as write-through instead of copyback also supports cache coherency. When a processor writes to write-through pages, external
memory is always updated through an external bus access after updating the cache, keeping memory and cached data consistent.

Figure 4-1. Overview of Internal Caches

4.1 Cache Operation

Both four-way set-associative caches have 64 sets of four 16-byte lines. Two formats define a cache line: an instruction cache line format and a data cache line format. Each format contains an address tag consisting of the upper 22 bits of the physical address, status information, and four long words (128 bits) of data. The status information for the instruction cache line address tag consists of a single valid bit for the entire line. The status information for the data cache line address tag contains a valid bit and four additional bits that indicate dirty status for each long word in the line. Note that only the data cache supports dirty cache lines. Figure 4-2 illustrates the instruction cache line format (a) and the data cache line format (b).
Figure 4-2. Cache Line Formats

    (a) Instruction cache line:  TAG | V | LW3 | LW2 | LW1 | LW0
    (b) Data cache line:         TAG | V | LW3 | D3 | LW2 | D2 | LW1 | D1 | LW0 | D0

    TAG = 22-bit physical address tag
    V   = line valid bit
    LWn = long word n (32-bit) data entry
    Dn  = dirty bit for long word n

The cache stores an entire line, providing validity on a line-by-line basis. Only burst mode accesses that successfully read four long words can be cached. Memory devices unable to support bursting can respond to a cache line read or write access by asserting the transfer burst inhibit (TBI) signal, forcing the processor to complete the access as a sequence of three long-word accesses. The cache treats such an inhibited burst as if the access had never been inhibited; it detects no difference.

A cache line is always in one of three states: invalid, valid, or dirty. For invalid lines, the V-bit is clear, causing the cache line to be ignored during lookups. Valid lines have their V-bit set and D-bits cleared, indicating that all four long words in the line contain valid data consistent with memory. Dirty cache lines have the V-bit and one or more D-bits set, indicating that the line has valid long-word entries that have not been written to memory (long words whose D-bit is set). A cache line changes from valid to invalid if execution of the CINV or CPUSH instruction explicitly invalidates the cache line, if a snooped write access hits the cache line and the line is not dirty, or if the SCx signals for a snooped read access invalidate the line. Both caches should be explicitly cleared after a hardware reset of the processor, since reset does not invalidate the cache lines.

Figure 4-3 illustrates the general flow of a caching operation. The corresponding memory unit translates the logical address of each access to a physical address, allowing the IU to access the data in the cache.
To minimize latency of the requested data, the lower untranslated bits of the logical address map directly to the physical address bits and are used to access a set of cache lines in parallel with the translation. Physical address bits 9-4 are used to index into the cache and select one of the 64 sets of four cache lines. The four tags from the selected cache set are compared with the translated physical address bits 31-12 and bits 11 and 10 of the untranslated page offset. If any one of the four tags matches and the tag status is either valid or dirty, the cache has a hit. During read accesses, a half-line (two long words) is accessed at a time, so reads that are larger than a half-line require two cache accesses. Write accesses within a cache line require a single cache access.

If a misaligned access crosses two pages, the partial access to the first page always happens twice, even if the pages are serialized. Consequently, misaligned accesses to peripherals that span page boundaries are not possible unless the peripheral can tolerate double reads or writes.
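The indexing scheme just described can be made concrete with a few C helpers. This is a sketch assuming a hypothetical tag_entry structure and helper names; it extracts the 22-bit tag (PA31-PA10), the set index (PA9-PA4), and the long word within the line, performs the four-way tag compare, and counts the half-line accesses a read needs (a simplified count that ignores line and page splitting).

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { LINE_INVALID, LINE_VALID, LINE_DIRTY } line_state;

/* One tag entry of a set: 22-bit tag (PA31-PA10) plus line state. */
typedef struct {
    uint32_t tag;
    line_state state;
} tag_entry;

/* Address fields for a 4-Kbyte, four-way cache of 64 sets x 16-byte lines. */
uint32_t pa_tag(uint32_t pa)       { return pa >> 10; }           /* PA31-PA10 */
unsigned pa_set(uint32_t pa)       { return (pa >> 4) & 0x3Fu; }  /* PA9-PA4   */
unsigned pa_long_word(uint32_t pa) { return (pa >> 2) & 0x3u; }   /* PA3-PA2   */

/* Compare the four tags of the selected set; invalid lines never hit.
   Returns the hitting way (0-3) or -1 on a miss. */
int cache_lookup(const tag_entry set[4], uint32_t pa) {
    uint32_t tag = pa_tag(pa);
    for (int way = 0; way < 4; way++)
        if (set[way].state != LINE_INVALID && set[way].tag == tag)
            return way;
    return -1;
}

/* Reads access one half-line (8 bytes) at a time, so the count is the
   number of half-lines the operand touches (size must be >= 1). */
unsigned read_cache_accesses(uint32_t pa, unsigned size) {
    uint32_t first = pa >> 3;
    uint32_t last = (pa + size - 1u) >> 3;
    return (unsigned)(last - first + 1u);
}
```

For example, a long-word read at an address ending in 0x6 straddles a half-line boundary and needs two cache accesses, while the same read at an address ending in 0x0 needs one.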
Figure 4-3. Caching Operation

Both caches contain circuitry that automatically determines which cache line in a set to use for a new line. The cache controller locates the first invalid line and uses it; if no invalid lines exist, a pseudo-random replacement algorithm selects a valid line to be replaced by the new line. Each cache contains a 2-bit counter that is incremented for each access to the cache. The instruction cache counter is incremented for each half-line accessed in the instruction cache. The data cache counter is incremented for each half-line accessed during reads, for each full line accessed during writes in copyback mode, and for each bus transfer resulting from a write in write-through mode. When a miss occurs and all four lines in the set are valid, the line pointed to by the current counter value is replaced, after which the counter is incremented.
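The victim selection just described can be modeled in a few lines of C. It is a sketch under simplifying assumptions: repl_state, repl_count_access, and choose_victim are hypothetical names, and the caller decides which accesses count (half-line reads, copyback line writes, write-through bus transfers).

```c
#include <stdbool.h>

/* Hypothetical model of one cache's replacement state: a 2-bit counter. */
typedef struct {
    unsigned counter;   /* 0..3 */
} repl_state;

/* Call once per counted access (e.g. each half-line read). */
void repl_count_access(repl_state *r) {
    r->counter = (r->counter + 1u) & 3u;
}

/* Pick the way to fill on a miss: the first invalid line if there is one,
   otherwise the way named by the counter, which is then incremented. */
int choose_victim(repl_state *r, const bool valid[4]) {
    for (int way = 0; way < 4; way++)
        if (!valid[way])
            return way;
    int victim = (int)r->counter;
    repl_count_access(r);
    return victim;
}
```

Because the counter advances on ordinary accesses as well as on replacements, the chosen way depends on the recent access pattern, which is what makes the scheme pseudo-random rather than strictly round-robin.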
4.2 Cache Management

The caches are individually enabled by using the MOVEC instruction to access the 32-bit cache control register (CACR) illustrated in Figure 4-4. The CACR contains two enable bits that allow the instruction and data caches to be independently enabled or disabled. Setting one of these bits enables the associated cache without affecting the state of any lines within the cache. A hardware reset clears the CACR, disabling both caches; however, reset does not affect the tags, state information, and data within the caches. The CINV instruction must be used to clear the caches before they are enabled.

Caching page descriptors is not recommended. Specifically, the M68040 does not support the caching of page descriptors in copyback mode with the bit pattern U = 0, M = 1, and R = 1 in a page descriptor. The M68040 table search algorithm never leaves this bit pattern in a page descriptor.

Figure 4-4. Cache Control Register

    Bit 31       DE   Enable data cache
    Bits 30-16        Undefined
    Bit 15       IE   Enable instruction cache
    Bits 14-0         Undefined

System hardware can assert the cache disable (CDIS) signal to dynamically disable both caches, regardless of the state of the enable bits in the CACR. The caches are disabled immediately after the current access completes. If CDIS is asserted during the access for the first half of a misaligned operand spanning two cache lines, the data cache is disabled for the second half of the operand. Accesses by the execution units bypass the caches while they are disabled and do not affect their contents (with the exception of the CINV and CPUSH instructions). Disabling the caches with CDIS does not affect snoop operations. CDIS is intended primarily for use by in-circuit emulators to allow swapping between the tags and emulator memories.

Even if the instruction cache is disabled, the M68040 can cache instructions because of an internal cache line holding register.
This happens for instruction loops that are completely resident within the first six bytes (three words) of a half-line; such a loop becomes cached, so the cache line holding register can operate as a small cache.

The CINV and CPUSH instructions support cache management in supervisor mode. CINV allows selective invalidation of cache entries. CPUSH performs two operations: 1) any selected data cache lines containing dirty data are pushed to memory; 2) all selected cache lines are invalidated. This operation can be used to update a page in memory before swapping it out with snooping disabled, or to push dirty data when changing a page's caching mode to write-through. Because of the size of the caches, pushing pages or an entire cache incurs a significant time penalty; however, these instructions are interruptible to avoid large interrupt latencies. Neither the state of the CDIS signal nor the cache enable bits in the CACR affect the operation of CINV and CPUSH. Both instructions allow operation on a single cache line, all cache lines in a specific page, or an
entire cache, and can select one or both caches for the operation. For line and page operations, a physical address in an address register specifies the memory address.

4.3 Caching Modes

Every IU access to the cache has an associated caching mode that determines how the cache handles the access. An access can be cachable in either the write-through or copyback mode, or it can be cache inhibited in nonserialized or serialized mode. The CM field corresponding to the logical address of the access normally specifies one of these caching modes on a page-by-page basis. The default memory access caching mode is nonserialized. When the cache is enabled and memory management is disabled, the default caching mode is write-through. The transparent translation registers and MMUs allow the defaults to be overridden. In addition, some instructions and IU operations perform data accesses that have an implicit caching mode associated with them. The following paragraphs discuss the different caching accesses and their related cache modes.

4.3.1 Cachable Accesses

If a page descriptor's CM field indicates write-through or copyback, the access is cachable. A read access to a write-through or copyback page is read from the cache if matching data is found; otherwise, the data is read from memory and used to update the cache. Since instruction cache accesses are always reads, the selection of write-through or copyback mode does not affect them. The following paragraphs describe the write-through and copyback modes in detail.

4.3.1.1 Write-Through Mode. Accesses to pages specified as write-through are always written to the external address, although the cycle can be buffered, keeping memory and cache data consistent. Writes in write-through mode are handled with a no-write-allocate policy; that is, writes that miss in the data cache are written to memory but do not cause the corresponding line in memory to be loaded into the cache.
Write accesses always write through to memory and update matching cache lines. Specifying write-through mode for shared pages maintains cache coherency for shared memory areas in a multiprocessing environment. The cache supplies data to instruction or data read accesses that hit in the appropriate cache; misses cause a new cache line to be loaded into the cache, replacing a valid cache line if there are no invalid lines.

4.3.1.2 Copyback Mode. Copyback pages are typically used for local data structures or stacks to minimize external bus usage and reduce write access latency. Write accesses to pages specified as copyback that hit in the data cache update the cache line and set the corresponding D-bits without an external bus access. The dirty cached data is written to memory only if 1) the line is replaced due to a miss, 2) a cache-inhibited access matches the line, or 3) the CPUSH instruction explicitly pushes the line. If a write access misses in the cache, the memory unit reads the needed cache line from memory and updates the cache. When a miss causes a dirty cache line to be selected for replacement, the memory unit places the line in an internal copyback buffer; the replacement line is read into the cache, and the dirty line is written back to memory to update it.
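The processor-side write behavior of the two cachable modes can be summarized as a small decision function. The write_action structure and processor_write function are hypothetical; the sketch covers only the write hit and miss cases described in 4.3.1 (and restated in 4.4.2 and 4.4.4), and omits pushing a dirty victim line through the copyback buffer.

```c
#include <stdbool.h>

typedef enum { MODE_WRITE_THROUGH, MODE_COPYBACK } cache_mode;

/* What a processor write does to the bus and the cache (simplified). */
typedef struct {
    bool bus_write;     /* external write to memory */
    bool line_fill;     /* line read from memory into the cache first */
    bool update_cache;  /* cache line updated with the write data */
    bool set_dirty;     /* D-bit(s) of the written long word(s) set */
} write_action;

write_action processor_write(cache_mode mode, bool hit) {
    write_action a = { false, false, false, false };
    if (mode == MODE_WRITE_THROUGH) {
        a.bus_write = true;       /* always written through to memory */
        a.update_cache = hit;     /* no-write-allocate: a miss leaves the cache alone */
    } else {
        a.line_fill = !hit;       /* write-allocate: a miss fetches the line */
        a.update_cache = true;
        a.set_dirty = true;       /* no bus write; the line becomes dirty */
    }
    return a;
}
```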
4.3.2 Cache-Inhibited Accesses

Address space regions containing targets such as I/O devices and shared data structures in multiprocessing systems can be designated cache inhibited. If a page descriptor's CM field indicates nonserialized or serialized, the access is cache inhibited. The caching operation is identical for both cache-inhibited modes. If the CM field of a matching address indicates either nonserialized or serialized mode, the cache controller bypasses the cache and performs an external bus transfer. The data associated with the access is not cached internally, and the cache inhibited out (CIOUT) signal is asserted during the bus transfer to indicate to external memory that the access should not be cached. If the corresponding line is already resident in the data cache, it is pushed from the cache if it is dirty, or invalidated if it is valid.

If the CM field indicates serialized, the sequence of read and write accesses to the page is guaranteed to match the sequence of the instruction order. Without serialization, the IU pipeline allows read accesses to occur before completion of a write-back for a previous instruction. Serialization also forces operand read accesses for an instruction to occur only once by preventing the instruction from being interrupted after the operand fetch stage; otherwise, the instruction can be aborted, and the operand is accessed again when the instruction is restarted. These guarantees apply only when the CM field indicates the serialized mode and the accesses are aligned.

Regardless of the selected cache mode, locked accesses are implicitly serialized. Locked accesses are used for TAS, CAS, and CAS2 operands in memory and for updating translation table entries during table search operations.
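The effect of a cache-inhibited access on the cache can be sketched as follows. The enum and function names are illustrative only (the enum values are not the hardware CM encodings), and the model ignores bus timing and the serialization ordering guarantees.

```c
#include <stdbool.h>

typedef enum { LINE_INVALID, LINE_VALID, LINE_DIRTY } line_state;

/* Caching modes by name only; these are not the hardware CM encodings. */
typedef enum { CM_WRITE_THROUGH, CM_COPYBACK, CM_SERIALIZED, CM_NONSERIALIZED } cache_mode;

typedef struct {
    bool bypass_cache;  /* external bus transfer performed directly */
    bool assert_ciout;  /* CIOUT asserted during the transfer */
    bool push_line;     /* matching dirty line pushed from the cache */
    line_state after;   /* state of any matching line afterwards */
} inhibited_action;

/* Both cache-inhibited modes behave identically toward the cache. */
bool is_cache_inhibited(cache_mode cm) {
    return cm == CM_SERIALIZED || cm == CM_NONSERIALIZED;
}

/* Cache-inhibited access whose address matches a line in state 'state'. */
inhibited_action cache_inhibited_access(line_state state) {
    inhibited_action a = { true, true, false, LINE_INVALID };
    a.push_line = (state == LINE_DIRTY);   /* dirty: push; valid: just invalidate */
    return a;
}
```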
4.3.3 Special Accesses

Besides those with an implied cache-inhibited access in the serialized mode, several other processor operations result in accesses that have special caching characteristics. Exception stack accesses, exception vector fetches, and table searches that miss in the cache do not allocate cache lines in the data cache, preventing replacement of a cache line. Cache hits by these accesses are handled in the normal manner according to the caching mode specified for the accessed address.

Accesses by the MOVE16 instruction also do not allocate cache lines in the data cache for either read or write misses. Read hits on either valid or dirty cache lines are read from the cache. Write hits invalidate a matching line and perform an external access. Interacting with the cache in this manner prevents a large block move or block initialization implemented with MOVE16 from being cached, since the data may not be needed immediately.

If the data cache is re-enabled after a locked access has hit and the data cache was disabled, the next nonlocked access that results in a data cache miss is not cached.

4.4 Cache Protocol

The cache protocol for processor and snooped accesses is described in the following paragraphs. In all cases, an external bus transfer will cause a cache line state to change
only if the bus transfer is marked as snoopable on the bus. The protocols described in the following paragraphs assume that the data is cachable (i.e., write-through or copyback).

4.4.1 Read Miss

A processor read that misses in the cache causes the cache controller to request a bus transaction that reads the needed line from memory and supplies the required data to the IU. The line is placed in the cache in the valid state. Snooped external reads that miss in the cache have no effect on the cache.

4.4.2 Write Miss

The cache controller handles processor writes that miss in the cache differently for write-through and copyback pages. Write misses to copyback pages cause the processor to perform a bus transaction that reads the needed cache line from memory into its cache in the same manner as for a read miss. The new cache line is then updated with the write data, and the D-bits are set for each long word that has been modified, leaving the cache line in the dirty state. Write misses to write-through pages write directly to memory without loading the corresponding cache line into the cache. Snooped external writes that miss in the cache have no effect on the cache.

4.4.3 Read Hit

The cache controller handles processor reads that hit in the cache in the same way for write-through and copyback pages: no bus transaction is performed, and the state of the cache line does not change. Physical address bit 3 selects either the upper or lower half-line containing the required operand, and this half-line is driven onto the internal bus. If the required data is located entirely within the half-line, only one access into the cache is required. Because the organization of the cache does not allow selection of more than one half-line at a time, misalignment across a half-line boundary requires two accesses into the cache. A snooped external read that hits in the cache is ignored if the cache line is valid.
If the snooped access hits a dirty line, memory is inhibited from responding, and the data is sourced from the cache directly to the alternate bus master. A snooped read hit does not change the state of the cache line unless the snooped access also indicates mark invalid, which causes the line to be invalidated after the access, even if it is dirty. Alternate bus masters should indicate mark invalid only for line reads, to ensure that the entire line is transferred before it is invalidated.

4.4.4 Write Hit

The cache controller handles processor writes that hit in the cache differently for write-through and copyback pages. For write-through accesses, a processor write hit causes the cache controller to update the affected long-word entries in the cache line and to request an external memory write transfer to update memory. The cache line state does not change. A write-through access to a line containing dirty data constitutes a system programming error, even if the D-bits for the line are unchanged. This situation can be
avoided by pushing cache lines when a page descriptor is changed and by ensuring that alternate bus masters indicate the appropriate snoop operation for writes to corresponding pages (i.e., mark invalid for write-through pages and sink data for copyback pages). If the access is copyback, the cache controller updates the cache line and sets the D-bits of the appropriate long words in the cache line. An external write is not performed, and the cache line state changes to, or remains in, the dirty state.

An alternate bus master can drive the SCx signals for a write access with an encoding that indicates to the M68040 that it should sink the data, inhibit memory, and respond as a slave if the access hits in the cache. The cache operation depends on the access size and current line state. A snooped line write that hits a valid line always causes the corresponding cache line to be invalidated. For snooped writes of byte, word, or long-word size that hit a dirty line, the processor inhibits memory and responds to the alternate bus master as a slave, sinking the data. Data received from the alternate bus master is written to the appropriate long word in the cache line, and the D-bit is set for that entry. The cache controller invalidates a matching cache line if the snoop control pins indicate mark invalid for a snooped write.

4.5 Cache Coherency

The M68040 provides several different mechanisms to assist in maintaining cache coherency in multimaster systems. Both write-through and copyback memory update techniques are supported to maintain coherency between the data cache and memory. Alternate bus master accesses can reference data that the M68040 caches, causing coherency problems if the accesses are not handled properly. The M68040 snoops the bus during alternate bus master transfers.
If a write access hits in the cache, the M68040 can update its internal caches, or if a read access hits, it can intervene in the access to supply dirty data. Caches can be snooped even if they are disabled. The alternate bus master controls snooping through the snoop control signals, indicating which accesses can be snooped and the required operation for snoop hits. Table 4-1 lists the requested snoop operation for each encoding of the snoop control signals. Since the processor and the bus snooper must both access the caches, the snoop controller has priority over the processor for snoopable accesses to maintain cache coherency.

Table 4-1. Snoop Control Encoding

                  Requested Snoop Operation
    SC1 SC0   Alternate Bus Master Read Access          Alternate Bus Master Write Access
     0   0    Inhibit snooping                          Inhibit snooping
     0   1    Supply dirty data and leave dirty data    Sink byte/word/long-word data
     1   0    Supply dirty data and mark line invalid   Invalidate line
     1   1    Reserved (snoop inhibited)                Reserved (snoop inhibited)

The snooping protocol and caching mechanism supported by the M68040 are optimized to support multimaster systems with the M68040 as the single caching master. In systems
4- 10 m68040 user's manual motorola implementing multiple mc68040s as bus masters, shared data should be stored in write- through pages. this procedure allows each processor to cache shared data for read access while forcing a processor write to shared data to appear as an external write to memory, which the other processors can snoop. if shared data is stored in copyback pages, only one processor at a time can cache the data since writes to copyback pages do not access the external bus. if a processor accesses shared data cached by another processor, the slave can source the data to the master without invalidating its own copy only if the transfer to the master is cache inhibited. for the master processor to cache the data, it must force invalidation of the slave processor? copy of the data (by specifying mark invalid for the snoop operation), and the memory controller must monitor the data transfer between the processors and update memory with the transferred data. the memory update is required since the master processor is unaware of the sourced data (valid data from memory or dirty data from a snooping processor) and initially creates a valid cache line, losing dirty status if a snooping processor supplies the data. coherency between the instruction cache and the data cache must be maintained in software since the instruction cache does not monitor data accesses. processor writes that modify code segments (i.e., resulting from self-modifying code or from code executed to load a new page from disk) access memory through the data memory unit. because the instruction cache does not monitor these data accesses, stale data occurs in the instruction cache if the corresponding data in memory is modified. invalidating instruction cache lines before writing to the corresponding memory lines can prevent this coherency problem, but only if the data cache line is in write-through mode and the page is marked serialized. 
A cache coherency problem could arise if the data cache line is configured as copyback and no serialization is done. To fully support self-modifying code in every situation, a CPUSHA instruction must be executed before the first self-modified instruction is executed. The CPUSHA instruction ensures that there is no stale data in memory, flushes the pipeline, and causes instruction prefetches to be repeated and taken from external memory.

Another potential coherency problem arises from the relationship between the cache state information and the translation table descriptors. Because each cache line reflects page state information, a page should be flushed from the cache before any of the page attributes are changed. The presence of a valid or dirty cache line implicitly indicates that accesses to the page containing the line are cachable. The presence of a dirty cache line implies that the page is not write protected and that writes to the page are in copyback mode. Changing page attributes without flushing the corresponding page from the cache is a system programming error, because it leaves cache line states inconsistent with their page definitions. Even with these inconsistencies, the cache behavior is defined and predictable.
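The snoop control encoding of Table 4-1 and the page-attribute implications described above can be written down as a few small checks. The following Python sketch is illustrative only. The names `snoop_operation`, `snoop_enabled`, and `line_consistent_with_page` are hypothetical helpers, not part of any Motorola tool. The consistency check restates the rules of this section, for example as a simulator or debug assertion to run before page attributes are changed.

```python
from enum import Enum


class LineState(Enum):
    """Data-cache line states used in Section 4."""
    INVALID = "invalid"
    VALID = "valid"
    DIRTY = "dirty"


# Table 4-1: (SC1, SC0) -> (operation for a snooped read, operation for a snooped write)
SNOOP_ENCODING = {
    (0, 0): ("inhibit snooping", "inhibit snooping"),
    (0, 1): ("supply dirty data and leave dirty data", "sink byte/word/long-word data"),
    (1, 0): ("supply dirty data and mark line invalid", "invalidate line"),
    (1, 1): ("reserved (snoop inhibited)", "reserved (snoop inhibited)"),
}


def snoop_operation(sc1: int, sc0: int, is_write: bool) -> str:
    """Requested snoop operation for an alternate bus master access."""
    read_op, write_op = SNOOP_ENCODING[(sc1, sc0)]
    return write_op if is_write else read_op


def snoop_enabled(sc1: int, sc0: int) -> bool:
    """Encodings 00 and 11 (reserved) both inhibit snooping."""
    return (sc1, sc0) in {(0, 1), (1, 0)}


def line_consistent_with_page(state: LineState, cachable: bool,
                              write_protected: bool, copyback: bool) -> bool:
    """A valid or dirty line implies a cachable page; a dirty line also
    implies the page is not write protected and is in copyback mode."""
    if state is LineState.INVALID:
        return True                      # an invalid line implies nothing
    if not cachable:
        return False
    if state is LineState.DIRTY:
        return copyback and not write_protected
    return True
```

A `False` result from `line_consistent_with_page` marks exactly the situation this section calls a system programming error: a line whose state no longer matches its page definition.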
4.6 Memory Accesses for Cache Maintenance

The cache controller in each memory unit performs all maintenance activities that supply data from the cache to the execution units. These activities include requesting accesses to the bus interface unit to read new cache lines and to write dirty cache lines to memory. The following paragraphs describe the memory accesses resulting from cache fill operations (by both caches) and push operations (by the data cache). Refer to Section 7 Bus Operation for detailed information about the bus cycles required.

4.6.1 Cache Filling

When a new cache line is required, the cache controller requests a line read from the bus controller. The bus controller requests a burst read transfer by indicating a line access with the size signals (SIZ1, SIZ0). It indicates which line in the set is being loaded with the transfer line number signals (TLN1, TLN0). TLN1 and TLN0 are undefined for the instruction cache; these pins indicate the appropriate line numbers for data cache transfers only. Table 4-2 lists the definition of the TLNx encoding.

Table 4-2. TLNx Encoding

TLN1  TLN0   Line
 0     0     Zero
 0     1     One
 1     0     Two
 1     1     Three

The responding device sequentially supplies four long words of data and can assert the transfer cache inhibit signal (TCI) if the line is not cachable. If the responding device does not support the burst mode, it should assert the TBI signal for the first long word of the line access. The bus controller responds by terminating the line access and completes the remainder of the line read as three sequential long-word reads.

Bus controller line accesses implicitly request burst mode operations from external memory. To operate in the burst mode, the device or external hardware must be able to increment the low-order address bits as described in Section 7 Bus Operation.
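The following Python sketch models the line read described here and in the next paragraphs. It makes several assumptions: addresses are physical byte addresses, a line is four long words (16 bytes), and the fill starts with the long word holding the requested data and then wraps around within the line. It also assumes that the three long-word reads following a TBI-terminated line access use the same wrap-around order. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

LINE_BYTES = 16   # four long words per cache line
LONG_WORD = 4


def tln_signals(line: int, data_cache: bool) -> Optional[Tuple[int, int]]:
    """Table 4-2: (TLN1, TLN0) for the line (0-3) being loaded in the set.
    TLNx is undefined for instruction-cache fills, modeled here as None."""
    if not data_cache:
        return None
    if not 0 <= line <= 3:
        raise ValueError("line number must be 0-3")
    return (line >> 1) & 1, line & 1


def burst_order(address: int) -> List[int]:
    """Long-word addresses of a line fill: the long word containing the
    requested data first, then the rest of the line with wrap-around."""
    line_base = address & ~(LINE_BYTES - 1)
    first = address & (LINE_BYTES - 1) & ~(LONG_WORD - 1)
    return [line_base + (first + i * LONG_WORD) % LINE_BYTES for i in range(4)]


def line_read_transfers(address: int, tbi_asserted: bool) -> List[Tuple[str, int]]:
    """Bus data transfers for one line read.

    TBI negated: one burst moves all four long words.
    TBI asserted on the first long word: the line access ends after that
    long word, and the rest of the line is read as three sequential
    long-word reads (assumed to follow the same wrap-around order).
    """
    order = burst_order(address)
    if not tbi_asserted:
        return [("line burst", a) for a in order]
    return [("line access, TBI asserted", order[0])] + [
        ("long-word read", a) for a in order[1:]
    ]
```

For example, a miss at 0x1008 fills the line in the order 0x1008, 0x100C, 0x1000, 0x1004, so the requested long word arrives first.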
The device indicates its ability to support the burst access by acknowledging the initial long-word transfer with transfer acknowledge (TA) asserted and TBI negated. This causes the processor to continue driving the address and bus control signals and to latch a new data value for the cache line at the completion of each subsequent cycle (as defined by TA), for a total of four cycles. The bursting mechanism requires addresses to wrap around so that all four long words in the cache line are filled in a single operation.

When a cache line read is initiated, the first cycle attempts to load the line entry corresponding to the instruction half-line or data item requested by the IU. Subsequent transfers are for the remaining entries in the cache line. For a misaligned access in which the operand spans two line entries, the first cycle corresponds to the line entry containing the portion of the operand at the lower address.

The cache controller temporarily stores the data from each cycle in a line read buffer, where it is immediately available to the IU. If a misaligned access spans two entries in the line, the second portion of the operand is available to the IU as soon as the second memory cycle completes. A new IU access that hits the cache line being filled is also supplied data as soon as the required long word has been received from the bus controller. While the buffer is being filled, other IU accesses that hit in the cache continue to be supplied data. This behavior is useful for a short cache-inhibited code loop that is less than eight bytes in length: subsequent iterations of the loop hit in the buffer but appear to hit in the cache, since there is no external bus activity associated with the reads.

The assertion of TCI during the first cycle of a burst read operation inhibits loading of the buffered line into the cache, but it does not cause the burst transfer (or pseudo-burst transfer if TBI is asserted with TCI) to be terminated early. The data placed in the buffer is accessible by the IU until the last long word of the burst is transferred from the bus controller. After that, the contents of the buffer are invalidated without being copied into the cache. The assertion of TCI is ignored during the second, third, or fourth cycle of a burst operation and is ignored for write operations.

A bus error occurring during a burst operation causes the burst operation to abort. If the bus error occurs during the first cycle of a burst, the data from the bus is ignored. If the access is a data cycle, exception processing proceeds immediately. If the cycle is for an instruction prefetch, a bus error exception is pending.
The bus error is processed only if the IU attempts to use either instruction word. Refer to Section 7 Bus Operation for more information about pipeline operation.

For either cache, when a bus error occurs on the second cycle or later, the burst operation is aborted and the line buffer is invalidated. The processor may or may not take an exception, depending on the status of the pending data request. If the bus error cycle contains a portion of a data operand that the processor is specifically waiting for (e.g., the second half of a misaligned operand), the processor immediately takes an exception. Otherwise, no exception occurs, and the cache line fill is repeated the next time data within the line is required. For an instruction cache line fill, the data from the aborted cycle is completely ignored.

On the initial access of a line read, a retry (indicated by the assertion of TA and TEA) causes the bus controller to retry the bus cycle. However, a retry signaled during the remaining cycles of the line access (either burst or pseudo-burst) is recognized as a bus error, and the processor handles it as described in the previous paragraphs.

A cache inhibit or bus error on a line read can change the state of the line being replaced, even though the new line is not copied into the cache. Before a new line is loaded, the cache line being replaced is copied to the push buffer if it is dirty, and the cache line is invalidated. If a cache inhibit or bus error occurs on a replacement line read, a dirty line is restored to the cache from the push buffer. However, a line that was originally valid is not restored, and the cache line remains invalid. If the line read resulting from a write miss in copyback mode is cache inhibited, the write access misses in the cache and writes through to memory.

4.6.2 Cache Pushes

When the cache controller selects a dirty data cache line for replacement, memory must be updated with the dirty data before the line is replaced. This update also occurs when a CPUSH instruction explicitly selects the line and when a cache-inhibited access hits in the cache. To reduce the latency of the requested data in the new line, the dirty line being replaced is temporarily placed in a push buffer while the new line is fetched from memory. When a line is allocated to the push buffer, an alternate bus master can snoop it, but the execution units cannot access it. After the bus transfer for the new line successfully completes, the dirty cache line is copied back to memory, and the push buffer is invalidated. If the access to the replacement line is abnormally terminated or signaled as cache inhibited, the line in the push buffer is copied back into its original position in the cache, and the processor continues operation as described in the previous paragraphs.

The number of dirty long words in the line to be pushed determines the size of the push transfer on the bus, which minimizes the bus bandwidth required for the push. If only a single long word is dirty, it is written to memory using a long-word push transfer. A push transfer is distinguished from a normal write transfer by an encoding of 000 on the transfer modifier signals (TM2–TM0). Asserting TA and TEA together retries the transfer. Asserting TEA alone signals a bus error and terminates the transfer. If a bus error terminates a push transfer, the processor immediately takes an exception.

A line containing two or more dirty long words is copied back to memory using a line push transfer. For a line push, the bus controller requests a burst write transfer by indicating a line access with SIZ1 and SIZ0.
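The following Python sketch summarizes the push and termination rules described so far. It assumes that a line's D-bits are given as a list of four booleans and that line states are plain strings. The helper names are hypothetical. The TA/TEA classification covers only the combinations discussed in this section; Section 7 Bus Operation defines the complete bus protocol.

```python
from typing import List, Optional


def push_transfer(dirty_bits: List[bool]) -> Optional[str]:
    """Select the push transfer for a line from its four D-bits."""
    if len(dirty_bits) != 4:
        raise ValueError("a cache line holds four long words")
    dirty = sum(dirty_bits)
    if dirty == 0:
        return None                  # line is not dirty: nothing to push
    if dirty == 1:
        return "long-word push"
    return "line push"               # two or more dirty long words


def replaced_line_after_failed_fill(old_state: str) -> str:
    """State of the replaced line when the replacement line read is cache
    inhibited or ends with a bus error: a dirty line is restored from the
    push buffer, but a valid line is not, so the entry stays invalid."""
    return "dirty" if old_state == "dirty" else "invalid"


def classify_termination(cycle: int, ta: bool, tea: bool) -> str:
    """Interpret TA/TEA on cycle 0-3 of a line access (read or push).
    TA together with TEA requests a retry, which is honored only on the
    first cycle; a retry on any later cycle is treated as a bus error."""
    if not 0 <= cycle <= 3:
        raise ValueError("a line access has four cycles")
    if ta and tea:
        return "retry" if cycle == 0 else "bus error"
    if tea:
        return "bus error"
    if ta:
        return "acknowledge"
    return "not terminated"          # neither signal asserted yet
```

Selecting the push size from the count of dirty long words keeps a single modified long word from costing a four-cycle burst.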
The responding device sequentially accepts four long words of data. If the responding device does not support the burst mode, it should assert TBI for the first long word of the line access. The bus controller responds by terminating the line access and completes the remainder of the line push as three sequential long-word writes. The first cycle of the burst can be retried, but the bus controller interprets a retry for any of the three remaining cycles as a bus error. If a bus error occurs in any cycle of the line push transfer, the processor immediately takes an exception.

A dirty cache line hit by a cache-inhibited access is pushed before the external bus access occurs. If the access is part of a locked transfer sequence for TAS, CAS, or CAS2 operand accesses or translation table updates, the LOCK signal is also asserted for the push access.

4.7 Cache Operation Summary

The instruction and data caches function independently when servicing access requests from the IU. The following paragraphs discuss the operational details for the caches and present state diagrams depicting the cache line state transitions.
4.7.1 Instruction Cache

The IU uses the instruction cache to store instruction prefetches as it requests them. Instruction prefetches are normally requested from sequential memory locations. The exceptions are a change of program flow (e.g., a branch taken) and the execution of an instruction that can modify the status register (SR); in these cases, the instruction pipe is automatically flushed and refilled. The instruction cache supports a line-based protocol that allows individual cache lines to be in either the invalid or the valid state.

For instruction prefetch requests that hit in the cache, the half-line selected by physical address bit 3 is multiplexed onto the internal instruction data bus. When an access misses in the cache, the cache controller requests the line containing the required data from memory and places it in the cache. If available, an invalid line is selected and updated with the tag and data from memory. The line state then changes from invalid to valid by setting the V-bit. If all lines in the set are already valid, a pseudo-random replacement algorithm selects one of the four cache lines. The tag and data contents of that line are then replaced with the new line information.

Figure 4-5 illustrates the instruction-cache line state transitions resulting from processor and snoop controller accesses. Each transition is labeled with a capital letter indicating the previous state, followed by a number indicating the specific case listed in Table 4-3.

Figure 4-5. Instruction-Cache Line State Diagram (states: Invalid, Valid; transitions I1 CPU read miss, I3 CINV/CPUSH, V1 CPU read miss, V2 CPU read hit, V3 CINV/CPUSH, V5 snoop read hit, V6 snoop write hit)
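The transitions in Figure 4-5 and Table 4-3 reduce to a small lookup table. This Python sketch uses string names for states and events. When an invalid line exists, it picks the lowest-numbered one; that choice is an assumption, since the text only says "an invalid line". Python's `random.Random` stands in for the processor's pseudo-random replacement algorithm, which this section does not describe. All names are hypothetical.

```python
import random
from typing import List

# Table 4-3: (current state, event) -> next state for one instruction-cache line.
# Cases the table marks "not possible" (I2, I4-I6, V4) have no entry.
ICACHE_TRANSITIONS = {
    ("invalid", "cpu read miss"): "valid",               # I1
    ("invalid", "cinv/cpush"): "invalid",                # I3
    ("valid", "cpu read miss"): "valid",                 # V1: old line replaced
    ("valid", "cpu read hit"): "valid",                  # V2
    ("valid", "cinv/cpush"): "invalid",                  # V3
    ("valid", "snoop read hit/invalidate"): "invalid",   # V5: SC = 10
    ("valid", "snoop write hit"): "invalid",             # V6: SC = 01 or 10
}


def next_icache_state(state: str, event: str) -> str:
    """Next state of an instruction-cache line, per Table 4-3."""
    try:
        return ICACHE_TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not possible in state {state!r}") from None


def select_victim(set_states: List[str], rng: random.Random) -> int:
    """Choose which of the lines in a set receives a new line: the first
    invalid line if any, otherwise a pseudo-random one. random.Random is
    only a stand-in for the processor's replacement algorithm."""
    for index, state in enumerate(set_states):
        if state == "invalid":
            return index
    return rng.randrange(len(set_states))
```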
Table 4-3. Instruction-Cache Line State Transitions

Cache Operation
  Invalid case
  Valid case

CPU read miss
  I1: Read line from memory; supply data to CPU and update cache; go to valid state.
  V1: Read line from memory; supply data to CPU and update cache (replacing old line); remain in current state.
CPU read hit
  I2: Not possible.
  V2: Supply data to CPU; remain in current state.
Cache invalidate or push (CINV or CPUSH)
  I3: No action; remain in current state.
  V3: No action; go to invalid state.
Alternate master read hit (snoop control = 01, leave dirty)
  I4: Not possible; not snooped.
  V4: Not possible; not snooped.
Alternate master read hit (snoop control = 10, invalidate)
  I5: Not possible.
  V5: No action; go to invalid state.
Alternate master write hit (snoop control = 01 or 10)
  I6: Not possible.
  V6: No action; go to invalid state.

4.7.2 Data Cache

The IU uses the data cache to store operand data as it generates the data. The data cache supports a line-based protocol that allows individual cache lines to be in one of three states: invalid, valid, or dirty. To maintain coherency with memory, the data cache supports both write-through and copyback modes, specified by the CM field for the page.

Read misses and write misses to copyback pages cause the cache controller to read a new cache line from memory into the cache. If available, an invalid line in the selected set is updated with the tag and data from memory. The line state then changes from invalid to valid by setting the V-bit for the line. If all lines in the set are already valid or dirty, the pseudo-random replacement algorithm selects one of the four lines and replaces its tag and data contents with the new line information. Before replacement, dirty lines are temporarily buffered and later copied back to memory after the new line has been read from memory.
If a snoop access occurs before the buffered line is written to memory, the snoop controller snoops the buffer as well as the caches.

Figure 4-6 illustrates the three possible states for a data cache line, with the possible transitions caused by either processor or snooped accesses. Each transition is labeled with a capital letter indicating the previous state, followed by a number indicating the specific case listed in Table 4-4.
Figure 4-6. Data-Cache Line State Diagram (states: Invalid, Valid, Dirty; transitions labeled with the Table 4-4 case numbers. Abbreviations: WT = write-through mode, CB = copyback mode; snoop transitions are labeled read or write / snoop control encoding.)
Table 4-4. Data-Cache Line State Transitions

Cache Operation
  Invalid case
  Valid case
  Dirty case

CPU read miss
  I1: Read line from memory; supply data to CPU and update cache; go to valid state.
  V1: Read line from memory; supply data to CPU and update cache (replacing old line); remain in current state.
  D1: Buffer dirty cache line; read new line from memory; supply data to CPU and update cache; write buffered dirty data to memory; go to valid state.
CPU read hit
  I2: Not possible.
  V2: Supply data to CPU; remain in current state.
  D2: Supply data to CPU; remain in current state.
CPU write miss (copyback)
  I3: Read line from memory into cache; write data to cache; set Dn bits of modified long words; go to dirty state.
  V3: Read line from memory into cache (replacing old line); write data to cache and set Dn bits; go to dirty state.
  D3: Buffer dirty cache line; read new line from memory; write data to cache and set Dn bits; write buffered dirty data to memory; remain in current state.
CPU write miss (write-through)
  I4: Write data to memory; remain in current state.
  V4: Write data to memory; remain in current state.
  D4: Write data to memory; remain in current state (see note).
CPU write hit (copyback)
  I5: Not possible.
  V5: Write data into cache; set Dn bits of modified long words; go to dirty state.
  D5: Write data in cache; set Dn bits of modified long words; remain in current state.
CPU write hit (write-through)
  I6: Not possible.
  V6: Write data to cache; write data to memory; remain in current state.
  D6: Write data into cache (no change to Dn bits); write data to memory; remain in current state (see note).
Cache invalidate (CINV)
  I7: No action; remain in current state.
  V7: No action; go to invalid state.
  D7: No action (dirty data lost); go to invalid state.
Cache push (CPUSH)
  I8: No action; remain in current state.
  V8: No action; go to invalid state.
  D8: Write dirty data to memory; go to invalid state.
Alternate master read hit (snoop control = 01, leave dirty)
  I9: Not possible.
  V9: No action; remain in current state.
  D9: Inhibit memory and source data; remain in current state.
Alternate master read hit (snoop control = 10, invalidate)
  I10: Not possible.
  V10: No action; go to invalid state.
  D10: Inhibit memory and source data; go to invalid state.
Alternate master write hit (snoop control = 10, invalidate)
  I11: Not possible.
  V11: No action; go to invalid state.
  D11: No action; go to invalid state.
Alternate master write hit (snoop control = 01, sink data and size ≠ line)
  I12: Not possible.
  V12: No action; go to invalid state.
  D12: Inhibit memory and sink data; set Dn bits of modified long words; remain in current state.
Alternate master write hit (snoop control = 01, sink data and size = line)
  I13: Not possible.
  V13: No action; go to invalid state.
  D13: No action; go to invalid state.

Note: Dirty state transitions D4 and D6 are the result of a system programming error and should be avoided, even though they are technically valid.
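Table 4-4 can likewise be encoded as data. In this Python sketch (hypothetical names throughout), each column lists the next state for cases 1 through 13 in table order, with `None` marking cases the table calls not possible. The three sets flag the cases that write dirty data to memory, lose dirty data, or indicate a programming error. The event strings are an assumed naming scheme that mirrors the row labels.

```python
from typing import Dict, List, Optional, Tuple

# Events in Table 4-4 row order (case numbers 1-13).
EVENTS = [
    "cpu read miss", "cpu read hit",
    "cpu write miss/cb", "cpu write miss/wt",
    "cpu write hit/cb", "cpu write hit/wt",
    "cinv", "cpush",
    "snoop read hit/leave dirty",                # SC = 01
    "snoop read hit/invalidate",                 # SC = 10
    "snoop write hit/invalidate",                # SC = 10
    "snoop write hit/sink data, size != line",   # SC = 01
    "snoop write hit/sink data, size = line",    # SC = 01
]

# Next state for cases 1-13 in each column; None marks "not possible".
COLUMNS: Dict[str, List[Optional[str]]] = {
    "invalid": ["valid", None, "dirty", "invalid", None, None,
                "invalid", "invalid", None, None, None, None, None],
    "valid": ["valid", "valid", "dirty", "valid", "dirty", "valid",
              "invalid", "invalid", "valid", "invalid", "invalid", "invalid", "invalid"],
    "dirty": ["valid", "dirty", "dirty", "dirty", "dirty", "dirty",
              "invalid", "invalid", "dirty", "invalid", "invalid", "dirty", "invalid"],
}

WRITES_DIRTY_DATA = {"D1", "D3", "D8"}   # buffered or pushed dirty data goes to memory
LOSES_DIRTY_DATA = {"D7"}                # CINV discards dirty data
PROGRAMMING_ERRORS = {"D4", "D6"}        # write-through access to a dirty line


def build_table() -> Dict[Tuple[str, str], Tuple[str, str]]:
    """Map (state, event) to (next state, case label such as 'D12')."""
    table: Dict[Tuple[str, str], Tuple[str, str]] = {}
    for state, column in COLUMNS.items():
        prefix = state[0].upper()        # 'I', 'V', or 'D'
        for number, (event, nxt) in enumerate(zip(EVENTS, column), start=1):
            if nxt is not None:
                table[(state, event)] = (nxt, f"{prefix}{number}")
    return table


DCACHE_TABLE = build_table()


def next_dcache_state(state: str, event: str) -> Tuple[str, str]:
    """Return (next state, Table 4-4 case label) for a data-cache line."""
    try:
        return DCACHE_TABLE[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not possible in state {state!r}") from None
```

For example, a dirty line hit by a byte-size sink-data snoop write (D12) stays dirty, while a line-size sink-data write (D13) invalidates it.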
Section 5 Signal Description

This section contains brief descriptions of the input and output signals in their functional groups (see Figure 5-1). Each signal's function is briefly explained, with references to other sections that contain detailed information about the signal and related operations. Table 5-1 lists the signal names, mnemonics, and functional descriptions of the input and output signals for the M68040. Timing specifications for these signals can be found in Section 11 MC68040 Electrical and Thermal Characteristics.

Notes

Assertion and negation are used to specify forcing a signal to a particular state. Assertion and assert refer to a signal that is active or true. Negation and negate refer to a signal that is inactive or false. These terms are used independently of the voltage level (high or low) that they represent.

For the MC68040V, MC68LC040, MC68EC040, and MC68EC040V, ignore all references to the floating-point unit (FPU). For the MC68EC040 and MC68EC040V only, ignore all references to the memory management unit (MMU). Some pin names are different on these parts; refer to the appropriate appendix at the back of this book for more information.
Table 5-1. Signal Index

Signal Name (Mnemonic): Function

Address bus (A31–A0): 32-bit address bus used to address any of 4 Gbytes.
Data bus (D31–D0): 32-bit data bus used to transfer up to 32 bits of data per bus transfer.
Transfer type (TT1, TT0): Indicates the general transfer type: normal, MOVE16, alternate logical function code, and acknowledge.
Transfer modifier (TM2–TM0): Indicates supplemental information about the access.
Transfer line number (TLN1, TLN0): Indicates which cache line in a set is being pushed or loaded by the current line transfer.
User-programmable attributes (UPA1, UPA0): User-defined signals, controlled by the corresponding user attribute bits from the address translation entry.
Read/write (R/W): Identifies the transfer as a read or write.
Transfer size (SIZ1, SIZ0): Indicates the data transfer size. These signals, together with A0 and A1, define the active sections of the data bus.
Bus lock (LOCK): Indicates a bus transfer is part of a read-modify-write operation, and the sequence of transfers should not be interrupted.
Bus lock end (LOCKE): Indicates the current transfer is the last in a locked sequence of transfers.
Cache inhibit out (CIOUT): Indicates the processor will not cache the current bus transfer.
Transfer start (TS): Indicates the beginning of a bus transfer.
Transfer in progress (TIP): Asserted for the duration of a bus transfer.
Transfer acknowledge (TA): Asserted to acknowledge a bus transfer.
Transfer error acknowledge (TEA): Indicates an error condition exists for a bus transfer.
Transfer cache inhibit (TCI): Indicates the current bus transfer should not be cached.
Transfer burst inhibit (TBI): Indicates the slave cannot handle a line burst access.
Data latch enable (DLE) (note 1): Alternate clock input used to latch input data when the processor is operating in DLE mode.
Snoop control (SC1, SC0): Indicates the snooping operation required during an alternate master access.
Memory inhibit (MI): Inhibits memory devices from responding to an alternate master access during snooping operations.
Bus request (BR): Asserted by the processor to request bus mastership.
Bus grant (BG): Asserted by an arbiter to grant bus mastership to the processor.
Bus busy (BB): Asserted by the current bus master to indicate it has assumed ownership of the bus.
Cache disable (CDIS): Dynamically disables the internal caches to assist emulator support.
MMU disable (MDIS) (note 2): Disables the translation mechanism of the MMUs.
Reset in (RSTI): Processor reset.
Reset out (RSTO): Asserted during execution of a RESET instruction to reset external devices.
Interrupt priority level (IPL2–IPL0) (note 3): Provides an encoded interrupt level to the processor.
Interrupt pending (IPEND): Indicates an interrupt is pending.
Autovector (AVEC): Used during an interrupt acknowledge transfer to request internal generation of the vector number.
Processor status (PST3–PST0): Indicates internal processor status.
Bus clock (BCLK): Clock input used to derive all bus signal timing.
Processor clock (PCLK) (note 4): Clock input used for internal logic timing. The PCLK frequency is exactly twice the BCLK frequency.
Test clock (TCK): Clock signal for the IEEE P1149.1 test access port (TAP).
Test mode select (TMS): Selects the principal operations of the test-support circuitry.
Test data input (TDI): Serial data input for the TAP.
Test data output (TDO): Serial data output for the TAP.
Test reset (TRST) (note 4): Provides an asynchronous reset of the TAP controller.
Power supply (VCC): Power supply.
Ground (GND): Ground connection.

Notes:
1. This signal is only available on the MC68040.
2. This signal is not available on the MC68EC040 and the MC68EC040V.
3. These signals are different on power-up for the MC68LC040 and MC68EC040.
4. These signals are not available on the MC68040V and MC68EC040V.
Figure 5-1. Functional Signal Groups (MC68040 signals grouped as address bus, data bus, transfer attributes, master transfer control, slave transfer control, bus snoop control and response, bus arbitration, interrupt control, processor control, status and clocks, test, and power supply; footnotes as in Table 5-1)

5.1 Address Bus (A31–A0)

These three-state bidirectional signals provide the address of the first item of a bus transfer (except for acknowledge transfers) when the M68040 is the bus master. When an alternate bus master is controlling the bus, the processor examines (snoops) these signals to determine whether it should intervene in the access to maintain cache coherency. The level on CDIS can select a multiplexed bus mode during processor reset, which allows the address bus and data bus to be physically tied together for multiplexed bus
applications. Refer to Section 7 Bus Operation for detailed information about the relationship of the address bus to bus operation and the multiplexed bus mode. Refer to Appendix A MC68LC040 and Appendix B MC68EC040 for details concerning the CDIS level and multiplexed bus mode.

5.2 Data Bus (D31–D0)

These three-state bidirectional signals provide the general-purpose data path between the M68040 and all other devices. The data bus can transfer 8, 16, or 32 bits of data per bus transfer. During a burst transfer, the data lines are time-multiplexed to carry all 128 bits of the burst request using four 32-bit transfers. The level on CDIS can select a multiplexed bus mode during processor reset, which allows the data bus and address bus to be physically tied together for multiplexed bus applications. The level on MDIS can select a data latch mode during processor reset. In that mode, the memory interface uses the DLE signal to specify when the processor should latch input data. Section 7 Bus Operation provides detailed information about the relationship of the data bus to bus operation, the multiplexed bus mode, and the data latch mode. Refer to Appendix A MC68LC040 and Appendix B MC68EC040 for details concerning the CDIS level and multiplexed bus mode.

5.3 Transfer Attribute Signals

The following paragraphs describe the transfer attribute signals, which provide additional information about the bus transfer. Refer to Section 7 Bus Operation for detailed information about the relationship of the transfer attribute signals to bus operation.

5.3.1 Transfer Type (TT1, TT0)

The processor drives these three-state bidirectional signals to indicate the type of access for the current bus transfer. During bus transfers by an alternate bus master, the processor samples these signals to determine whether it should snoop the transfer; only normal and MOVE16 accesses can be snooped. Table 5-2 lists the definition of the transfer-type encoding.
The acknowledge access (TT1 = 1 and TT0 = 1) is used for both interrupt and breakpoint acknowledge transfers, and for LPSTOP broadcast cycles on the MC68040V and MC68EC040V.

Table 5-2. Transfer-Type Encoding

TT1  TT0   Transfer Type
 0    0    Normal access
 0    1    MOVE16 access
 1    0    Alternate logical function code access
 1    1    Acknowledge access
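Bus-monitoring or simulation software could decode the transfer-type encoding of Table 5-2 together with the transfer modifier encodings of Tables 5-3 and 5-4 in Section 5.3.2. The following Python sketch shows one way to do this. It is a hypothetical helper: it treats TM2–TM0 as a 3-bit integer, rejects MOVE16 modifiers other than the two permitted encodings, and reads acknowledge accesses as described in Section 5.3.2.

```python
from typing import Tuple

# Table 5-2: (TT1, TT0) -> transfer type
TRANSFER_TYPE = {
    (0, 0): "normal access",
    (0, 1): "move16 access",
    (1, 0): "alternate logical function code access",
    (1, 1): "acknowledge access",
}

# Table 5-3 (normal and MOVE16 accesses), indexed by TM2-TM0 as a 3-bit value
NORMAL_TM = [
    "data cache push access",
    "user data access",
    "user code access",
    "mmu table search data access",
    "mmu table search code access",
    "supervisor data access",
    "supervisor code access",
    "reserved",
]
MOVE16_TM = {0b001, 0b101}           # MOVE16 uses only user/supervisor data

# Table 5-4: function codes defined for alternate accesses; others are reserved
ALTERNATE_FC = {0, 3, 4, 7}


def snoopable(tt1: int, tt0: int) -> bool:
    """Only normal and MOVE16 accesses can be snooped."""
    return (tt1, tt0) in {(0, 0), (0, 1)}


def decode_transfer(tt1: int, tt0: int, tm: int) -> Tuple[str, str]:
    """Return (transfer type, transfer modifier meaning)."""
    if not 0 <= tm <= 7:
        raise ValueError("TM2-TM0 is a 3-bit field")
    kind = TRANSFER_TYPE[(tt1, tt0)]
    if kind == "normal access":
        return kind, NORMAL_TM[tm]
    if kind == "move16 access":
        if tm not in MOVE16_TM:
            raise ValueError("invalid TM encoding for MOVE16")
        return kind, NORMAL_TM[tm]
    if kind == "alternate logical function code access":
        if tm in ALTERNATE_FC:
            return kind, f"logical function code {tm}"
        return kind, "reserved"
    # Acknowledge access: TMx carry the interrupt level for interrupt
    # acknowledge and are low for breakpoint acknowledge and LPSTOP broadcast.
    if tm == 0:
        return kind, "breakpoint acknowledge or lpstop broadcast"
    return kind, f"interrupt acknowledge, level {tm}"
```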
5.3.2 Transfer Modifier (TM2–TM0)
These three-state outputs provide supplemental information for each transfer type. Table 5-3 lists the encoding for normal and MOVE16 transfers, and Table 5-4 lists the encoding for alternate access transfers. For interrupt acknowledge transfers, the TMx signals carry the interrupt level being acknowledged; for breakpoint acknowledge transfers and LPSTOP broadcast cycles on the MC68040V and MC68EC040V, the TMx signals are low. When the M68040 is not the bus master, the TMx signals are set to a high-impedance state.

Table 5-3. Normal and MOVE16 Access Transfer Modifier Encoding
TM2  TM1  TM0  Transfer Modifier
0    0    0    Data cache push access
0    0    1    User data access*
0    1    0    User code access
0    1    1    MMU table search data access
1    0    0    MMU table search code access
1    0    1    Supervisor data access*
1    1    0    Supervisor code access
1    1    1    Reserved
* MOVE16 accesses use only these encodings.

Table 5-4. Alternate Access Transfer Modifier Encoding
TM2  TM1  TM0  Transfer Modifier
0    0    0    Logical function code 0
0    0    1    Reserved
0    1    0    Reserved
0    1    1    Logical function code 3
1    0    0    Logical function code 4
1    0    1    Reserved
1    1    0    Reserved
1    1    1    Logical function code 7

5.3.3 Transfer Line Number (TLN1, TLN0)
These three-state outputs indicate which line in the set of four data cache lines is being accessed for normal push and line data read accesses. TLNx signals are undefined for all other accesses to instruction space and are placed in a high-impedance state when the processor relinquishes the bus.
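The meaning of TM2–TM0 depends on the transfer type, so a decoder has to look at TT1–TT0 and TM2–TM0 together. The sketch below combines Tables 5-3 and 5-4 with the acknowledge-cycle rules from 5.3.2. It is an illustration under stated assumptions: `decode_tm` and the table constants are hypothetical names, `tt` is the 2-bit TT value (TT1 is bit 1), and `tm` is the 3-bit TM value (TM2 is bit 2). Reading TM = 000 in an acknowledge cycle as a breakpoint or LPSTOP cycle relies on interrupt acknowledges carrying a nonzero level (1 through 7).

```python
# Table 5-3: normal and MOVE16 accesses, keyed by TM2-TM0 as a 3-bit value.
NORMAL_TM = {
    0b000: "data cache push access",
    0b001: "user data access",
    0b010: "user code access",
    0b011: "MMU table search data access",
    0b100: "MMU table search code access",
    0b101: "supervisor data access",
    0b110: "supervisor code access",
    0b111: "reserved",
}

# Table 5-4: the only defined alternate-access TM values; each equals its FC.
ALTERNATE_FCS = frozenset({0, 3, 4, 7})

MOVE16_TMS = frozenset({0b001, 0b101})  # MOVE16 uses only these encodings


def decode_tm(tt: int, tm: int) -> str:
    """Interpret TM2-TM0 for a given TT1-TT0 value (hypothetical helper)."""
    if not 0 <= tt <= 3 or not 0 <= tm <= 7:
        raise ValueError("tt must be 0..3 and tm must be 0..7")
    if tt == 0b00:                      # normal access
        return NORMAL_TM[tm]
    if tt == 0b01:                      # MOVE16 access
        if tm not in MOVE16_TMS:
            raise ValueError("MOVE16 accesses use only TM = 001 or 101")
        return NORMAL_TM[tm]
    if tt == 0b10:                      # alternate logical function code access
        return f"logical function code {tm}" if tm in ALTERNATE_FCS else "reserved"
    # Acknowledge access: TM carries the interrupt level being acknowledged,
    # and is driven low for breakpoint acknowledge and LPSTOP broadcast cycles.
    if tm == 0:
        return "breakpoint acknowledge or LPSTOP broadcast"
    return f"interrupt acknowledge, level {tm}"


print(decode_tm(0b01, 0b101))  # supervisor data access
print(decode_tm(0b10, 0b011))  # logical function code 3
print(decode_tm(0b11, 6))      # interrupt acknowledge, level 6
```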
The TLNx signals can be used in high-performance systems to build an external snoop filter with a duplicate set of cache tags. The TLNx signals and address bus provide a direct indication of the state of the data caches and can be used to help maintain the duplicate tag store. The TLNx pins do not indicate the correct line number when an instruction cache burst fill occurs.

5.3.4 User-Programmable Attributes (UPA1, UPA0)
The UPAx signals are three-state outputs. Their levels are determined by the user-programmable attribute bits in the address translation entry or transparent translation register that matches the logical address. These signals are only for normal code, data, and MOVE16 accesses. For all other accesses, including table search and cache line push accesses that may result from a normal access, the UPAx signals are zero. If the transparent translation registers and the memory management unit are disabled, the UPAx signals are also zero. When the M68040 is not the bus master, these signals are set to a high-impedance state.

5.3.5 Read/Write (R/W)
This bidirectional three-state signal defines the data transfer direction for the current bus cycle. A high level indicates a read cycle, and a low level indicates a write cycle. The bus snoop controller examines this signal when the processor is not the bus master.

5.3.6 Transfer Size (SIZ1, SIZ0)
These bidirectional three-state signals indicate the data size for the bus transfer. The bus snoop controller examines these signals when the processor is not the bus master. Refer to Section 7 Bus Operation for more information on the encoding of these signals.

5.3.7 Lock (LOCK)
This three-state output indicates that the current transfer is part of a sequence of locked transfers for a read-modify-write operation.
The external arbiter can use LOCK to prevent an alternate bus master from gaining control of the bus and accessing the same operand between processor accesses for the locked sequence of transfers. Although LOCK indicates that the processor requests that the bus be locked, the processor will give up the bus if the external arbiter negates the BG signal. When the M68040 is not the bus master, the LOCK signal is set to a high-impedance state; LOCK is driven high before it is three-stated. Refer to Section 7 Bus Operation for information on locked transfers.

5.3.8 Lock End (LOCKE)
This three-state output indicates that the current transfer is the last in a sequence of locked transfers for a read-modify-write operation. The external arbiter can use LOCKE to support arbitration between unrelated locked transfer sequences while still maintaining the indivisible nature of each read-modify-write operation. When the M68040 is not the bus master, the LOCKE signal is set to a high-impedance state; LOCKE is driven high before it is three-stated. Do not use LOCKE if it is possible to retry the last write of a read-modify-write operation.

5.3.9 Cache Inhibit Out (CIOUT)
This three-state output reflects the state of the cache mode field in one of the address translation caches and is asserted for accesses to noncacheable pages to indicate that an external cache should ignore the bus transfer. When the referenced logical address is within an area specified for transparent translation, the cache mode field of the appropriate transparent translation register controls the state of CIOUT. Refer to Section 3 Memory Management Unit (Except MC68EC040 and MC68EC040V) for more information about the address translation caches and transparent translation. When the M68040 is not the bus master, the CIOUT signal is set to a high-impedance state.

5.4 Bus Transfer Control Signals
The following signals provide control functions for bus transfers. Refer to Section 7 Bus Operation for detailed information about the relationship of the bus transfer control signals to bus operation.

5.4.1 Transfer Start (TS)
The processor asserts this three-state bidirectional signal for one clock period to indicate the start of each transfer. During alternate bus master accesses, the processor monitors this signal to detect the start of each transfer to be snooped.

5.4.2 Transfer in Progress (TIP)
This three-state output is asserted to indicate that a bus transfer is in progress and is negated during idle bus cycles if the bus is still granted to the processor. When the processor loses the bus, TIP negates after completion of the current transfer and goes to a high-impedance state. Note that TIP remains asserted during back-to-back bus cycles.

5.4.3 Transfer Acknowledge (TA)
This three-state bidirectional signal indicates the completion of a requested data transfer operation.
During transfers by the M68040, TA is an input signal from the referenced slave device indicating completion of the transfer. During alternate bus master accesses, TA is normally three-stated to allow the referenced slave device to respond, and the M68040 samples it to detect the completion of each bus transfer. The M68040 can inhibit memory and intervene in the access to source or sink data in its internal caches by asserting TA to acknowledge the data transfer. This capability applies to alternate bus master accesses that reference modified (dirty) data in the M68040 caches.

5.4.4 Transfer Error Acknowledge (TEA)
The current slave asserts this input signal to indicate an error condition for the bus transaction. When asserted with TA, this signal indicates that the processor should retry the access. During alternate bus master accesses, the M68040 samples TEA to detect completion of each bus transfer.

5.4.5 Transfer Cache Inhibit (TCI)
This input signal inhibits read data from being loaded into the M68040 instruction or data caches. TCI is ignored during all writes and after the first data transfer for both burst line reads and burst-inhibited line reads. TCI is also ignored during all alternate bus master transfers.

5.4.6 Transfer Burst Inhibit (TBI)
This input signal indicates to the processor that the accessed device cannot support burst mode accesses and that the requested line transfer should be divided into individual long-word transfers. Asserting TBI with TA terminates the first data transfer of a line access, which causes the processor to terminate the burst and access the remaining data for the line as three successive long-word transfers. During alternate bus master accesses, the M68040 samples TBI to detect completion of each bus transfer.

5.5 Snoop Control Signals
The following signals control the operation of the M68040 on-chip snoop logic. Section 4 Instruction and Data Caches provides information about the relationship of the snoop control signals to the caches, and Section 7 Bus Operation discusses the relationship of these signals to bus operation.

5.5.1 Snoop Control (SC1, SC0)
These input signals specify the snoop operation to be performed by the M68040 for an alternate bus master transfer. If the M68040 is allowed to snoop an alternate bus master read transfer, it can intervene in the access to supply data from its data cache when the memory copy is stale, ensuring that the alternate bus master receives valid data. Writes by an alternate bus master can also be snooped to either update the M68040 internal data cache with the new data or invalidate the matching cache lines, ensuring that subsequent M68040 reads access valid data.
These signals are ignored when the processor is the bus master.

5.5.2 Memory Inhibit (MI)
This output signal prevents an alternate bus master from accessing possibly stale data in memory while the M68040 is unable to respond. MI is asserted during reset, preventing external memory from responding. When the SCx signals indicate an access should be snooped, the M68040 keeps MI asserted until it determines whether intervention in the access is required. If no intervention is required, MI is negated and memory is allowed to respond to complete the access. Otherwise, MI remains asserted and the M68040 completes the transfer as a slave, updating its caches on a write or supplying data to the alternate bus master on a read. MI is negated when the M68040 is the bus master. During a snoop cycle, the M68040 ignores all TA and TEA assertions while MI is asserted. When RSTI is asserted, MI is asserted.

5.6 Arbitration Signals
The following control signals support requests to an external arbiter to become the bus master. Refer to Section 7 Bus Operation for detailed information about the relationship of the arbitration signals to bus operation.

5.6.1 Bus Request (BR)
This output signal indicates to the external arbiter that the processor needs to become bus master for one or more bus transfers. BR is negated when the M68040 begins an access to the external bus with no other accesses pending, and BR remains negated until another access is required. In some situations the M68040 asserts BR and then negates it without having run bus transfers; this is a disregard request condition. Refer to Section 7 Bus Operation for details about this state.

5.6.2 Bus Grant (BG)
This input signal from an external arbiter indicates that the bus is available to the M68040 as soon as the current bus access completes. BG must be asserted and BB must be negated (indicating the bus is free) before the M68040 assumes ownership of the bus.

5.6.3 Bus Busy (BB)
This three-state bidirectional signal indicates that the bus is currently owned. BB is monitored as a processor input to determine when an alternate bus master has released control of the bus. BG must be asserted and BB must be negated (indicating the bus is free) before the M68040 asserts BB as an output to assume ownership of the bus. The processor keeps BB asserted until the external arbiter negates BG and the processor completes the bus transfer in progress. When releasing the bus, the processor negates BB, then sets it to a high-impedance state for use again as an input.

5.7 Processor Control Signals
The following signals control disabling caches and memory management units (MMUs) and support processor and external device initialization.
5.7.1 Cache Disable (CDIS)
CDIS dynamically disables the on-chip caches on the next internal cache access boundary. CDIS does not flush the data and instruction caches; entries remain unaltered and become available after CDIS is negated. The assertion of CDIS does not affect snooping. During a processor reset, the level on CDIS is latched and used to select the normal bus mode (CDIS high) or multiplexed bus mode (CDIS low). Refer to Section 4 Instruction and Data Caches for information about the caches and to Section 7 Bus Operation for information about the multiplexed bus mode. Refer to Appendix E MC68040 Floating-Point Emulation (MC68040FPSP) for descriptions of emulator use of this signal.

5.7.2 Reset In (RSTI)
This input signal causes the M68040 to enter reset exception processing. The RSTI signal is an asynchronous input that is internally synchronized to the next rising edge of the BCLK signal. All three-state signals are set to the high-impedance state, and all outputs except MI are negated when RSTI is recognized. The assertion of RSTI does not affect the test pins. Refer to Section 7 Bus Operation for a description of reset operation and to Section 8 Exception Processing for information about the reset exception.

5.7.3 Reset Out (RSTO)
The M68040 asserts this output during execution of the RESET instruction to initialize external devices. Refer to Section 7 Bus Operation for a description of reset out bus operation.

5.8 Interrupt Control Signals
The following signals control the interrupt functions.

5.8.1 Interrupt Priority Level (IPL2–IPL0)
These input signals provide an indication of an interrupt condition and the encoding of the interrupt level from a peripheral or external prioritizing circuitry. IPL2 is the most significant bit of the level number. For example, since the IPLx signals are active low, IPL2–IPL0 = $5 corresponds to an interrupt request at interrupt priority level 2. During a processor reset, the levels on the IPLx lines are latched and used to select the output driver characteristics for the three signal groups listed in Table 5-5. Refer to Section 8 Exception Processing for information on interrupts and to Section 11 MC68040 Electrical and Thermal Characteristics for information on driver characteristics. Refer to Appendix A MC68LC040 and Appendix B MC68EC040 for how these signals differ on power-up.

Table 5-5. Output Driver Control Groups
Signal  Output Buffers Controlled
IPL2    Data bus: D31–D0
IPL1    Address bus and transfer attributes: A31–A0, CIOUT, LOCK, LOCKE, R/W, SIZ1–SIZ0, TLN1–TLN0, TM2–TM0, TT1–TT0, UPA1–UPA0
IPL0    Miscellaneous control signals: BB, BR, IPEND, MI, PST3–PST0, RSTO, TA, TDO, TIP, TS
Note: High input level = small buffers enabled; low input level = large buffers enabled.
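Because the IPLx lines are active low and are also sampled at reset for driver selection, both conversions are easy to get backwards. The following Python sketch shows each one, assuming that `pins` holds the sampled IPL2–IPL0 levels with IPL2 as bit 2 (1 = high). The function and constant names are hypothetical. The sketch reproduces the $5 → level 2 example from 5.8.1 and the Table 5-5 rule that a high level selects small buffers.

```python
# Table 5-5: output buffer group selected by each IPL line during reset.
DRIVER_GROUPS = {
    2: "data bus",
    1: "address bus and transfer attributes",
    0: "miscellaneous control signals",
}


def ipl_pins_to_level(pins: int) -> int:
    """Convert sampled IPL2-IPL0 pin levels (bit 2 = IPL2) to a priority level.

    The lines are active low, so the level is the 3-bit complement of the pin
    value; all pins high (0b111) means no interrupt request (level 0).
    """
    if not 0 <= pins <= 0b111:
        raise ValueError("pins must be a 3-bit value")
    return ~pins & 0b111


def driver_sizes_at_reset(pins: int) -> dict:
    """Map each Table 5-5 group to 'small' (IPL line high) or 'large' (low)."""
    if not 0 <= pins <= 0b111:
        raise ValueError("pins must be a 3-bit value")
    return {group: ("small" if (pins >> bit) & 1 else "large")
            for bit, group in DRIVER_GROUPS.items()}


print(ipl_pins_to_level(0x5))        # 2
print(driver_sizes_at_reset(0b011))  # data bus large, other two groups small
```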
5.8.2 Interrupt Pending Status (IPEND)
This output signal indicates that an interrupt request has been recognized internally and exceeds the current interrupt priority mask in the status register (SR). External devices (other bus masters) can use IPEND to predict processor operation on the next instruction boundaries. IPEND is not intended for use as an interrupt acknowledge to external peripheral devices. Refer to Section 7 Bus Operation for bus information related to interrupts and to Section 8 Exception Processing for interrupt information.

5.8.3 Autovector (AVEC)
This input signal is asserted with TA during an interrupt acknowledge transfer to request internal generation of the vector number. Refer to Section 7 Bus Operation for more information about automatic vectors.

5.9 Status and Clock Signals
The following paragraphs explain the signals that provide timing, test control, and the internal processor status.

5.9.1 Processor Status (PST3–PST0)
These outputs indicate the internal execution unit's status. The timing is synchronous with BCLK, and the status may have nothing to do with the current bus transfer. When the PSTx outputs are updated depends on the class of encoding. There are two classes of PSTx encodings: the first class is associated with instruction boundaries, and the second class indicates the processor's present status. Table 5-6 lists the definition of the encodings.

The encodings 0, 8, 4, 5, C, D, E, and F indicate the present status and do not reflect a specific stage of the pipe. These encodings persist as long as the processor stays in the indicated state. The default encoding 0 (user) or 8 (supervisor) is indicated if none of the other conditions apply. The encodings 1, 2, 3, 9, A, and B belong to the first class of PSTx encoding. This class indicates that the instruction is in its last instruction execution stage. These encodings exist for only one BCLK period per instruction and are mutually exclusive.
Table 5-6. Processor Status Encoding
Hex  PST3  PST2  PST1  PST0  Internal Status
0    0     0     0     0     User, start/continue current instruction
1    0     0     0     1     User, end current instruction
2    0     0     1     0     User, branch not taken/end current instruction
3    0     0     1     1     User, branch taken/end current instruction
4    0     1     0     0     User, table search
5    0     1     0     1     Halted state (double bus fault)
6    0     1     1     0     Low-power stop mode (supervisor instruction)*
7    0     1     1     1     Reserved
8    1     0     0     0     Supervisor, start/continue current instruction
9    1     0     0     1     Supervisor, end current instruction
A    1     0     1     0     Supervisor, branch not taken/end current instruction
B    1     0     1     1     Supervisor, branch taken/end current instruction
C    1     1     0     0     Supervisor, table search
D    1     1     0     1     Stopped state (supervisor instruction)
E    1     1     1     0     RTE executing
F    1     1     1     1     Exception stacking
Note: *MC68040V and MC68EC040V only.

When "branch taken/end current instruction" is indicated, a change of instruction flow is pending. In addition to the instructions listed below, an exception stacking (encoding F) sequence ends with the "supervisor, branch taken/end current instruction" encoding, as though it were a virtual JMP instruction. This applies to all of the exceptions listed in the processor's vector table. The instructions that cause a "branch taken/end current instruction" encoding when they are executed are as follows:

ANDI to SR     DBcc (taken)      MOVE to SR   RTD
Bcc (taken)    FBcc (taken)      MOVE USP     RTE
BRA            FDBcc (always)    MOVEC        RTR
BSR            FMOVEM Rc,MRn     MOVES        RTS
CAS            FMOVEM FPm,MRn    NOP          STOP
CAS2           FSAVE             ORI to SR    TAS
CINV           JMP               PFLUSH
CPUSH          JSR               PTEST

Bcc (not taken) and DBcc (not taken) are the only instructions that cause a "branch not taken/end current instruction" encoding. Note that FBcc (not taken) is not included in this category; the FBcc (not taken) instruction ends with an "end current instruction" encoding. All other instructions and conditions end with the "end current instruction" encoding.
For instance, if the processor is running back-to-back single-clock instructions, the "end current instruction" encoding remains asserted for as many clock cycles as there are instructions.
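The two classes of PSTx encodings suggest a simple way to post-process a trace captured once per BCLK: count the first-class (end-of-instruction) samples and tally the present-status samples. The sketch below is a hypothetical analysis helper, not a Motorola tool. As described above, exception stacking ends with a virtual JMP that also produces the "supervisor, branch taken/end current instruction" encoding, so the count is an upper bound on real instructions completed.

```python
from collections import Counter

# Table 5-6: PST3-PST0 read as a 4-bit value (PST3 is bit 3).
PST_STATUS = {
    0x0: "user, start/continue current instruction",
    0x1: "user, end current instruction",
    0x2: "user, branch not taken/end current instruction",
    0x3: "user, branch taken/end current instruction",
    0x4: "user, table search",
    0x5: "halted state (double bus fault)",
    0x6: "low-power stop mode",          # MC68040V and MC68EC040V only
    0x7: "reserved",
    0x8: "supervisor, start/continue current instruction",
    0x9: "supervisor, end current instruction",
    0xA: "supervisor, branch not taken/end current instruction",
    0xB: "supervisor, branch taken/end current instruction",
    0xC: "supervisor, table search",
    0xD: "stopped state (supervisor instruction)",
    0xE: "RTE executing",
    0xF: "exception stacking",
}

# First-class encodings: each appears for one BCLK per instruction end.
END_OF_INSTRUCTION = frozenset({0x1, 0x2, 0x3, 0x9, 0xA, 0xB})


def summarize_pst_trace(samples):
    """Summarize PST values sampled once per BCLK (hypothetical helper).

    Returns (ends, counts): `ends` is the number of end-of-instruction
    indications, including the virtual JMP that closes exception stacking;
    `counts` tallies how many BCLKs were spent in each status.
    """
    ends = 0
    counts = Counter()
    for value in samples:
        if value not in PST_STATUS:
            raise ValueError(f"PST value out of range: {value!r}")
        counts[PST_STATUS[value]] += 1
        if value in END_OF_INSTRUCTION:
            ends += 1
    return ends, counts


# Example 3 from the text, assuming the ADD executes in user mode. The number
# of exception-stacking samples per exception is arbitrary in this trace.
trace = [0x0, 0x1, 0xF, 0xF, 0xB, 0xF, 0xB]
ends, counts = summarize_pst_trace(trace)
print(ends)                          # 3
print(counts["exception stacking"])  # 3
```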
The following examples illustrate PSTx encodings:
1. An access error terminates an instruction such that the instruction execution stage is not reached. In this case, "end current instruction" is not indicated. Exception processing starts, the exception stacking status is indicated, and then the virtual JMP causes the "supervisor, branch taken/end current instruction" encoding.
2. An FTRAPcc that does not take an exception ends with the "end current instruction" encoding. If the FTRAPcc ends in an exception, the exception stacking status is indicated and then the "supervisor, branch taken/end current instruction" encoding is reached.
3. Two simultaneous interrupt exception processing sequences follow an ADD instruction. The ADD instruction ends with "end current instruction", followed by exception stacking, followed by "branch taken/end current instruction", followed by exception stacking, followed by "branch taken/end current instruction".
4. An RTE instruction follows an ADD instruction. The "end current instruction" is followed by "RTE executing", followed by "branch taken/end current instruction".

5.9.2 Bus Clock (BCLK)
This input signal is used as a reference for all bus timing. It is a TTL-compatible signal and cannot be gated off. Refer to Section 11 MC68040 Electrical and Thermal Characteristics for electrical specifications.

5.9.3 Processor Clock (PCLK): Not on MC68040V and MC68EC040V
PCLK is used to derive all internal timing. This clock is also TTL compatible and cannot be gated off. Refer to Section 11 MC68040 Electrical and Thermal Characteristics for electrical specifications.

5.10 MMU Disable (MDIS): Not on MC68EC040
The MMU disable signal dynamically disables the translation of addresses by the MMUs. The assertion of MDIS does not flush the address translation caches (ATCs); ATC entries become available again when MDIS is negated.
During a processor reset, the level on MDIS is latched and used to select the normal data latch mode (MDIS high) or DLE mode (MDIS low). Refer to Section 3 Memory Management Unit (Except MC68EC040 and MC68EC040V) for a description of address translation and to Section 7 Bus Operation for information about DLE mode.

5.11 Data Latch Enable (DLE): Only on MC68040
This input signal is used in DLE mode to latch the input data bus on read transfers. DLE mode can be used to support asynchronous memory interfaces by allowing the interface to specify when data should be latched instead of requiring data to be valid on the rising edge of BCLK.
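CDIS and MDIS each serve two roles: their levels during reset select bus modes, and afterward they act as dynamic cache and MMU disables. A minimal sketch of the reset-time interpretation on an MC68040 follows. `ResetConfig`, `latch_reset_config`, and `describe` are hypothetical names; the MC68LC040, MC68EC040, and V-suffix derivatives may behave differently, as described in their appendices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResetConfig:
    """Bus modes latched from CDIS and MDIS during processor reset."""
    multiplexed_bus: bool  # True when CDIS was low at reset
    dle_mode: bool         # True when MDIS was low at reset


def latch_reset_config(cdis: int, mdis: int) -> ResetConfig:
    """Interpret CDIS and MDIS pin levels (1 = high, 0 = low) seen at reset.

    CDIS high selects the normal bus mode and CDIS low the multiplexed bus
    mode; MDIS high selects the normal data latch mode and MDIS low DLE mode.
    """
    for name, level in (("CDIS", cdis), ("MDIS", mdis)):
        if level not in (0, 1):
            raise ValueError(f"{name} level must be 0 or 1, got {level!r}")
    return ResetConfig(multiplexed_bus=(cdis == 0), dle_mode=(mdis == 0))


def describe(config: ResetConfig) -> str:
    """Human-readable summary of the latched modes."""
    bus = "multiplexed" if config.multiplexed_bus else "normal"
    latch = "DLE" if config.dle_mode else "normal data latch"
    return f"{bus} bus mode, {latch} mode"


print(describe(latch_reset_config(cdis=1, mdis=0)))  # normal bus mode, DLE mode
```

Modeling the result as a frozen dataclass reflects that these modes are fixed once reset completes; any later change on CDIS or MDIS affects only the dynamic disable functions.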
5.12 Test Signals
The M68040 includes dedicated user-accessible test logic that is fully compatible with the IEEE 1149.1 Standard Test Access Port and Boundary Scan Architecture. Problems associated with testing high-density circuit boards led to the development of this standard under the sponsorship of the IEEE Test Technology Committee and the Joint Test Action Group (JTAG). The M68040 implementation supports circuit board test strategies based on this standard. However, the JTAG interface is not intended to provide an in-circuit test to verify M68040 operations; therefore, M68040 operations cannot be tested using this interface. Section 6 IEEE 1149.1 Test Access Port (JTAG) describes the M68040 implementation of IEEE 1149.1 and is intended to be used with the supporting IEEE document.

5.12.1 Test Clock (TCK)
This input signal is used as a dedicated clock for the test logic. Since clocking of the test logic is independent of the normal operation of the MC68040, several other components on a board can share a common test clock with the processor even though each component may operate from a different system clock. The design of the test logic allows the test clock to run at low frequencies or to be gated off entirely as required for test purposes.

5.12.2 Test Mode Select (TMS)
This input signal is decoded by the TAP controller and distinguishes the principal operations of the test support circuitry.

5.12.3 Test Data In (TDI)
This input signal provides a serial data input to the TAP.

5.12.4 Test Data Out (TDO)
This three-state output signal provides a serial data output from the TAP. The TDO output can be placed in a high-impedance mode to allow parallel connection of board-level test data paths.

5.12.5 Test Reset (TRST): Not on MC68040V and MC68EC040V
This input signal provides an asynchronous reset of the TAP controller.
5.13 Power Supply Connections
The M68040 requires connection to a VCC power supply, positive with respect to ground. The VCC and ground connections are grouped to supply adequate current to the various sections of the processor. Section 12 Ordering Information and Mechanical Data describes the groupings of VCC and ground connections.
5.14 Signal Summary
Table 5-7 provides a summary of the electrical characteristics of the signals discussed in this section.

Table 5-7. Signal Summary
Signal Name                        Mnemonic    Type          Active    Three-State
Address bus                        A31–A0      Input/output  High      Yes
Autovector                         AVEC        Input         Low       -
Bus busy                           BB          Input/output  Low       Yes
Bus clock                          BCLK        Input         -         -
Bus grant                          BG          Input         Low       -
Bus request                        BR          Output        Low       No
Cache disable                      CDIS        Input         Low       -
Cache inhibit out                  CIOUT       Output        Low       Yes
Data bus                           D31–D0      Input/output  High      Yes
Data latch enable (Note 1)         DLE         Input         High      -
Ground                             GND         Ground        -         -
Interrupt pending                  IPEND       Output        Low       No
Interrupt priority level (Note 2)  IPL2–IPL0   Input         Low       -
Bus lock                           LOCK        Output        Low       Yes
Bus lock end                       LOCKE       Output        Low       Yes
Memory inhibit                     MI          Output        Low       No
MMU disable (Note 3)               MDIS        Input         Low       -
Processor clock                    PCLK        Input         -         -
Processor status                   PST3–PST0   Output        High      No
Read/write                         R/W         Input/output  High/low  Yes
Reset in                           RSTI        Input         Low       -
Reset out                          RSTO        Output        Low       No
Snoop control                      SC1, SC0    Input         High      -
Transfer acknowledge               TA          Input/output  Low       Yes
Transfer burst inhibit             TBI         Input         Low       -
Transfer cache inhibit             TCI         Input         Low       -
Transfer error acknowledge         TEA         Input         Low       -
Transfer in progress               TIP         Output        Low       Yes
Transfer line number               TLN1, TLN0  Output        High      Yes
Transfer modifier                  TM2–TM0     Output        High      Yes
Transfer size                      SIZ1, SIZ0  Input/output  High      Yes
Transfer start                     TS          Input/output  Low       Yes
Transfer type                      TT1, TT0    Input/output  High      Yes
Test clock                         TCK         Input         -         -
Test data input                    TDI         Input         High      -
Test data output                   TDO         Output        High      Yes
Test mode select                   TMS         Input         High      -
Test reset                         TRST        Input         Low       -
User-programmable attributes       UPA1, UPA0  Output        High      Yes
Power supply                       VCC         Power         -         -

Notes:
1. This signal is not available on the MC68LC040 and MC68EC040.
2. These signals are different on power-up for the MC68LC040 and MC68EC040.
3. This signal is not available on the MC68EC040.
Section 6
IEEE 1149.1A Test Access Port (JTAG)

NOTE
This section does not apply to the MC68040V and MC68EC040V. Refer to Appendix C MC68040V and MC68EC040V for details. In this section only, all references to M68040 refer to the MC68040, MC68LC040, and MC68EC040.

The M68040 includes dedicated user-accessible test logic that is fully compatible with IEEE Standard 1149.1a, Standard Test Access Port and Boundary Scan Architecture. Problems associated with testing high-density circuit boards have led to the standard's development under the sponsorship of the IEEE Test Technology Committee and the Joint Test Action Group (JTAG). This section is to be used in conjunction with the supporting IEEE document and includes those chip-specific items that the IEEE standard requires to be defined, as well as additional information specific to the M68040 implementation. For example, the IEEE Standard 1149.1a test access port (TAP) controller states are referenced in this section but are not described. For these details and application information regarding the standard, refer to the IEEE Standard 1149.1a document.

The M68040 implementation supports circuit board test strategies based on the standard. The test logic uses a static logic design and is independent of the device system logic. The M68040 implementation provides capabilities to:
a. Perform boundary scan operations to test circuit board electrical continuity,
b. Bypass the M68040 by reducing the shift register path to a single cell,
c. Sample the M68040 system pins during operation and transparently shift out the result,
d. Disable the output drive to output-only pins during circuit board testing, and
e. Select one of two output drivers on a pin-by-pin basis.

NOTE
The IEEE Standard 1149.1a test logic cannot be considered completely benign to those planning not to use this capability. Certain precautions must be observed to ensure that this logic does not interfere with system operation. Refer to 6.5 Disabling the IEEE Standard 1149.1a Operation.
6.1 Overview
Figure 6-1 illustrates a block diagram of the M68040 implementation of IEEE Standard 1149.1a. The test logic includes a 16-state dedicated TAP controller. These 16 controller states are defined in detail in IEEE Standard 1149.1a, but only the following 8 are referenced in this section:
Test-Logic-Reset
Run-Test/Idle
Capture-IR
Capture-DR
Update-IR
Update-DR
Shift-IR
Shift-DR

The TAP controller provides access to five dedicated signal pins:
TCK: test clock input that synchronizes the test logic.
TMS: test mode select input, with an internal pullup resistor, sampled on the rising edge of TCK to sequence the TAP controller.
TDI: test data input, with an internal pullup resistor, sampled on the rising edge of TCK.
TDO: three-state test data output that is actively driven only in the Shift-IR and Shift-DR controller states and changes on the falling edge of TCK.
TRST: an active-low asynchronous reset, with an internal pullup resistor, that forces the TAP controller into the Test-Logic-Reset state.

The test logic also includes an instruction shift register and two test data registers: a boundary scan register and a bypass register. The boundary scan register links all device signal pins into a single shift register.

[Figure 6-1. M68040 Test Logic Block Diagram. Labeled elements: pins TDI, TDO, TMS, TCK, and TRST; TAP controller; 3-bit instruction shift register (bits 2–0) with latched decoder; test data registers comprising the 184-bit boundary scan register (bits 183–0) and the bypass register; output multiplexers.]
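This section names only eight of the sixteen TAP controller states, but host-side JTAG scripts usually model all of them. The sketch below encodes the standard IEEE 1149.1 state transitions (the state diagram comes from the standard, not from this manual) as a lookup table. `TAP_NEXT` and `clock_tms` are hypothetical names. It checks two behaviors this section relies on: five TCK rising edges with TMS high reach Test-Logic-Reset from any state, and a Capture-IR to Update-IR path that never enters Shift-IR, which is how the captured value 001 (see 6.2) can be installed as the HIGHZ instruction (see 6.2.2).

```python
# IEEE 1149.1 TAP controller: state -> (next if TMS=0, next if TMS=1),
# taken on each rising edge of TCK.
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def clock_tms(state, tms_levels):
    """Apply one TMS level per TCK rising edge; return (final_state, path)."""
    path = []
    for tms in tms_levels:
        if tms not in (0, 1):
            raise ValueError("TMS must be 0 or 1")
        state = TAP_NEXT[state][tms]
        path.append(state)
    return state, path


# Five TCK edges with TMS high reach Test-Logic-Reset from any state.
assert all(clock_tms(s, [1] * 5)[0] == "Test-Logic-Reset" for s in TAP_NEXT)

# From Test-Logic-Reset, pass through Capture-IR (which loads 001) to
# Update-IR without entering Shift-IR.
final, path = clock_tms("Test-Logic-Reset", [0, 1, 1, 0, 1, 1])
print(final)               # Update-IR
print("Shift-IR" in path)  # False
```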
6.2 Instruction Shift Register
The M68040 IEEE Standard 1149.1a implementation includes a 3-bit instruction shift register without parity. The register shifts in one of eight instructions, which can select the test to be performed, access a test data register, or both. Data is transferred from the instruction shift register to latched decoded outputs during the Update-IR state. The instruction shift register is reset to all ones in the TAP controller Test-Logic-Reset state, which is equivalent to selecting the BYPASS instruction. During the Capture-IR state, the binary value 001 is loaded into the parallel inputs of the instruction shift register.

The M68040 IEEE Standard 1149.1a implementation includes three mandatory public instructions (BYPASS, SAMPLE/PRELOAD, and EXTEST) and four manufacturer's public instructions. The four manufacturer's public instructions provide the capability to disable all device output drivers, operate the device in a bypass configuration without a system clocking requirement, and select one of two output drive capabilities on a pin-by-pin basis. The M68040 implementation does not support the optional standard public instructions. Table 6-1 lists the three bits used in the instruction shift register to decode the instructions and their related encodings. Note that the least significant bit of the instruction (bit 0) is the first bit to be shifted into the instruction shift register.

Table 6-1. IEEE Standard 1149.1a Instructions
Bit 2  Bit 1  Bit 0  Instruction Selected  Test Data Register Accessed
0      0      0      EXTEST                Boundary scan
0      0      1      HIGHZ                 Bypass
0      1      0      SAMPLE/PRELOAD        Boundary scan
0      1      1      DRVCTL.T              Boundary scan
1      0      0      SHUTDOWN              Bypass
1      0      1      PRIVATE               Bypass
1      1      0      DRVCTL.S              Boundary scan
1      1      1      BYPASS                Bypass

EXTEST, HIGHZ, DRVCTL.T, SHUTDOWN, and PRIVATE have a PCLK and BCLK restriction. Failure to comply with this restriction can result in internal damage to the device (see 6.4 Restrictions).
Once the restriction is complied with, SHUTDOWN, EXTEST, HIGHZ, and DRVCTL.T can be entered in any order. The system clocks (PCLK and BCLK) must be kept running while the SAMPLE/PRELOAD, DRVCTL.S, or BYPASS instruction is selected; failure to do so can cause internal damage to the device.

6.2.1 EXTEST

The external test instruction (EXTEST) selects the 184-bit boundary scan register. This instruction also activates two internal functions intended to protect the device from damage while boundary scan operations are performed.
EXTEST asserts internal reset for the M68040 system logic to force a predictable, benign internal state, and it activates an internal keep-alive clock to protect the device from internal damage. This internal clock eliminates the requirement to keep the system clocks (PCLK and BCLK) running during EXTEST operations and allows these two system clock pins to be included in boundary scan testing.

6.2.2 HIGHZ

The HIGHZ instruction is an optional instruction provided as a Motorola public instruction to anticipate the need to backdrive output pins during circuit board testing. HIGHZ activates an internal keep-alive clock, asserts internal system reset, selects the bypass register, and forces all output and bidirectional pins to the high-impedance state.

Asserting TRST, or holding TMS high and clocking TCK for at least five rising edges, causes the TAP controller to enter the Test-Logic-Reset state. From there, HIGHZ can be invoked using only the TMS and TCK pins by passing through the Capture-IR and Update-IR states without shifting any instruction bits. This scheme works because the value captured by the instruction shift register during Capture-IR (001) is identical to the HIGHZ opcode.

6.2.3 SAMPLE/PRELOAD

The SAMPLE/PRELOAD instruction provides two separate functions. First, it provides a means to obtain a sample of the system data and control signals. Sampling occurs on the rising edge of TCK in the Capture-DR state. The user can observe the data by shifting it through the boundary scan register to the TDO output in the Shift-DR state. Both the data capture and the shift operations are transparent to system operation. Because there is no internal synchronization between TCK and BCLK, the user must provide some form of external synchronization to achieve meaningful results.
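Once a SAMPLE/PRELOAD capture has been shifted out, the TDO stream can be decoded using the I.PIN bit positions listed in Table 6-2. A minimal Python sketch, assuming the 184 TDO values from a single Shift-DR pass have already been collected in arrival order (so the first value is boundary scan bit 0); the helper names are hypothetical, and collecting the stream with proper TCK/BCLK synchronization is not shown.

```python
BSR_LENGTH = 184  # bits 0..183; bit 0 is nearest TDO and is shifted out first


def address_input_cell(n: int) -> int:
    """Boundary scan bit of the I.PIN cell for address pin A<n> (Table 6-2)."""
    if 10 <= n <= 31:
        return 10 + 2 * (n - 10)   # A10..A31: bits 10..52, even
    if 0 <= n <= 9:
        return 118 + 2 * (9 - n)   # A9..A0: bits 118..136, even
    raise ValueError(f"no address pin A{n}")


def data_input_cell(n: int) -> int:
    """Boundary scan bit of the I.PIN cell for data pin D<n> (Table 6-2)."""
    if 0 <= n <= 31:
        return 85 + n              # D0..D31: bits 85..116
    raise ValueError(f"no data pin D{n}")


def decode_sample(tdo_bits: list[int]) -> tuple[int, int]:
    """Recover the sampled A31-A0 and D31-D0 pin values.

    tdo_bits holds the values observed on TDO during one Shift-DR pass,
    in arrival order, so tdo_bits[k] is boundary scan bit k.
    """
    if len(tdo_bits) != BSR_LENGTH:
        raise ValueError(f"expected {BSR_LENGTH} bits, got {len(tdo_bits)}")
    address = sum(tdo_bits[address_input_cell(n)] << n for n in range(32))
    data = sum(tdo_bits[data_input_cell(n)] << n for n in range(32))
    return address, data


if __name__ == "__main__":
    capture = [0] * BSR_LENGTH
    capture[address_input_cell(2)] = 1   # pretend A2 was sampled high
    capture[data_input_cell(7)] = 1      # pretend D7 was sampled high
    address, data = decode_sample(capture)
    print(hex(address), hex(data))       # 0x4 0x80
```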
The second function of SAMPLE/PRELOAD is to initialize the boundary scan register output cells before EXTEST is selected. This is done by shifting in initialization data while ignoring the data shifted out of TDO. The Update-DR state can then be used to load the boundary scan register, ensuring that known data and output states appear on the outputs once the EXTEST instruction is entered.

6.2.4 DRVCTL.T

The DRVCTL.T instruction is a Motorola public instruction that selects one of two output drivers on a pin-by-pin basis. It is intended for use with EXTEST or SHUTDOWN to provide an IEEE-compatible environment for selecting output drivers in board-level test. With this instruction, data in the boundary scan register selects the output driver: a logic zero in the appropriate boundary scan output cell (see Table 6-2) selects the large buffer, and a logic one selects the small buffer (see Section 7 Bus Operation). Data captured in the Capture-DR state for this instruction is identical to that captured during EXTEST: output data cells for outputs and pin state for inputs. Note that no data relevant to the drive control function is captured during the Capture-DR state.
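A preload vector for EXTEST might be assembled as follows. This Python sketch assumes the goal is to drive a fixed pattern on A31-A0: it takes the O.LATCH positions for the address pins and the IO.AB control bit (150) from Table 6-2, and leaves every other cell at 0 so the remaining three-state groups stay disabled. The helper names are hypothetical, and choosing safe values for always-driven outputs such as RSTO or PST3-PST0 is left out.

```python
BSR_LENGTH = 184
IO_AB = 150  # control cell for the address bus (Table 6-2)


def address_output_cell(n: int) -> int:
    """Boundary scan bit of the O.LATCH cell for address pin A<n> (Table 6-2)."""
    if 10 <= n <= 31:
        return 9 + 2 * (n - 10)    # A10..A31: bits 9..51, odd
    if 0 <= n <= 9:
        return 117 + 2 * (9 - n)   # A9..A0: bits 117..135, odd
    raise ValueError(f"no address pin A{n}")


def preload_address(address: int) -> list[int]:
    """SAMPLE/PRELOAD data that makes EXTEST drive `address` on A31-A0.

    The list is indexed by boundary scan bit and is also the TDI order:
    element 0 is shifted in first and ends up in bit 0, next to TDO.
    Cells not set here stay 0, so IO.DB, IO.0, IO.1, and IO.2 keep their
    pins high-impedance (an assumption of this sketch).
    """
    if not 0 <= address < 1 << 32:
        raise ValueError("address must fit in 32 bits")
    cells = [0] * BSR_LENGTH
    for n in range(32):
        cells[address_output_cell(n)] = (address >> n) & 1
    cells[IO_AB] = 1               # 1 = enable the address bus output drivers
    return cells


if __name__ == "__main__":
    vector = preload_address(0x80000001)
    print(sum(vector), vector[51], vector[135], vector[IO_AB])  # 3 1 1 1
```

After this vector is shifted in and loaded in Update-DR, selecting EXTEST would drive the pattern immediately, which is the point of preloading described above.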
The DRVCTL.T instruction is intended for test applications, in conjunction with the EXTEST and SHUTDOWN instructions, not for system applications. It therefore differs from DRVCTL.S: this instruction invokes the keep-alive clock and asserts internal reset, and the test logic, not the system logic, has control of the I/O pins. When the system logic has control of signal pin I/O directions and levels, the drive control latch is loaded from the IPL2-IPL0 pins during the negation of RSTI. DRVCTL.T overwrites this value with boundary scan data in the Update-DR state.

The selected output driver state remains unchanged as long as only the DRVCTL.T, EXTEST, or SHUTDOWN instructions are invoked. If any other instruction is executed, the system logic protocol regains control of the output driver state and overwrites the value that DRVCTL.T previously defined. In particular, if DRVCTL.T changes the output driver state and the Test-Logic-Reset state is then entered, the instruction shift register is reset to BYPASS, and the system logic can change the output driver state.

6.2.5 SHUTDOWN

This instruction provides an opcode for automatic test pattern generation (ATPG) programs to cope with the clocking protocol required to stop the system clocks. SHUTDOWN asserts internal system reset, activates an internal keep-alive clock, and selects the bypass register; the test logic, not the system logic, has control of the I/O ports. Initializing the boundary scan data register and then selecting SHUTDOWN therefore provides a clamping function: the test logic holds the I/O state while the bypass register is selected.
6.2.6 PRIVATE

Motorola reserves this instruction for manufacturing use. The instruction does not change pin I/O as defined for system operation.

6.2.7 DRVCTL.S

The DRVCTL.S instruction controls output driver selection on a pin-by-pin basis. It allows data in the boundary scan register to select the output driver during the Update-DR state while the system logic has control of the signal I/O directions and levels. A logic zero selects the large buffer or driver; a logic one selects the small buffer or driver (see Table 6-2).

The DRVCTL.S instruction is intended for system applications, not test applications. In system applications, the system logic has control of the signal pin I/O directions and levels; in test applications, the 1149.1A test logic has control of them. DRVCTL.S therefore differs from DRVCTL.T: it does not invoke the internal keep-alive clock, it does not assert internal reset, and the system logic, not the test logic, has control of the I/O pins. The 1149.1A interface is transparent to system operation except for drive control selection during execution of this instruction.

When the system logic has control of the signal I/O directions and levels, the drive control latches are loaded from the IPL2-IPL0 pins at the negation of RSTI. After RSTI has been negated and the 128-clock internal reset cycle has expired (see Section 7 Bus Operation), the DRVCTL.S instruction is executed, and each drive control latch is modified during the Update-DR state. Any subsequent negation of RSTI while in a system configuration (i.e., with the system logic controlling signal I/O directions and levels) can cause the drive control latches to be overwritten with new IPL2-IPL0 signal values. The system bus can be suspended in a wait state while this function is being performed.

6.2.8 BYPASS

The BYPASS instruction selects the single-bit bypass register, creating a single-bit shift-register path from TDI through the bypass register to TDO. The instruction enhances test efficiency when a component other than the M68040 is the device under test. When the bypass register is initially selected, the bypass register stage is set to a logic zero on the rising edge of TCK following entry into the Capture-DR state. Therefore, the first bit shifted out after selecting the bypass register is always a logic zero. Figure 6-2 illustrates the bypass register.

Figure 6-2. Bypass Register (signals shown: from TDI, logic 0 input, Shift-DR, Clock-DR, to TDO)

6.3 Boundary Scan Register

The 184-bit boundary scan register uses the TAP controller to scan user-defined values into the output buffers, capture values presented to input pins, and control the direction of bidirectional pins. The boundary scan register cell nearest TDO (i.e., the first to be shifted out) is defined as bit zero; the last bit to be shifted out is bit 183.
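The guaranteed leading zero described in 6.2.8 is what makes a common board-test probe work: with every device in a scan chain in BYPASS, shifting ones into TDI produces one zero at TDO per device before the first one emerges. A minimal Python sketch, assuming an idealized chain of single-bit bypass registers; the function names are hypothetical, and this probe is general board-test practice rather than a procedure this manual prescribes.

```python
def shift_bypass_chain(n_devices: int, tdi_bits: list[int]) -> list[int]:
    """Shift-DR through n single-bit bypass registers wired TDI -> ... -> TDO.

    Each stage was loaded with 0 in Capture-DR, as described for BYPASS.
    Returns the bits seen on TDO, one per TCK.
    """
    stages = [0] * n_devices            # stages[0] nearest TDI, stages[-1] nearest TDO
    tdo = []
    for bit in tdi_bits:
        if stages:
            tdo.append(stages[-1])      # chain output before this shift
            stages = [bit] + stages[:-1]
        else:
            tdo.append(bit)             # empty chain: TDI wired straight to TDO
    return tdo


def count_devices(tdo_bits: list[int]) -> int:
    """Number of bypass stages = zeros that emerge before the first 1.

    Raises ValueError if no 1 ever appears (stream too short or TDO stuck low).
    """
    return tdo_bits.index(1)


if __name__ == "__main__":
    observed = shift_bypass_chain(3, [1] * 8)
    print(observed)                     # [0, 0, 0, 1, 1, 1, 1, 1]
    print(count_devices(observed))      # 3
```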
This register includes cells for all device signal pins and clock pins along with associated control signals. The M68040 boundary scan register consists of three cell structure types, O.LATCH, I.PIN, and IO.CTL, each associated with a boundary scan register bit. All boundary scan output cells capture the logic level of the device output latch during the Capture-DR state. Figures 6-3 through 6-5 illustrate these three cell types, and Figure 6-6 illustrates the general arrangement of these cells.
Figure 6-3. Output Latch Cell (O.LATCH)
(Signals shown: data from system logic, from last cell, Shift-DR, Clock-DR, Update-DR1 and Update-DR2 (DRVCTL.x), to next cell, to output buffer, to output driver select. Mux select: 1 = EXTEST, DRVCTL.T, and SHUTDOWN; 0 = otherwise.)

Figure 6-4. Input Pin Cell (I.PIN)
(Signals shown: input pin, from last cell, Shift-DR, Clock-DR, to system logic, to next cell.)

Figure 6-5. Output Control Cells (IO.CTL)
(Signals shown: output control from system logic, from last cell, Shift-DR, Clock-DR, Update-DR, reset, to next cell, to output buffer (1 = drive). Mux select: 1 = EXTEST; 0 = otherwise.)

Figure 6-6. General Arrangement of Bidirectional Pins
(Each bidirectional pin uses an IO.CTL cell for the output enable, an O.LATCH cell for output data, and an I.PIN cell for input data, chained from the last cell to the next pin pair.)
All M68040 bidirectional pins include two boundary scan data cells: an input and an output. Each bidirectional pin is controlled by one of five associated boundary scan control cells. If a control cell contains a logic one, the associated bidirectional or three-state pins are configured as outputs and enabled. The cell captures the current value during the Capture-DR state. All five control cells are reset (i.e., logic zero) in the Test-Logic-Reset state. The five bidirectional/three-state control cells and their boundary scan register bit positions are as follows:

Cell Name  Bit
IO.AB      150
IO.DB      151
IO.2       154
IO.1       155
IO.0       156

Table 6-2 lists the 184 boundary scan bit definitions. The first column defines the bit position in the boundary scan register. The second column gives one of the three cell types. The third column lists the pin name for all pin-related cells. The fourth column lists the system pin type for convenience, where TS-Output indicates a three-state output pin and I/O indicates a bidirectional pin. The last column lists the name of the associated control bit of the boundary scan register for three-state output and bidirectional pins. The boundary scan description language (BSDL) type for each cell is given in Note 1.
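The control-cell assignments in the last column of Table 6-2 can be captured in a lookup that reports which pin groups a boundary scan vector would enable. A minimal Python sketch; the pin groupings are transcribed from Table 6-2, the function name is hypothetical, and the two-state outputs marked Note 3 (RSTO, IPEND, MI, BR, PST3-PST0) are not governed by any control cell and so do not appear.

```python
# Control cell bit -> (cell name, pins it enables), per the last column of Table 6-2.
CONTROL_CELLS = {
    150: ("IO.AB", ("A31-A0",)),
    151: ("IO.DB", ("D31-D0",)),
    154: ("IO.2", ("TA",)),
    155: ("IO.1", ("LOCKE", "LOCK", "BB", "TIP")),
    156: ("IO.0", ("CIOUT", "UPA1-UPA0", "TT1-TT0", "TM2-TM0", "TLN1-TLN0",
                   "SIZ1-SIZ0", "R/W", "TS")),
}


def enabled_groups(bsr: list[int]) -> dict[str, tuple[str, ...]]:
    """Return the control cells set to 1 in a boundary scan vector, with their pins.

    A 1 enables the associated three-state or bidirectional pins as outputs;
    0 (the Test-Logic-Reset value) leaves them high-impedance.
    """
    if len(bsr) != 184:
        raise ValueError("expected a 184-bit boundary scan vector")
    return {name: pins for bit, (name, pins) in CONTROL_CELLS.items() if bsr[bit]}


if __name__ == "__main__":
    after_reset = [0] * 184               # all control cells cleared
    print(enabled_groups(after_reset))    # {}
    vector = list(after_reset)
    vector[155] = 1                       # enable the IO.1 group only
    print(enabled_groups(vector))
```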
Table 6-2. Boundary Scan Bit Definitions (Note 1)

Bit  Cell Type  Pin/Cell Name  Pin Type       Output Ctrl Cell
0    O.LATCH    RSTO           Output (2)     (Note 3)
1    O.LATCH    IPEND          Output (2)     (Note 3)
2    O.LATCH    CIOUT          TS-Output (2)  IO.0
3    O.LATCH    UPA0           TS-Output (2)  IO.0
4    O.LATCH    UPA1           TS-Output (2)  IO.0
5    O.LATCH    TT0            I/O (2)        IO.0
6    I.PIN      TT0            I/O            IO.0
7    O.LATCH    TT1            I/O (2)        IO.0
8    I.PIN      TT1            I/O            IO.0
9    O.LATCH    A10            I/O (2)        IO.AB
10   I.PIN      A10            I/O            IO.AB
11   O.LATCH    A11            I/O (2)        IO.AB
12   I.PIN      A11            I/O            IO.AB
13   O.LATCH    A12            I/O (2)        IO.AB
14   I.PIN      A12            I/O            IO.AB
15   O.LATCH    A13            I/O (2)        IO.AB
16   I.PIN      A13            I/O            IO.AB
17   O.LATCH    A14            I/O (2)        IO.AB
18   I.PIN      A14            I/O            IO.AB
19   O.LATCH    A15            I/O (2)        IO.AB
20   I.PIN      A15            I/O            IO.AB
21   O.LATCH    A16            I/O (2)        IO.AB
22   I.PIN      A16            I/O            IO.AB
23   O.LATCH    A17            I/O (2)        IO.AB
24   I.PIN      A17            I/O            IO.AB
25   O.LATCH    A18            I/O (2)        IO.AB
26   I.PIN      A18            I/O            IO.AB
27   O.LATCH    A19            I/O (2)        IO.AB
28   I.PIN      A19            I/O            IO.AB
29   O.LATCH    A20            I/O (2)        IO.AB
30   I.PIN      A20            I/O            IO.AB
31   O.LATCH    A21            I/O (2)        IO.AB
32   I.PIN      A21            I/O            IO.AB
33   O.LATCH    A22            I/O (2)        IO.AB
34   I.PIN      A22            I/O            IO.AB
35   O.LATCH    A23            I/O (2)        IO.AB
36   I.PIN      A23            I/O            IO.AB
37   O.LATCH    A24            I/O (2)        IO.AB
38   I.PIN      A24            I/O            IO.AB
39   O.LATCH    A25            I/O (2)        IO.AB
40   I.PIN      A25            I/O            IO.AB
41   O.LATCH    A26            I/O (2)        IO.AB
42   I.PIN      A26            I/O            IO.AB
43   O.LATCH    A27            I/O (2)        IO.AB
44   I.PIN      A27            I/O            IO.AB
45   O.LATCH    A28            I/O (2)        IO.AB
46   I.PIN      A28            I/O            IO.AB
47   O.LATCH    A29            I/O (2)        IO.AB
48   I.PIN      A29            I/O            IO.AB
49   O.LATCH    A30            I/O (2)        IO.AB
50   I.PIN      A30            I/O            IO.AB
51   O.LATCH    A31            I/O (2)        IO.AB
52   I.PIN      A31            I/O            IO.AB
53   O.LATCH    D0             I/O (2)        IO.DB
54   O.LATCH    D1             I/O (2)        IO.DB
55   O.LATCH    D2             I/O (2)        IO.DB
56   O.LATCH    D3             I/O (2)        IO.DB
57   O.LATCH    D4             I/O (2)        IO.DB
58   O.LATCH    D5             I/O (2)        IO.DB
59   O.LATCH    D6             I/O (2)        IO.DB
60   O.LATCH    D7             I/O (2)        IO.DB
61   O.LATCH    D8             I/O (2)        IO.DB
62   O.LATCH    D9             I/O (2)        IO.DB
63   O.LATCH    D10            I/O (2)        IO.DB
64   O.LATCH    D11            I/O (2)        IO.DB
65   O.LATCH    D12            I/O (2)        IO.DB
66   O.LATCH    D13            I/O (2)        IO.DB
67   O.LATCH    D14            I/O (2)        IO.DB
68   O.LATCH    D15            I/O (2)        IO.DB
69   O.LATCH    D16            I/O (2)        IO.DB
70   O.LATCH    D17            I/O (2)        IO.DB
71   O.LATCH    D18            I/O (2)        IO.DB
72   O.LATCH    D19            I/O (2)        IO.DB
73   O.LATCH    D20            I/O (2)        IO.DB
74   O.LATCH    D21            I/O (2)        IO.DB
75   O.LATCH    D22            I/O (2)        IO.DB
76   O.LATCH    D23            I/O (2)        IO.DB
77   O.LATCH    D24            I/O (2)        IO.DB
78   O.LATCH    D25            I/O (2)        IO.DB
79   O.LATCH    D26            I/O (2)        IO.DB
80   O.LATCH    D27            I/O (2)        IO.DB
81   O.LATCH    D28            I/O (2)        IO.DB
82   O.LATCH    D29            I/O (2)        IO.DB
83   O.LATCH    D30            I/O (2)        IO.DB
84   O.LATCH    D31            I/O (2)        IO.DB
85   I.PIN      D0             I/O            IO.DB
86   I.PIN      D1             I/O            IO.DB
87   I.PIN      D2             I/O            IO.DB
88   I.PIN      D3             I/O            IO.DB
89   I.PIN      D4             I/O            IO.DB
90   I.PIN      D5             I/O            IO.DB
91   I.PIN      D6             I/O            IO.DB
92   I.PIN      D7             I/O            IO.DB
93   I.PIN      D8             I/O            IO.DB
94   I.PIN      D9             I/O            IO.DB
95   I.PIN      D10            I/O            IO.DB
96   I.PIN      D11            I/O            IO.DB
97   I.PIN      D12            I/O            IO.DB
98   I.PIN      D13            I/O            IO.DB
99   I.PIN      D14            I/O            IO.DB
100  I.PIN      D15            I/O            IO.DB
101  I.PIN      D16            I/O            IO.DB
102  I.PIN      D17            I/O            IO.DB
103  I.PIN      D18            I/O            IO.DB
104  I.PIN      D19            I/O            IO.DB
105  I.PIN      D20            I/O            IO.DB
106  I.PIN      D21            I/O            IO.DB
107  I.PIN      D22            I/O            IO.DB
108  I.PIN      D23            I/O            IO.DB
109  I.PIN      D24            I/O            IO.DB
110  I.PIN      D25            I/O            IO.DB
111  I.PIN      D26            I/O            IO.DB
112  I.PIN      D27            I/O            IO.DB
113  I.PIN      D28            I/O            IO.DB
114  I.PIN      D29            I/O            IO.DB
115  I.PIN      D30            I/O            IO.DB
116  I.PIN      D31            I/O            IO.DB
117  O.LATCH    A9             I/O (2)        IO.AB
118  I.PIN      A9             I/O            IO.AB
119  O.LATCH    A8             I/O (2)        IO.AB
120  I.PIN      A8             I/O            IO.AB
121  O.LATCH    A7             I/O (2)        IO.AB
122  I.PIN      A7             I/O            IO.AB
123  O.LATCH    A6             I/O (2)        IO.AB
124  I.PIN      A6             I/O            IO.AB
125  O.LATCH    A5             I/O (2)        IO.AB
126  I.PIN      A5             I/O            IO.AB
127  O.LATCH    A4             I/O (2)        IO.AB
128  I.PIN      A4             I/O            IO.AB
129  O.LATCH    A3             I/O (2)        IO.AB
130  I.PIN      A3             I/O            IO.AB
131  O.LATCH    A2             I/O (2)        IO.AB
132  I.PIN      A2             I/O            IO.AB
133  O.LATCH    A1             I/O (2)        IO.AB
134  I.PIN      A1             I/O            IO.AB
135  O.LATCH    A0             I/O (2)        IO.AB
136  I.PIN      A0             I/O            IO.AB
137  O.LATCH    TM2            TS-Output (2)  IO.0
138  O.LATCH    TM1            TS-Output (2)  IO.0
139  O.LATCH    TM0            TS-Output (2)  IO.0
140  O.LATCH    TLN1           TS-Output (2)  IO.0
141  O.LATCH    TLN0           TS-Output (2)  IO.0
142  O.LATCH    SIZ0           I/O (2)        IO.0
143  I.PIN      SIZ0           I/O            IO.0
144  O.LATCH    R/W            I/O (2)        IO.0
145  I.PIN      R/W            I/O            IO.0
146  O.LATCH    LOCKE          TS-Output (2)  IO.1
147  O.LATCH    SIZ1           I/O (2)        IO.0
148  I.PIN      SIZ1           I/O            IO.0
149  O.LATCH    LOCK           TS-Output (2)  IO.1
150  IO.CTL     IO.AB                         (Note 4)
151  IO.CTL     IO.DB                         (Note 4)
152  O.LATCH    MI             Output (2)     (Note 3)
153  O.LATCH    BR             Output (2)     (Note 3)
154  IO.CTL     IO.2                          (Note 4)
155  IO.CTL     IO.1                          (Note 4)
156  IO.CTL     IO.0                          (Note 4)
157  O.LATCH    TS             I/O (2)        IO.0
158  I.PIN      TS             I/O            IO.0
159  O.LATCH    BB             I/O (2)        IO.1
160  I.PIN      BB             I/O            IO.1
161  O.LATCH    TIP            TS-Output (2)  IO.1
162  O.LATCH    PST3           Output (2)     (Note 3)
163  O.LATCH    PST2           Output (2)     (Note 3)
164  O.LATCH    PST1           Output (2)     (Note 3)
165  O.LATCH    PST0           Output (2)     (Note 3)
166  O.LATCH    TA             I/O (2)        IO.2
167  I.PIN      TA             I/O            IO.2
168  I.PIN      TEA            Input
169  I.PIN      BG             Input
170  I.PIN      SC1            Input
171  I.PIN      SC0            Input
172  I.PIN      TBI            Input
173  I.PIN      AVEC           Input
174  I.PIN      TCI            Input
175  I.PIN      DLE (5)        Input
176  I.PIN      PCLK           Input
177  I.PIN      BCLK           Input
178  I.PIN      IPL0           Input
179  I.PIN      IPL1           Input
180  I.PIN      IPL2           Input
181  I.PIN      RSTI           Input
182  I.PIN      CDIS           Input
183  I.PIN      MDIS (6)       Input

Notes:
1. I.PIN, IO.CTL, and O.LATCH are equivalent to the BSDL descriptions BC_4, BC_2, and BC_2, respectively.
2. Boundary scan register bit positions that are used during the drive control (DRVCTL.x) instructions.
3. These output-only cells can be turned off (high impedance) by using the HIGHZ instruction.
4. All of the control signals (IO.CTL) are cleared in the Test-Logic-Reset state.
5. Renamed JS0 on the MC68LC040 and MC68EC040.
6. Renamed JS1 on the MC68EC040.

6.4 Restrictions

The test logic is implemented using static logic design, and TCK can be stopped in either a high or low state without loss of data. The system logic, however, includes considerable dynamic logic. For this reason, the system clocks (PCLK and BCLK) must not be stopped or allowed to run slower than the specified frequency except when the EXTEST, HIGHZ, DRVCTL.T, or SHUTDOWN instruction has been properly invoked.
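The clocking rules of this section, including the two-BCLK entry requirement given in the next paragraph, can be encoded as a checker for a planned test sequence. This Python sketch is an assumption-laden model rather than a Motorola procedure: it follows the four-instruction list given here (PRIVATE, which 6.2 also lists with a clock restriction, is treated like any other instruction), and it conservatively restarts the two-period count whenever a keep-alive instruction is selected before that window has elapsed. The class and method names are hypothetical.

```python
# Per 6.4: once properly entered, these run from the internal keep-alive clock.
KEEP_ALIVE = {"EXTEST", "HIGHZ", "DRVCTL.T", "SHUTDOWN"}
ENTRY_BCLK_PERIODS = 2  # PCLK/BCLK keep running this long after initial entry


class ClockGuard:
    """Checks a planned test sequence against the PCLK/BCLK rules of 6.4."""

    def __init__(self) -> None:
        self.instruction = "BYPASS"   # selected after Test-Logic-Reset
        self.clocks_running = True
        self.periods = 0              # BCLK periods since entering a keep-alive instruction
        self.entered = False          # True once the entry window has elapsed

    def select(self, instruction: str) -> None:
        """A new instruction takes effect (Update-IR)."""
        if instruction in KEEP_ALIVE:
            if not self.entered:
                self.periods = 0      # conservatively restart the entry window
        else:
            if not self.clocks_running:
                raise RuntimeError(f"restart PCLK/BCLK before selecting {instruction}")
            self.entered = False      # a proper reentry is needed before stopping again
        self.instruction = instruction

    def bclk_period(self) -> None:
        """One BCLK period elapses with the system clocks running."""
        if not self.clocks_running:
            raise RuntimeError("system clocks are stopped")
        if self.instruction in KEEP_ALIVE and not self.entered:
            self.periods += 1
            self.entered = self.periods >= ENTRY_BCLK_PERIODS

    def stop_clocks(self) -> None:
        if not self.entered:
            raise RuntimeError(f"PCLK/BCLK must keep running under {self.instruction}")
        self.clocks_running = False

    def start_clocks(self) -> None:
        self.clocks_running = True


if __name__ == "__main__":
    guard = ClockGuard()
    guard.select("EXTEST")
    guard.bclk_period()
    guard.bclk_period()               # two BCLK periods after initial entry
    guard.stop_clocks()               # now allowed
    guard.select("SHUTDOWN")          # any order among the four while stopped
    try:
        guard.select("SAMPLE/PRELOAD")
    except RuntimeError as err:
        print(err)  # restart PCLK/BCLK before selecting SAMPLE/PRELOAD
```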
PCLK and BCLK must be kept running for two additional BCLK periods upon initial entry into any of the four instructions EXTEST, HIGHZ, DRVCTL.T, or SHUTDOWN. This restriction allows time for an internal reset to propagate through an internal synchronizer. After this period, the user has complete time-domain freedom with the two system clock pins, and these four instructions can be executed in any order without a time-domain clocking restriction. Entering any other instruction requires that the system clocks be restarted, and a proper reentry into one of the four instructions is again required before the system clocks can be stopped.

Controlling the output enable signals through the boundary scan register and the EXTEST and HIGHZ instructions requires a compatible circuit-board test environment to avoid destructive configurations. The user is responsible for avoiding situations in which the M68040 output drivers are enabled into actively driven networks.

The TRST signal provides an asynchronous reset of the test logic and requires no internal clocking to force the TAP controller into the Test-Logic-Reset state. This signal should be asserted during system power-up to initialize the 1149.1A test interface; doing so prevents the board-level bus contention that could otherwise occur at power-up if the test logic had control of the pins. The device has no internal power-up reset circuit, so TRST should be treated like RSTI in board design considerations concerning power-up conditions.

Negation of TRST requires certain precautions to achieve a predictable TAP controller state. The TMS signal is sampled on the rising edge of TCK and sequences the TAP controller. If TMS is low and TRST is negated simultaneously with a rising edge of TCK, the resulting TAP controller state is unpredictable, although it will be either Test-Logic-Reset or Run-Test/Idle. To avoid this uncertainty, either 1) synchronize the negation of TRST with the falling edge of TCK, or 2) keep TMS high until after TRST is negated. Alternatively, holding TMS low for two or more TCK periods following TRST negation ensures that the TAP controller is in the Run-Test/Idle state.

6.5 Disabling the IEEE Standard 1149.1A Operation

There are two considerations for non-IEEE Standard 1149.1A operation.
First, TCK does not include an internal pullup resistor and should not be left unconnected, to preclude mid-level inputs. Second, the IEEE Standard 1149.1A test logic must remain transparent to the system logic, which requires the ability to force the Test-Logic-Reset state. Figure 6-7 illustrates disabling IEEE Standard 1149.1A operation by connecting TRST to ground, either directly or through a resistor, or to a suitable logic network. Connecting TRST to RSTI while TCK is held either high or low meets both considerations. If a pulse asserts TRST, the TAP controller is forced into the Test-Logic-Reset state and remains there as long as no rising edge occurs on TCK while TMS is low.
Figure 6-7. Circuit Disabling IEEE Standard 1149.1A (labels shown: TDI, TMS, TRST, TCK, TDO (no connection), +5 V, 1 kΩ)
6.6 Motorola M68040 BSDL Description (Version 2.2)

Revision list:

Version 2.2 (relative to version 2.1): LOCK and LOCKE controlled by IO.1 vice IO.0 (4D98D); no other changes to the version 2.1 BSDL.

Earlier revision (relative to version 1.0): instruction opcodes changed for SAMPLE, SHUTDOWN, and BYPASS; new instructions DRVCTL.T, DRVCTL.S, and PRIVATE added; DRVCTL.T and DRVCTL.S renamed to DRVCTL_T and DRVCTL_S for syntax compatibility; register access specified for the DRVCTL_T, DRVCTL_S, and PRIVATE instructions; no other changes to the version 1.0 BSDL.

Package type: 18 x 18 PGA.

This BSDL is for the newer MC68040 mask sets E26A and later (roughly after the second half of 1992). It does not cover the 0.8-µm mask sets D43B, D50D, and D98D. For the MC68LC040 and MC68EC040, two pin names have changed; to make the necessary modifications, change all occurrences of dle to js0 and mdis to js1.

entity mc68040 is
  generic (physical_pin_map : string := "pga_18x18");

  port (tdi:   in bit;
        tdo:   out bit;
        tms:   in bit;
        tck:   in bit;
        trst:  in bit;
        rsto:  buffer bit;
        ipend: buffer bit;
        ciout: out bit;
        upa:   out bit_vector(0 to 1);
        tt:    inout bit_vector(0 to 1);
        a:     inout bit_vector(0 to 31);
        d:     inout bit_vector(0 to 31);
        locke: out bit;
        lock:  out bit;
        r_w:   inout bit;
        tln:   out bit_vector(0 to 1);
        tm:    out bit_vector(0 to 2);
        siz:   inout bit_vector(0 to 1);
        mi:    buffer bit;
        br:    buffer bit;
        ts:    inout bit;
        bb:    inout bit;
        tip:   out bit;
        pst:   buffer bit_vector(0 to 3);
        ta:    inout bit;
        tea:   in bit;
        bg:    in bit;
        sc:    in bit_vector(0 to 1);
        tbi:   in bit;
        avec:  in bit;
        tci:   in bit;
        dle:   in bit;
        pclk:  in bit;
        bclk:  in bit;
        ipl:   in bit_vector(0 to 2);
        rsti:  in bit;
        cdis:  in bit;
        mdis:  in bit;
        egnd:  linkage bit_vector(1 to 23);
        evdd:  linkage bit_vector(1 to 12);
        ignd:  linkage bit_vector(1 to 12);
        ivdd:  linkage bit_vector(1 to 7);
        cgnd:  linkage bit_vector(1 to 2);
        cvdd:  linkage bit_vector(1 to 6);
        pgnd:  linkage bit_vector(1 to 3);
        pvdd:  linkage bit_vector(1 to 2)
       );

  use std_1149_1_1990.all;

  attribute pin_map of mc68040 : entity is physical_pin_map;

  -- 18x18 pga pin map
  constant pga_18x18 : pin_map_string :=
    "tdi: s3, " &
    "tdo: t2, " &
    "tms: s5, " &
    "tck: s4, " &
    "trst: t3, " &
    "rsto: r3, " &
    "ipend: s1, " &
    "ciout: r1, " &
    "upa: (q3, q1), " &
    "tt: (p3, p2), " &
    "a: (l18, k18, j17, j18, h18, g18, g16, f18, e18, f16, p1, n3, " &
    " n1, m1, l1, k1, k2, j1, h1, j2, g1, f1, e1, g3, " &
    " d1, f3, e2, c1, e3, b1, d3, a1), " &
    "d: (c3, b3, c4, a2, a3, a4, a5, a6, b7, a7, a8, a9, " &
    " a10, a11, a12, a13, b11, a14, b12, a15, a16, a17, b16, c15, " &
    " a18, c16, b18, d16, c18, e16, e17, d18), " &
    "locke: r18, " &
    "lock: s18, " &
    "r_w: n16, " &
    "tln: (q18, p18), " &
    "tm: (n18, m18, k17), " &
    "siz: (p17, p16), " &
    "mi: q16, " &
    "br: t18, " &
    "ts: r16, " &
    "bb: t17, " &
    "tip: r15, " &
    "pst: (t15, s14, r14, t16), " &
    "ta: t14, " &
    "tea: s13, " &
    "bg: t13, " &
    "sc: (t12, s12), " &
    "tbi: s11, " &
    "avec: t11, " &
    "tci: t10, " &
    "dle: t9, " &
    "pclk: r9, " &
    "bclk: r7, " &
    "ipl: (t8, t7, t6), " &
    "rsti: s7, " &
    "cdis: t5, " &
    "mdis: s6, " &
    "egnd: (s2, q2, n2, l2, h2, f2, d2, b2, b4, b6, b8, b10, " &
    " b13, b15, b17, d17, f17, h17, l17, n17, q17, s17, s15), " &
    "evdd: (r2, m2, g2, c2, b5, b9, b14, c17, g17, m17, r17, s16), " &
    "ignd: (t4, r4, l3, k3, c7, c9, c11, k16, m16, r13, r11, s10), " &
    "ivdd: (r5, m3, c8, c10, c12, l16, r12), " &
    "cgnd: (c6, c13), " &
    "cvdd: (j3, h3, c5, c14, h16, j16), " &
    "pgnd: (s9, r10, r6), " &
    "pvdd: (s8, r8) " ;

  -- other pin maps here when documented

  attribute tap_scan_in    of tdi  : signal is true;
  attribute tap_scan_out   of tdo  : signal is true;
  attribute tap_scan_mode  of tms  : signal is true;
  attribute tap_scan_clock of tck  : signal is (10.0e6, both);
  attribute tap_scan_reset of trst : signal is true;

  attribute instruction_length of mc68040 : entity is 3;

  attribute instruction_opcode of mc68040 : entity is
    "extest (000), " &
    "hi_z (001), " &
    "sample (010), " &
    "drvctl_t (011), " &
    "shutdown (100), " &
    "private (101), " &
    "drvctl_s (110), " &
    "bypass (111) " ;

  attribute instruction_capture of mc68040 : entity is "001";
  attribute instruction_disable of mc68040 : entity is "hi_z";

  attribute register_access of mc68040 : entity is
    "bypass (shutdown, hi_z, private), " &
    "boundary (drvctl_t, drvctl_s) " ;

  attribute boundary_cells  of mc68040 : entity is "bc_2, bc_4 " ;
  attribute boundary_length of mc68040 : entity is 184;

  attribute boundary_register of mc68040 : entity is
    -- num cell port function safe ccell dsval rslt
    "0 (bc_2, rsto, output2, x), " &
    "1 (bc_2, ipend, output2, x), " &
    "2 (bc_2, ciout, output3, x, 156, 0, z), " & -- 156 = io.0
    "3 (bc_2, upa(0), output3, x, 156, 0, z), " &
    "4 (bc_2, upa(1), output3, x, 156, 0, z), " &
    "5 (bc_2, tt(0), output3, x, 156, 0, z), " &
    "6 (bc_4, tt(0), input, x), " &
    "7 (bc_2, tt(1), output3, x, 156, 0, z), " &
    "8 (bc_4, tt(1), input, x), " &
    "9 (bc_2, a(10), output3, x, 150, 0, z), " & -- 150 = io.ab
    "10 (bc_4, a(10), input, x), " &
    "11 (bc_2, a(11), output3, x, 150, 0, z), " &
    "12 (bc_4, a(11), input, x), " &
    "13 (bc_2, a(12), output3, x, 150, 0, z), " &
    "14 (bc_4, a(12), input, x), " &
    "15 (bc_2, a(13), output3, x, 150, 0, z), " &
    "16 (bc_4, a(13), input, x), " &
    "17 (bc_2, a(14), output3, x, 150, 0, z), " &
    "18 (bc_4, a(14), input, x), " &
    "19 (bc_2, a(15), output3, x, 150, 0, z), " &
    "20 (bc_4, a(15), input, x), " &
    "21 (bc_2, a(16), output3, x, 150, 0, z), " &
    "22 (bc_4, a(16), input, x), " &
    "23 (bc_2, a(17), output3, x, 150, 0, z), " &
    "24 (bc_4, a(17), input, x), " &
    "25 (bc_2, a(18), output3, x, 150, 0, z), " &
    "26 (bc_4, a(18), input, x), " &
    "27 (bc_2, a(19), output3, x, 150, 0, z), " &
    "28 (bc_4, a(19), input, x), " &
    "29 (bc_2, a(20), output3, x, 150, 0, z), " &
    "30 (bc_4, a(20), input, x), " &
    "31 (bc_2, a(21), output3, x, 150, 0, z), " &
    "32 (bc_4, a(21), input, x), " &
    "33 (bc_2, a(22), output3, x, 150, 0, z), " &
    "34 (bc_4, a(22), input, x), " &
    "35 (bc_2, a(23), output3, x, 150, 0, z), " &
    "36 (bc_4, a(23), input, x), " &
    "37 (bc_2, a(24), output3, x, 150, 0, z), " &
    "38 (bc_4, a(24), input, x), " &
    "39 (bc_2, a(25), output3, x, 150, 0, z), " &
    "40 (bc_4, a(25), input, x), " &
    "41 (bc_2, a(26), output3, x, 150, 0, z), " &
    "42 (bc_4, a(26), input, x), " &
    "43 (bc_2, a(27), output3, x, 150, 0, z), " &
    "44 (bc_4, a(27), input, x), " &
    "45 (bc_2, a(28), output3, x, 150, 0, z), " &
    "46 (bc_4, a(28), input, x), " &
    "47 (bc_2, a(29), output3, x, 150, 0, z), " &
    "48 (bc_4, a(29), input, x), " &
    "49 (bc_2, a(30), output3, x, 150, 0, z), " &
    "50 (bc_4, a(30), input, x), " &
    "51 (bc_2, a(31), output3, x, 150, 0, z), " &
    "52 (bc_4, a(31), input, x), " &
    "53 (bc_2, d(0), output3, x, 151, 0, z), " & -- 151 = io.db
    "54 (bc_2, d(1), output3, x, 151, 0, z), " &
    "55 (bc_2, d(2), output3, x, 151, 0, z), " &
    "56 (bc_2, d(3), output3, x, 151, 0, z), " &
    "57 (bc_2, d(4), output3, x, 151, 0, z), " &
    "58 (bc_2, d(5), output3, x, 151, 0, z), " &
    "59 (bc_2, d(6), output3, x, 151, 0, z), " &
    "60 (bc_2, d(7), output3, x, 151, 0, z), " &
    "61 (bc_2, d(8), output3, x, 151, 0, z), " &
    "62 (bc_2, d(9), output3, x, 151, 0, z), " &
    "63 (bc_2, d(10), output3, x, 151, 0, z), " &
    "64 (bc_2, d(11), output3, x, 151, 0, z), " &
    "65 (bc_2, d(12), output3, x, 151, 0, z), " &
    "66 (bc_2, d(13), output3, x, 151, 0, z), " &
    "67 (bc_2, d(14), output3, x, 151, 0, z), " &
    "68 (bc_2, d(15), output3, x, 151, 0, z), " &
    "69 (bc_2, d(16), output3, x, 151, 0, z), " &
    "70 (bc_2, d(17), output3, x, 151, 0, z), " &
    "71 (bc_2, d(18), output3, x, 151, 0, z), " &
    "72 (bc_2, d(19), output3, x, 151, 0, z), " &
    "73 (bc_2, d(20), output3, x, 151, 0, z), " &
    "74 (bc_2, d(21), output3, x, 151, 0, z), " &
    "75 (bc_2, d(22), output3, x, 151, 0, z), " &
    "76 (bc_2, d(23), output3, x, 151, 0, z), " &
    "77 (bc_2, d(24), output3, x, 151, 0, z), " &
    "78 (bc_2, d(25), output3, x, 151, 0, z), " &
    "79 (bc_2, d(26), output3, x, 151, 0, z), " &
    "80 (bc_2, d(27), output3, x, 151, 0, z), " &
    "81 (bc_2, d(28), output3, x, 151, 0, z), " &
    "82 (bc_2, d(29), output3, x, 151, 0, z), " &
    "83 (bc_2, d(30), output3, x, 151, 0, z), " &
    "84 (bc_2, d(31), output3, x, 151, 0, z), " &
    "85 (bc_4, d(0), input, x), " &
    "86 (bc_4, d(1), input, x), " &
    "87 (bc_4, d(2), input, x), " &
    "88 (bc_4, d(3), input, x), " &
    "89 (bc_4, d(4), input, x), " &
    "90 (bc_4, d(5), input, x), " &
    "91 (bc_4, d(6), input, x), " &
    "92 (bc_4, d(7), input, x), " &
    "93 (bc_4, d(8), input, x), " &
    "94 (bc_4, d(9), input, x), " &
    "95 (bc_4, d(10), input, x), " &
    "96 (bc_4, d(11), input, x), " &
    "97 (bc_4, d(12), input, x), " &
    "98 (bc_4, d(13), input, x), " &
    "99 (bc_4, d(14), input, x), " &
    "100 (bc_4, d(15), input, x), " &
    "101 (bc_4, d(16), input, x), " &
    "102 (bc_4, d(17), input, x), " &
    "103 (bc_4, d(18), input, x), " &
    "104 (bc_4, d(19), input, x), " &
    "105 (bc_4, d(20), input, x), " &
    "106 (bc_4, d(21), input, x), " &
    "107 (bc_4, d(22), input, x), " &
    "108 (bc_4, d(23), input, x), " &
    "109 (bc_4, d(24), input, x), " &
    "110 (bc_4, d(25), input, x), " &
    "111 (bc_4, d(26), input, x), " &
    "112 (bc_4, d(27), input, x), " &
    "113 (bc_4, d(28), input, x), " &
6- 20 m68040 user? manual motorola num cell port function safe ccell dsval rslt "114 (bc_4, d(29), input, x), " & "115 (bc_4, d(30), input, x), " & "116 (bc_4, d(31), input, x), " & "117 (bc_2, a(9), output3, x, 150, 0, z), " & ?50 = io.ab "118 (bc_4, a(9), input, x), " & "119 (bc_2, a(8), output3, x, 150, 0, z), " & "120 (bc_4, a(8), input, x), " & "121 (bc_2, a(7), output3, x, 150, 0, z), " & "122 (bc_4, a(7), input, x), " & "123 (bc_2, a(6), output3, x, 150, 0, z), " & "124 (bc_4, a(6), input, x), " & "125 (bc_2, a(5), output3, x, 150, 0, z), " & "126 (bc_4, a(5), input, x), " & "127 (bc_2, a(4), output3, x, 150, 0, z), " & "128 (bc_4, a(4), input, x), " & "129 (bc_2, a(3), output3, x, 150, 0, z), " & "130 (bc_4, a(3), input, x), " & "131 (bc_2, a(2), output3, x, 150, 0, z), " & "132 (bc_4, a(2), input, x), " & "133 (bc_2, a(1), output3, x, 150, 0, z), " & "134 (bc_4, a(1), input, x), " & "135 (bc_2, a(0), output3, x, 150, 0, z), " & "136 (bc_4, a(0), input, x), " & "137 (bc_2, tm(2), output3, x, 156, 0, z), " & ?56 = io.0 "138 (bc_2, tm(1), output3, x, 156, 0, z), " & "139 (bc_2, tm(0), output3, x, 156, 0, z), " & "140 (bc_2, tln(1), output3, x, 156, 0, z), " & "141 (bc_2, tln(0), output3, x, 156, 0, z), " & "142 (bc_2, siz(0), output3, x, 156, 0, z), " & "143 (bc_4, siz(0), input, x), " & "144 (bc_2, r_w, output3, x, 156, 0, z), " & "145 (bc_4, r_w, input, x), " & "146 (bc_2, lo cke, output3, x, 156, 0, z), " & "147 (bc_2, siz(1), output3, x, 156, 0, z), " & "148 (bc_4, siz(1), input, x), " & "149 (bc_2, lock, output3, x, 156, 0, z), " & "150 (bc_2, *, controlr, 0), " & ?io.ab "151 (bc_2, *, controlr, 0), " & ?io.db "152 (bc_2, mi, output2, x), " & "153 (bc_2, br, output2, x), " & "154 (bc_2, *, controlr, 0), " & io.2 "155 (bc_2, *, controlr, 0), " & io.1 "156 (bc_2, *, controlr, 0), " & io.0 "157 (bc_2, ts, output3, x, 156, 0, z), " & ?156 = io.0 "158 (bc_4, ts, input, x), " & "159 (bc_2, bb, output3, x, 155, 0, z), " & ?155 = io.1 "160 (bc_4, bb, input, x), 
" & "161 (bc_2, tip, output3, x, 155, 0, z), " & ?155 = io.1 "162 (bc_2, pst(3), output2, x), " & "163 (bc_2, pst(2), output2, x), " & "164 (bc_2, pst(1), output2, x), " & "165 (bc_2, pst(0), output2, x), " & "166 (bc_2, ta, output3, x, 154, 0, z), " & ?154 = io.2 "167 (bc_4, ta, input, x), " & "168 (bc_4, tea, input, x), " & "169 (bc_4, bg, input, x), " & "170 (bc_4, sc(1), input, x), " & f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
"171 (bc_4, sc(0), input, x), " &
"172 (bc_4, tbi, input, x), " &
"173 (bc_4, avec, input, x), " &
"174 (bc_4, tci, input, x), " &
"175 (bc_4, dle, input, x), " &
"176 (bc_4, pclk, input, x), " &
"177 (bc_4, bclk, input, x), " &
"178 (bc_4, ipl(0), input, x), " &
"179 (bc_4, ipl(1), input, x), " &
"180 (bc_4, ipl(2), input, x), " &
"181 (bc_4, rsti, input, x), " &
"182 (bc_4, cdis, input, x), " &
"183 (bc_4, mdis, input, x) " ;
attribute design_warning of mc68040: entity is
"a non-standard clocking protocol on bclk and pclk must be " &
"observed when entering boundary scan test mode. " ;
end mc68040 ;

6.7 mc68040, mc68lc040, mc68ec040 jtag electrical characteristics

the following paragraphs provide information on jtag electrical and timing specifications. this section is subject to change. for the most recent specifications, contact a motorola sales office or complete the registration card at the beginning of this manual.

jtag dc electrical specifications

characteristic | symbol | min | max | unit
input high voltage | vih | 2 | vcc | v
input low voltage | vil | gnd (0.8 v undershoot) | 0.8 | v
tck input leakage current @ 0.5–2.4 v | iin | -20 | 20 | µa
tdo hi-z (off-state) leakage current @ 0.5–2.4 v | itst | -20 | 20 | µa
signal low input current, vil = 0.8 v (tms, tdi, trst) | il | ?.1 | ?.18 | ma
signal high input current, vih = 2.0 v (tms, tdi, trst) | ih | ?.94 | ?.16 | ma
tdo output high voltage | voh | 2.4 | | v
tdo output low voltage | vol | | 0.5 | v
capacitance*, vin = 0 v, f = 1 mhz | cin | | ?5 | pf

*capacitance is periodically sampled rather than 100% tested.
jtag timing specifications (all operating frequencies)

num | characteristic | min | max | unit
 | tck frequency of operation | 0 | 10 | mhz
1 | tck cycle time | 100 | | ns
2 | tck clock pulse width measured at 1.5 v | 40 | | ns
3 | tck rise and fall times | 0 | 10 | ns
4 | trst setup time to tck falling edge | 40 | | ns
5 | trst assert time | 100 | | ns
6 | boundary scan input data setup time | 50 | | ns
7 | boundary scan input data hold time | 50 | | ns
8 | tck to output data valid | 0 | 50 | ns
9 | tck to output high impedance | 0 | 50 | ns
10 | tms, tdi data setup time | 20 | | ns
11 | tms, tdi data hold time | 5 | | ns
12 | tck to tdo data valid | 0 | 20 | ns
13 | tck to tdo high impedance | 0 | 20 | ns

figure 6-8. clock input timing diagram

figure 6-9. trst timing diagram
figure 6-10. boundary scan timing diagram

figure 6-11. test access port timing diagram
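as a quick sanity check on the jtag timing specifications above, the sketch below tests a proposed tck frequency and duty cycle against three limits from the table: the 10 mhz maximum frequency, the 100 ns minimum cycle time (spec 1), and the 40 ns minimum pulse width (spec 2). it is an illustrative helper, not a motorola tool: the function name and the duty-cycle parameter are assumptions, the high and low phases are each assumed to need the minimum pulse width, and rise/fall times (spec 3) and the 1.5 v measurement point are not modeled.

```python
# Hypothetical helper: checks a TCK setting against three rows of the
# "jtag timing specifications" table. Rise/fall times are not modeled.
TCK_MAX_MHZ = 10.0         # tck frequency of operation, max
TCK_MIN_CYCLE_NS = 100.0   # spec 1: tck cycle time, min
TCK_MIN_PULSE_NS = 40.0    # spec 2: tck clock pulse width, min


def tck_violations(freq_mhz: float, duty: float = 0.5) -> list[str]:
    """Return the names of the limits that a TCK of freq_mhz violates.

    duty is the fraction of each period that TCK spends high. Assumption:
    both the high and the low phase must meet the minimum pulse width.
    """
    if freq_mhz <= 0 or not 0.0 < duty < 1.0:
        raise ValueError("freq_mhz must be > 0 and duty must be in (0, 1)")
    period_ns = 1000.0 / freq_mhz          # 1 MHz -> 1000 ns
    high_ns = duty * period_ns
    low_ns = period_ns - high_ns
    problems = []
    if freq_mhz > TCK_MAX_MHZ:
        problems.append("frequency")
    if period_ns < TCK_MIN_CYCLE_NS:
        problems.append("cycle time")
    if min(high_ns, low_ns) < TCK_MIN_PULSE_NS:
        problems.append("pulse width")
    return problems
```

for example, a symmetric 10 mhz tck passes all three checks, while a 10 mhz tck with a 30% duty cycle fails only the pulse-width check because its high phase lasts about 30 ns.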
section 7 bus operation

the m68040 bus interface supports synchronous data transfers between the processor and other devices in the system. this section provides a functional description of the bus, the signals that control the bus, and the bus cycles provided for data transfer operations. operation of the bus is defined for transfers initiated by the processor as a bus master and for transfers initiated by an alternate bus master, which the processor snoops as a slave device. descriptions of the error and halt conditions, bus arbitration, and the reset operation are also included. for timing specifications, refer to section 11 mc68040 electrical and thermal characteristics.

note
for the mc68040v, mc68lc040, and mc68ec040, ignore all references to floating-point. for the mc68ec040 and mc68ec040v, ignore all references to the memory management unit (mmu). special modes of operation do not apply to these devices. refer to appendix a mc68lc040 and appendix b mc68ec040 for details.

7.1 bus characteristics

the m68040 uses the address bus (a31–a0) to specify the address for a data transfer and the data bus (d31–d0) to transfer the data. control signals indicate the beginning and type of a bus cycle as well as the address space and size of the transfer. the selected device then controls the length of the cycle by terminating it using the control signals.

the m68040 uses two clocks to generate timing: a processor clock (pclk) and a bus clock (bclk). pclk runs at twice the frequency of bclk and is internally phase-locked to bclk. pclk is also distributed throughout the device to provide additional clock edges for internal logic blocks; it has no bearing on bus timing. the dual clock inputs allow the bus interface to operate at half the speed of the processor's internal logic, which relaxes memory interface requirements.
since the rising edge of bclk is used as the reference point for the phase-locked loop (pll), all timing specifications are referenced to this edge. figure 7-1 illustrates the general relationship between the two clock signals and most input and output signals. the rising edge of the internally phase-locked pclk is aligned with the rising edge of bclk, and the two pclk cycles corresponding to each bclk cycle are divided into four states, t1–t4. most outputs change during state t4, whether transitioning between a driven and high-impedance state or switching between asserted and negated logic levels. the exceptions to this rule are the tip, ta, and bb signals, which transition between logic levels during t4 but transition from a driven state to a high-impedance state during t1. the input setup time (t_su), input hold time (t_hi), output hold time (t_ho), and delay time (t_d) illustrated in figure 7-1 are described in the ac electrical timing specifications in section 11 mc68040 electrical and thermal characteristics.
figure 7-1. signal relationships to clocks (bclk states t1–t4, internally phase-locked pclk, inputs, and outputs)

notes:
1. t_su = required input setup time relative to bclk rising edge.
2. t_hi = required input hold time relative to bclk rising edge.
3. t_ho = output hold time relative to bclk rising edge.
4. t_ho' = output hold time relative to bclk rising edge; t_ho' = t_ho – 1/2 pclk.
5. t_d = propagation delay of signal relative to bclk rising edge.
6. t_d' = propagation delay of signal relative to pclk falling edge; t_d' = t_d – 1/2 pclk, except for tip, ta, and bb when used as outputs.

inputs to the m68040 (other than the ipl2–ipl0 and rsti signals) are synchronously sampled and must be stable during the sample window defined by t_su, t_hi, and t_ho (see figure 7-1) to guarantee proper operation. the asynchronous ipl2–ipl0 and rsti signals are also sampled on the rising edge of bclk, but are internally synchronized to resolve the input to a valid level before it is used. since the timing specifications for the m68040 are referenced to the rising edge of bclk, they are valid only for the specified operating frequency and must be scaled for lower operating frequencies.
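the clock relationships in 7.1 reduce to simple arithmetic: pclk runs at twice the bclk frequency, and each bclk cycle is split into four equal states, so each state lasts half a pclk period. the sketch below computes these values for a given bclk frequency. the helper name is hypothetical and any frequency passed to it is only an example; real designs must use the ac timing specifications in section 11.

```python
def clock_timing(bclk_mhz: float) -> dict[str, float]:
    """Derive pclk and t1-t4 state timing from a bclk frequency (section 7.1)."""
    if bclk_mhz <= 0:
        raise ValueError("bclk_mhz must be positive")
    bclk_period_ns = 1000.0 / bclk_mhz
    pclk_period_ns = bclk_period_ns / 2.0     # pclk runs at twice bclk
    state_ns = pclk_period_ns / 2.0           # two states per pclk cycle
    timing = {
        "pclk_mhz": 2.0 * bclk_mhz,
        "bclk_period_ns": bclk_period_ns,
        "pclk_period_ns": pclk_period_ns,
        "state_ns": state_ns,
    }
    # Start of each state, measured from the bclk rising edge that begins t1.
    for i in range(4):
        timing[f"t{i + 1}_start_ns"] = i * state_ns
    return timing
```

with a 25 mhz bclk, for instance, pclk is 50 mhz, each state lasts 10 ns, and t4 begins 30 ns after the bclk rising edge.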
7.2 data transfer mechanism

figure 7-2 illustrates how the bus designates operands for transfers on a byte-boundary system. the integer unit handles floating-point operands as a sequence of related long-word operands. these designations are used in the figures and descriptions that follow.

figure 7-2. internal operand representation

figure 7-3 illustrates general multiplexing between an internal register and the external bus. the internal register connects to the external data bus through the internal data bus and multiplexer. the data multiplexer establishes the necessary connections for different combinations of address and data sizes.

unlike the mc68020 and mc68030 processors, the m68040 does not support dynamic bus sizing and expects the referenced device to accept the requested access width. the mc68150 dynamic bus sizer is designed to allow the 32-bit m68040, mc68ec040, and mc68lc040 bus to communicate bidirectionally with 32-, 16-, or 8-bit peripherals and memories. it dynamically recognizes the size of the selected peripheral or memory device and then reads or writes the appropriate data from that location. refer to mc68150/d, mc68150 dynamic bus sizer, for information on this device. blocks of memory that must be contiguous, such as for code storage or program stacks, must be 32 bits wide. byte- and word-sized i/o ports that return an interrupt vector during interrupt acknowledge cycles must be mapped into the low-order 8 or 16 bits, respectively, of the data bus.

the multiplexer takes the four bytes of the 32-bit bus transfer and routes them to their required positions. for example, byte 0 would normally be routed to d31–d24, but it can also be routed to any other byte position to support a misaligned data transfer.
the same is true for any of the other operand bytes. the transfer size (siz0 and siz1) and byte offset (a1 and a0) signals determine the positioning of the bytes (see table 7-1). the size indicated on the sizx signals corresponds to the size of the operand transfer for the entire bus cycle. during an operand transfer, a31–a2 indicate the long-word base address for the first byte of the operand to be accessed; a1 and a0 indicate the byte offset from the base. for a burst-inhibited line transfer, a1 and a0 for each of the four accesses (the burst-inhibited line transfer and three long-word transfers) are copied from the lowest two bits of the access address used to initiate the line transfer.
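the split between a31–a2 and a1/a0 is a simple mask operation. the sketch below (hypothetical helper names) separates an address into its long-word base and byte offset and, for a single byte operand, names the data bus section that table 7-1 assigns to that offset (offset 0 on d31–d24 through offset 3 on d7–d0).

```python
def split_address(addr: int) -> tuple[int, int]:
    """Split a 32-bit address into (long-word base on a31-a2, byte offset on a1:a0)."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("addr must be a 32-bit unsigned value")
    return addr & 0xFFFFFFFC, addr & 0x3


def byte_lane(offset: int) -> str:
    """Data bus section that carries a single byte at the given a1:a0 offset."""
    if offset not in range(4):
        raise ValueError("offset must be 0-3")
    high = 31 - 8 * offset             # offset 0 -> d31-d24, offset 3 -> d7-d0
    return f"d{high}-d{high - 7}"
```

for example, a byte at $10000007 has long-word base $10000004 and offset 3, so it travels on d7–d0.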
figure 7-3. data multiplexing (an internal register at address $xxxxxxx0 holds bytes 3–0; the routing multiplexer connects them to the external data bus sections d31–d24, d23–d16, d15–d8, and d7–d0)

table 7-1 lists the combinations of the sizx, a1, and a0 signals, collectively called byte enable signals, that are used for each of the four sections of the data bus. in the table, byte n indicates the data bus section that is active, that is, the portion of the requested operand that is read or written during that bus transfer. for line transfers, all bytes are valid as listed and can correspond to portions of the requested operand or to data required to fill the remainder of the cache line. the bytes labeled with a dash are not required; they are ignored on read transfers and driven with undefined data on write transfers. not selecting these bytes prevents incorrect accesses in sensitive areas such as i/o devices. figure 7-4 illustrates a logic diagram for one method of generating byte enable signals from the sizx, a1, and a0 signals, along with the associated pal equations. these byte enable signals can be combined with the address decode logic.

table 7-1. data bus requirements for read and write cycles
(siz1, siz0, a1, a0 are the transfer signal encodings; d31–d24 through d7–d0 are the active data bus sections)

transfer size | siz1 | siz0 | a1 | a0 | d31–d24 | d23–d16 | d15–d8 | d7–d0
byte | 0 | 1 | 0 | 0 | byte n | - | - | -
byte | 0 | 1 | 0 | 1 | - | byte n | - | -
byte | 0 | 1 | 1 | 0 | - | - | byte n | -
byte | 0 | 1 | 1 | 1 | - | - | - | byte n
word | 1 | 0 | 0 | 0 | byte n | byte n | - | -
word | 1 | 0 | 1 | 0 | - | - | byte n | byte n
long word | 0 | 0 | x | x | byte n | byte n | byte n | byte n
line | 1 | 1 | x | x | byte n | byte n | byte n | byte n
logic diagram: a pal16l8 (u1) with inputs a0, a1, siz0, and siz1 drives the upper upper data select (d31–d24), upper middle data select (d23–d16), lower middle data select (d15–d8), and lower lower data select (d7–d0).

pal16l8
u1
mc68040 byte data select generation.
motorola worldwide marketing training organization
a0 a1 siz0 siz1 nc nc nc nc nc gnd
nc uud umd lmd lld nc nc nc nc vcc

/uud = /a0 * /a1        ; directly addressed, any size
     + /siz1 * /siz0    ; enable every byte for long word size
     + siz1 * siz0      ; enable every byte for line size

/umd = a0 * /a1         ; directly addressed, any size
     + /a1 * siz1       ; word aligned, size is word or line
     + /siz1 * /siz0    ; enable every byte for long word size
     + siz1 * siz0      ; enable every byte for line size

/lmd = /a0 * a1         ; directly addressed, any size
     + /siz1 * /siz0    ; enable every byte for long word size
     + siz1 * siz0      ; enable every byte for line size

/lld = a0 * a1          ; directly addressed, any size
     + a1 * siz1        ; word aligned, word or line size
     + /siz1 * /siz0    ; enable every byte for long word size
     + siz1 * siz0      ; enable every byte for line size

figure 7-4. byte enable signal generation and pal equation

a brief summary of the bus signal encodings for each access type is listed in table 7-2. additional information on the encodings for the m68040 signals can be found in section 5 signal description.
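to check byte-enable logic against table 7-1, it can be modeled in software. the sketch below is a hypothetical python model (the function name and return convention are assumptions): it returns true for each data bus section that the transfer uses, whereas the pal outputs /uud, /umd, /lmd, and /lld are active low. each term follows the comments on the figure 7-4 equations: direct byte addressing, word alignment for word or line size, and every byte for long-word or line size.

```python
def byte_enables(siz1: int, siz0: int, a1: int, a0: int) -> tuple[bool, bool, bool, bool]:
    """Return (d31-d24, d23-d16, d15-d8, d7-d0) selects; True = section used.

    The pal outputs /uud, /umd, /lmd, /lld are the active-low versions.
    """
    long_word = not siz1 and not siz0   # siz1:siz0 = 00
    line = bool(siz1 and siz0)          # siz1:siz0 = 11
    every = long_word or line           # all four sections for long word or line
    uud = (not a0 and not a1) or every                   # byte at offset 0
    umd = (a0 and not a1) or (not a1 and siz1) or every  # offset 1, or word at 0
    lmd = (not a0 and a1) or every                       # offset 2
    lld = (a0 and a1) or (a1 and siz1) or every          # offset 3, or word at 2
    return bool(uud), bool(umd), bool(lmd), bool(lld)
```

for example, a word transfer at offset 2 (siz1 = 1, siz0 = 0, a1 = 1, a0 = 0) selects only d15–d8 and d7–d0, matching the corresponding row of table 7-1.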
table 7-2. summary of access types versus bus signal encodings

bus signal | data cache push access | normal data/code access | table search access | move16 access | alternate access | interrupt acknowledge | breakpoint acknowledge
a31–a0 | access address | access address | entry address | access address | access address | $ffffffff | $00000000
upa1, upa0 | $0 | mmu source (1) | $0 | mmu source (1) | $0 | $0 | $0
siz1, siz0 | l/line | b/w/l/line | long word | line | b/w/l | byte | byte
tt1, tt0 | $0 | $0 | $0 | $1 | $2 | $3 | $3
tm2–tm0 | $0 | $1, 2, 5, or 6 | $3 or 4 | $1 or 5 | function code | int. level ($1–7) | $0
tln1, tln0 | cache set entry | cache set entry (2) | undefined | undefined | undefined | undefined | undefined
r/w | write | read/write | read/write | read/write | read/write | read | read
lock, locke | negated | asserted/negated (3) | asserted/negated (3) | negated | negated | negated | negated
ciout | negated | mmu source (1) | negated | mmu source (1) | asserted | negated | negated

notes:
1. the upa1, upa0, and ciout signals are determined by the u1, u0 data and cm bit fields, respectively, corresponding to the access address.
2. the tlnx signals are defined only for normal push accesses and normal data line read accesses.
3. the lock signal is asserted during tas, cas, and cas2 operand accesses and for some table search update sequences. locke is asserted for the last transfer of each locked sequence of transfers.
4. refer to section 5 signal description for definitions of the tmx signal encodings for normal, move16, and alternate accesses.

7.3 misaligned operands

all m68040 data formats can be located in memory on any byte boundary. a byte operand is properly aligned at any address; a word operand is misaligned at an odd address; and a long word is misaligned at an address that is not evenly divisible by 4. since operands can reside at any byte boundary, word and long-word operands can be misaligned.
although the m68040 does not enforce any alignment restrictions for data operands (including pc relative data addressing), some performance degradation occurs when additional bus cycles are required for misaligned long-word or word operands. for maximum performance, data items should be aligned on their natural boundaries. all instruction words and extension words must reside on word boundaries. attempting to prefetch an instruction word at an odd address causes an address error exception. refer to section 8 exception processing for details on address error exceptions.

the m68040 data memory unit converts misaligned operand accesses that are noncachable to a sequence of aligned accesses. these aligned accesses are then sent to the bus controller for completion, always resulting in aligned bus transfers. misaligned operand accesses that are cachable and miss in the data cache are not aligned before line filling. refer to section 4 instruction and data caches for details on line fill and the data cache.
figure 7-5 illustrates the transfer of a long-word operand from an odd address requiring more than one bus cycle. for the first transfer or bus cycle, the sizx signals specify a byte transfer, and the byte offset is $1. the slave device supplies the byte and acknowledges the data transfer. when the processor starts the second cycle, the sizx signals specify a word transfer with a byte offset of $2. the next two bytes are transferred during this cycle. the processor then initiates the third cycle, with the sizx signals indicating a byte transfer. the byte offset is now $0; the port supplies the final byte and the operation is complete. figure 7-6 illustrates a similar example, except that the operand is word-sized and the transfer requires only two bus cycles. figure 7-7 illustrates a functional timing diagram for a misaligned long-word read transfer.

figure 7-5. example of a misaligned long-word transfer

figure 7-6. example of a misaligned word transfer
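the decomposition in figure 7-5 can be reproduced by a simple greedy rule: at each step, transfer the largest naturally aligned size (long word, word, or byte) that fits in the remaining bytes. the sketch below applies that rule. it is an assumption that reproduces the byte/word/byte sequence described above and the bus-cycle counts in table 7-3, not a description of the data memory unit's internal logic; the function name and return format are hypothetical.

```python
SIZE_NAMES = {4: "long word", 2: "word", 1: "byte"}


def aligned_cycles(addr: int, size: int) -> list[tuple[str, int]]:
    """Split a noncachable operand access into aligned bus cycles.

    Returns one (transfer size, byte offset a1:a0) pair per bus cycle.
    """
    if size not in SIZE_NAMES:
        raise ValueError("size must be 1, 2, or 4 bytes")
    cycles = []
    remaining = size
    while remaining:
        # Largest naturally aligned chunk that still fits; 1 always qualifies.
        chunk = next(c for c in (4, 2, 1) if c <= remaining and addr % c == 0)
        cycles.append((SIZE_NAMES[chunk], addr & 0x3))
        addr += chunk
        remaining -= chunk
    return cycles
```

a long word at byte offset $1 yields a byte at offset $1, a word at offset $2, and a byte at offset $0, which is the three-cycle sequence of figure 7-5.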
figure 7-7. misaligned long-word read transfer timing
the combination of operand size and alignment determines the number of bus cycles required to perform a particular memory access. table 7-3 lists the number of bus cycles required for different operand sizes with all possible alignment conditions for read and write cycles. the table confirms that alignment significantly affects bus cycle throughput for noncachable accesses. for example, in figure 7-5 the misaligned long-word operand took three bus cycles because the byte offset = $1. if the byte offset had been $0, it would have taken one bus cycle. the m68040 system designer and programmer should account for these effects, particularly in time-critical applications.

table 7-3. memory alignment influence on noncachable and write-through bus cycles

number of bus cycles
transfer size | $0* | $1* | $2* | $3*
instruction | 1 | n/a | n/a | n/a
byte operand | 1 | 1 | 1 | 1
word operand | 1 | 2 | 1 | 2
long-word operand | 1 | 3 | 2 | 3

*where the byte offset (a1 and a0) equals this encoding.

the processor always prefetches instructions from a half-line address (a2–a0 = $0), regardless of alignment. when the required instruction begins at the second long word, the processor still fetches the entire half-line (two long words), even though only the second long word contains the required instruction.

7.4 processor data transfers

the transfer of data between the processor and other devices involves the address bus, data bus, and control signals. the address and data buses are normally parallel, nonmultiplexed buses, supporting byte, word, long-word, and line (16-byte) bus cycles. line transfers are normally performed using an efficient burst transfer, which provides an initial address and time-multiplexes the data bus to transfer four long words of information to or from the slave device.
slave devices that do not support bursting can burst-inhibit the first long word of a line transfer, forcing the bus master to complete the access using three additional long-word bus cycles. all bus input and output signals are synchronous to the rising edge of the bclk signal. the m68040 moves data on the bus by issuing control signals and using a handshake protocol to ensure correct data movement. the following paragraphs describe the bus cycles for byte, word, long-word, and line read, write, and read-modify-write transfers.
7.4.1 byte, word, and long-word read transfers

during a read transfer, the processor receives data from a memory or peripheral device. since the data read for a byte, word, or long-word access is by definition not placed in either of the internal caches, the processor ignores the level on the transfer cache inhibit (tci) signal when latching the data. the bus controller performs byte, word, and long-word read transfers for the following cases:
• accesses to a disabled cache.
• accesses to a memory page that is specified noncachable.
• accesses that are implicitly noncachable (read-modify-write accesses and accesses to an alternate logical address space via the moves instruction).
• accesses that do not allocate in the data cache on a read miss (table searches, exception vector fetches, and exception stack deallocation for an rte instruction).
• line reads whose first transfer is terminated with transfer burst inhibit (tbi), forcing completion of the line access using three additional long-word read transfers.

figure 7-8 is a flowchart for byte, word, and long-word read transfers. bus operations are similar for each case and vary only with the size indicated and the portion of the data bus used for the transfer. figure 7-9 is a functional timing diagram for byte, word, and long-word read transfers.

processor: address device
1) set r/w to read
2) drive address on a31–a0
3) drive user page attributes on upa1, upa0
4) drive size on siz1, siz0 (byte, word, or long word)
5) drive transfer type on tt1, tt0
6) drive transfer modifier on tm2–tm0
7) ciout becomes valid
8) assert ts for one clock
9) assert tip

external device: present data
1) decode address
2) place data on appropriate bytes of d31–d0 based on sizx, a0, and a1
3) assert ta

processor: acquire data
1) latch data

external device: terminate cycle
1) remove data from d31–d0
2) negate ta

processor: start next cycle

figure 7-8. byte, word, and long-word read transfer flowchart
figure 7-9. byte, word, and long-word read transfer timing (byte read, word read with wait state, and long-word read)
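the handshake in these read transfers can be modeled as a loop: after c1, the processor samples ta at the end of each clock; a negated sample adds a wait state (cw), and the first asserted sample ends the cycle. the sketch below is a hypothetical behavioral model for counting bclk clocks, not a timing-accurate simulation; the function name and the list-of-samples input are assumptions.

```python
def read_cycle_clocks(ta_samples: list[bool]) -> int:
    """Count bclk clocks for one byte, word, or long-word read.

    ta_samples[i] is the ta level the processor samples at the end of the
    (i + 1)-th clock after c1 (True = asserted).
    """
    clocks = 1                      # c1: address and attributes driven, ts asserted
    for ta_asserted in ta_samples:
        clocks += 1                 # c2, or a wait state (cw) if ta is negated
        if ta_asserted:
            return clocks           # data latched; the bus cycle terminates
    # The real processor would keep sampling; the model simply ran out of samples.
    raise ValueError("ta was never asserted in the supplied samples")
```

a zero-wait read takes two clocks (c1, c2), and the word read with one wait state in figure 7-9 takes three (c1, cw, c2).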
clock 1 (c1)
the read cycle starts in c1. during the first half of c1, the processor places valid values on the address bus and transfer attributes. for user and supervisor mode accesses, which the corresponding memory unit translates, the user-programmable attribute signals (upax) are driven with the values from the matching user bits (u1 and u0). the transfer type (ttx) and transfer modifier (tmx) signals identify the specific access type. the read/write (r/w) signal is driven high for a read cycle. cache inhibit out (ciout) is asserted if the access is identified as noncachable. refer to section 3 memory management unit (except mc68ec040 and mc68ec040v) for information on the m68040 and mc68lc040 memory units and appendix b mc68ec040 for information on the mc68ec040 memory unit.

the processor asserts transfer start (ts) during c1 to indicate the beginning of a bus cycle. if not already asserted from a previous bus cycle, the transfer in progress (tip) signal is also asserted at this time to indicate that a bus cycle is active.

clock 2 (c2)
during the first half of the clock after c1, the processor negates ts. the selected peripheral device uses r/w, siz1, siz0, a1, and a0 to place its information on the data bus. with the exception of the r/w signal, these signals also select any or all of the operand bytes (d31–d24, d23–d16, d15–d8, and d7–d0). if the first clock after c1 is not a wait state (cw), the selected peripheral device asserts the transfer acknowledge (ta) signal.

at the end of the first clock cycle after c1, the processor samples the level of ta and latches the current value on the data bus; if ta is asserted, the bus cycle terminates and the data is passed to the processor's appropriate memory unit. if ta is not recognized as asserted at the end of the clock cycle, the processor ignores the data and inserts a wait state instead of terminating the transfer.
the processor continues to sample ta on successive rising edges of bclk until ta is recognized as asserted. the data is then passed to the processor's appropriate memory unit. when the processor recognizes ta at the end of a clock and terminates the bus cycle, tip remains asserted if the processor is ready to begin another bus cycle. otherwise, the processor negates tip during the first half of the next clock.

7.4.2 line read transfer

the processor uses line read transfers to access a 16-byte operand for a move16 instruction and to support cache line filling. a line read accesses a block of four long words, aligned to a 16-byte memory boundary, by supplying a starting address that points to one of the long words and requiring the memory device to sequentially drive each long word on the data bus. the selected device must internally increment a3 and a2 of the supplied address for each transfer, causing the address to wrap around at the end of the block. the address and transfer attributes supplied by the processor remain stable during the transfers, and the selected device terminates each transfer by driving the long word on the data bus and asserting ta. a line transfer performed in this manner with a single address is referred to as a line burst transfer.

the m68040 also supports burst-inhibited line transfers for memory devices that are unable to support bursting. for this type of bus cycle, the selected device supplies the first long word pointed to by the processor address and asserts transfer burst inhibit (tbi) with ta for the first transfer of the line access. the processor responds by terminating the line burst transfer and accessing the remainder of the line, using three long-word read bus cycles. although the selected device can then treat the line transfer as four independent long-word bus cycles, the bus controller still handles the four transfers as a single line transfer and does not allow other unrelated processor accesses or bus arbitration to intervene between the transfers. tbi is ignored after the first long-word transfer.

line reads to support cache line filling can be cache inhibited by asserting transfer cache inhibit (tci) with ta for the first long-word transfer of the line. the assertion of tci does not affect completion of the line transfer, but the bus controller latches tci and passes it to the memory controller for use. tci is ignored after the first long-word transfer of a line burst transfer and during the three long-word bus cycles of a burst-inhibited line transfer.

the address placed on the address bus by the processor for line transfers does not necessarily point to the most significant byte of each long word, because for a line read, a1 and a0 are copied from the original operand address supplied to the memory unit by the integer unit. these two bits are also unchanged for the three long-word bus cycles of a burst-inhibited line transfer. the selected device should ignore a1 and a0 for long-word and line read transfers.
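the cost of burst inhibition can be estimated by counting bclk clocks. the sketch below is a hypothetical model with stated assumptions: a line burst uses one address clock (c1) plus one clock per long word, which gives the five clocks c1–c5 of the line read timing diagram when no wait states are inserted; a burst-inhibited line uses one c1/c2 pair for the first long word and a further c1/c2 pair for each of the three long-word bus cycles, assuming back-to-back cycles with no idle clocks. the function name and parameters are illustrative.

```python
def line_read_clocks(wait_states: list[int], burst_inhibited: bool = False) -> int:
    """Estimate the bclk clocks needed to read one 16-byte line.

    wait_states[i] is the number of wait states inserted before the i-th
    long word is acknowledged with ta.
    """
    if len(wait_states) != 4 or any(w < 0 for w in wait_states):
        raise ValueError("need four non-negative wait-state counts")
    if not burst_inhibited:
        # c1 (address) + one clock per long word, plus any wait states
        return 1 + sum(1 + w for w in wait_states)
    # The first long word ends the line cycle (c1 + c2); the remaining three
    # are separate long-word read cycles, each with its own c1 and c2.
    return sum(2 + w for w in wait_states)
```

under these assumptions, a zero-wait line burst takes 5 clocks, while the same line read with tbi asserted takes 8.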
The address of an instruction fetch will always be aligned to a half-line boundary ($xxxxxxx0 or $xxxxxxx8); therefore, compilers should attempt to locate branch targets on half-line boundaries to minimize branch stalls. For example, if the target of a branch is a two-word instruction located at $1000000C, the following burst sequence will occur upon a cache miss: $10000008, $1000000C, $10000000, then $10000004. The internal pipeline of the M68040 stalls until the second access of the burst (the address of the instruction to be executed) has completed.

Figures 7-10 and 7-11 illustrate a flowchart and functional timing diagram for a line read bus transfer.
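The wraparound ordering described above can be checked with a short Python sketch. It is illustrative only: `line_burst_addresses` and `instruction_fetch_address` are hypothetical helper names, and the model assumes that only A3 and A2 advance (modulo 4) while A31–A4 stay fixed and A1, A0 are carried over from the starting address, as described for line reads.

```python
# Illustrative sketch; function names are hypothetical, not a Motorola API.

def line_burst_addresses(start: int) -> list[int]:
    """Return the four long-word addresses of a line burst in transfer order.

    Only A3 and A2 change between transfers; they count modulo 4, so the
    sequence wraps around inside the 16-byte line. A31-A4 and A1, A0 are
    carried over unchanged from the starting address.
    """
    line_base = start & 0xFFFFFFF0          # A31-A4 select the 16-byte line
    low_bits = start & 0x3                  # A1, A0 copied from the operand address
    first = (start >> 2) & 0x3              # A3, A2 of the first long word
    return [line_base | (((first + i) & 0x3) << 2) | low_bits for i in range(4)]


def instruction_fetch_address(branch_target: int) -> int:
    """Instruction fetches start on a half-line (8-byte) boundary."""
    return branch_target & 0xFFFFFFF8


# Branch target from the example above: a two-word instruction at $1000000C.
fetch = instruction_fetch_address(0x1000000C)
print([f"${a:08X}" for a in line_burst_addresses(fetch)])
# ['$10000008', '$1000000C', '$10000000', '$10000004']
```

The second address in the printed sequence is the branch target itself, which is why the pipeline waits for the second access of the burst.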
Figure 7-10. Line Read Transfer Flowchart

Processor, address device:
1) Set R/W to read
2) Drive address on A31–A0
3) Drive user page attributes on UPA1, UPA0
4) Drive size on SIZ1, SIZ0 (line)
5) Drive transfer type on TT1, TT0
6) Drive transfer modifier on TM2–TM0
7) CIOUT becomes valid
8) Assert TS for one clock
9) Assert TIP
External device, present data:
1) Decode address
2) Place data on D31–D0
3) Assert TA
Processor, acquire data:
1) Latch data
2) Sample TBI and TCI (for first transfer)
External device, terminate cycle:
1) Remove data from D31–D0
2) Negate TA (if necessary)
3) Increment address bits A3, A2 (if necessary)
(Repeat present data, acquire data, and terminate cycle until four long words are transferred.)
Processor, end of burst (when four long words are transferred):
1) Negate TIP (if required)
Start next cycle.
Figure 7-11. Line Read Transfer Timing (clocks C1–C5; the selected device increments A3 and A2, for example A3, A2 = 10, 11, 00, 01)

Clock 1 (C1)
The line read cycle starts in C1. During the first half of C1, the processor places valid values on the address bus and transfer attributes. For user and supervisor mode accesses that are translated by the corresponding memory unit, the UPAx signals are driven with the values from the matching U1 and U0 bits. The TTx and TMx signals identify the specific access type. The R/W signal is driven high for a read cycle, and the size signals (SIZx) indicate line size. CIOUT is asserted for a MOVE16 operand read if the access is identified as noncachable. Refer to Section 3 Memory Management Unit (Except MC68EC040 and MC68EC040V) for information on the M68040 and MC68LC040 memory units and Appendix B MC68EC040 for information on the MC68EC040 memory unit.

The processor asserts TS during C1 to indicate the beginning of a bus cycle. If not already asserted from a previous bus cycle, TIP is also asserted at this time to indicate that a bus cycle is active.

Clock 2 (C2)
During the first half of the first clock after C1, the processor negates TS. The selected device uses R/W, SIZ1, and SIZ0 to place the data on the data bus. (The first transfer must supply the long word at the corresponding long-word boundary.) Concurrently, the selected device asserts TA and either negates or asserts TBI to indicate whether it can or cannot support a burst transfer. At the end of the first clock cycle after C1, the processor samples the level of TA, TBI, and TCI and latches the current value on the data bus. If TA is asserted, the transfer terminates and the data is passed to the appropriate memory unit. If TA is not recognized asserted, the processor ignores the data and inserts wait states instead of terminating the transfer. The processor continues to sample TA, TBI, and TCI on successive rising edges of BCLK until TA is recognized asserted. The latched data and the level on TCI are then passed to the appropriate memory unit.

If TBI was negated with TA, the processor continues the cycle with C3. Otherwise, if TBI was asserted, the line transfer is burst inhibited, and the processor reads the remaining three long words using long-word read bus cycles. The processor increments A3 and A2 for each read, and the new address is placed on the address bus for each bus cycle. Refer to 7.4.1 Byte, Word, and Long-Word Read Transfers for information on long-word reads. If no wait states are generated, a burst-inhibited line read completes in eight clocks instead of the five required for a burst read.
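The five-clock and eight-clock figures can be reproduced with a simple counting model. This is a sketch under stated assumptions, not a timing specification: `line_read_clocks` is a hypothetical name, each wait state is assumed to add exactly one BCLK period to the transfer it delays, and no idle clocks are assumed between the long-word cycles of a burst-inhibited line read.

```python
# Illustrative counting model; the function name is hypothetical.

def line_read_clocks(tbi_asserted: bool, wait_states: list[int]) -> int:
    """Count BCLK periods for one line read under a simplified model.

    wait_states[i] is the number of wait states inserted before TA is
    recognized for long word i (i = 0..3).
    """
    if len(wait_states) != 4:
        raise ValueError("a line transfer always moves four long words")
    if not tbi_asserted:
        # Burst: one address clock (C1), then one data clock per long word
        # (C2-C5), each stretched by its wait states.
        return 1 + sum(1 + w for w in wait_states)
    # Burst inhibited: the first transfer is a two-clock cycle (C1, C2), and
    # each of the three remaining long words is its own two-clock read cycle.
    return sum(2 + w for w in wait_states)


print(line_read_clocks(False, [0, 0, 0, 0]))  # 5
print(line_read_clocks(True, [0, 0, 0, 0]))   # 8
```

Under the same assumptions, one wait state on the first long word gives six clocks for a burst and nine for a burst-inhibited line read.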
Clock 3 (C3)
The processor holds the address and transfer attribute signals constant during C3. The selected device must increment A3 and A2 to reference the next long word to transfer, place the data on the data bus, and assert TA. At the end of C3, the processor samples the level of TA and latches the current value on the data bus. If TA is asserted, the transfer terminates, and the second long word of data is passed to the appropriate memory unit. If TA is not recognized asserted at the end of C3, the processor ignores the latched data and inserts wait states instead of terminating the transfer. The processor continues to sample TA on successive rising edges of BCLK until it is recognized. The latched data is then passed to the appropriate memory unit.

Clock 4 (C4)
This clock is identical to C3 except that once TA is recognized asserted, the latched value corresponds to the third long word of data for the burst.
Clock 5 (C5)
This clock is identical to C3 except that once TA is recognized, the latched value corresponds to the fourth long word of data for the burst. After the processor recognizes the last TA assertion and terminates the line read bus cycle, TIP remains asserted if the processor is ready to begin another bus cycle. Otherwise, the processor negates TIP during the first half of the next clock.

Figures 7-12 and 7-13 illustrate a flowchart and functional timing diagram for a burst-inhibited line read bus cycle.
Figure 7-12. Burst-Inhibited Line Read Transfer Flowchart

Processor, address device:
1) Set R/W to read
2) Drive address on A31–A0
3) Drive user page attributes on UPA1, UPA0
4) Drive size on SIZ1, SIZ0 (line)
5) Drive transfer type on TT1, TT0
6) Drive transfer modifier on TM2–TM0
7) CIOUT becomes valid
8) Assert TS for one clock
9) Assert TIP
External device, present data:
1) Decode address
2) Place data on D31–D0
3) Assert TA and TBI
Processor, acquire data:
1) Latch data
2) Sample TBI and TCI
3) Recognize TBI asserted
External device, terminate cycle:
1) Remove data from D31–D0
2) Negate TA
Processor, address device (until three long words are transferred):
1) Increment address bits A3, A2 and drive new address on A31–A0
2) Drive size on SIZ1, SIZ0 (long word)
3) Assert transfer start (TS) for one clock
External device, present data:
1) Decode address
2) Place data on D31–D0
3) Assert TA
Processor, acquire data:
1) Latch data
External device, terminate cycle:
1) Remove data from D31–D0
2) Negate TA
Processor, end of line transfer (when three long words are transferred):
1) Negate TIP (if required)
Start next cycle.
Figure 7-13. Burst-Inhibited Line Read Transfer Timing (clocks C1–C8: an inhibited line read followed by three long-word reads; SIZ1, SIZ0 = line, long, long, long)
7.4.3 Byte, Word, and Long-Word Write Transfers

During a write transfer, the processor transfers data to a memory or peripheral device. The level on the TCI signal is ignored by the processor during all write cycles. The bus controller performs byte, word, and long-word write transfers for the following cases:
• Accesses to a disabled cache.
• Accesses to a memory page that is specified noncachable.
• Accesses that are implicitly noncachable (read-modify-write accesses and accesses to an alternate logical address space via the MOVES instruction).
• Writes to write-through pages.
• Accesses that do not allocate in the data cache on a write miss (table updates and exception stacking).
• Line writes whose first transfer is terminated with TBI, which forces completion of the line access using three additional long-word write transfers.
• Cache line pushes for lines containing a single dirty long word.

Figures 7-14 and 7-15 illustrate a flowchart and functional timing diagram for byte, word, and long-word write bus transfers.

Figure 7-14. Byte, Word, and Long-Word Write Transfer Flowchart

Processor, address device:
1) Set R/W to write
2) Drive address on A31–A0
3) Drive user page attributes on UPA1, UPA0
4) Drive size on SIZ1, SIZ0 (byte, word, or long word)
5) Drive transfer type on TT1, TT0
6) Drive transfer modifier on TM2–TM0
7) CIOUT becomes valid
8) Assert TS for one clock
9) Assert TIP
10) Drive data on appropriate bytes of D31–D0 based on SIZx, A1, and A0
External device, accept data:
1) Decode address
2) Latch data on appropriate bytes of D31–D0 based on SIZx, A1, and A0
3) Assert transfer acknowledge (TA)
Processor, terminate transfer:
1) Remove data from D31–D0
2) Negate TIP (if required)
External device, terminate cycle:
1) Negate TA
Start next cycle.
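For byte, word, and long-word writes, the "appropriate bytes of D31–D0" follow from SIZ1, SIZ0, A1, and A0. The sketch below assumes the SIZ1, SIZ0 encoding from the M68040 signal description (00 = long word, 01 = byte, 10 = word, 11 = line), which is not restated in this section, and assumes the processor has already split any misaligned operand into aligned bus cycles; `active_byte_lanes` is a hypothetical name.

```python
# Illustrative sketch; the SIZ encoding below is an assumption taken from the
# M68040 signal description, and the function name is hypothetical.
SIZ_LONG, SIZ_BYTE, SIZ_WORD, SIZ_LINE = 0b00, 0b01, 0b10, 0b11

# Byte lanes in big-endian order: byte offset 0 of a long word uses D31-D24.
LANES = ["D31-D24", "D23-D16", "D15-D8", "D7-D0"]


def active_byte_lanes(siz: int, address: int) -> list[str]:
    """Return the data-bus byte lanes selected by SIZ1, SIZ0, A1, and A0."""
    offset = address & 0x3                  # A1, A0
    if siz in (SIZ_LONG, SIZ_LINE):
        return list(LANES)                  # all four lanes carry data
    if siz == SIZ_BYTE:
        return [LANES[offset]]              # one lane picked by A1, A0
    if siz == SIZ_WORD:
        if offset & 0x1:
            raise ValueError("sketch assumes word transfers are word-aligned")
        return LANES[offset:offset + 2]     # upper or lower half picked by A1
    raise ValueError("SIZ1, SIZ0 must be a 2-bit value")


print(active_byte_lanes(SIZ_BYTE, 0x1003))  # ['D7-D0']
print(active_byte_lanes(SIZ_WORD, 0x1002))  # ['D15-D8', 'D7-D0']
```

During the write, the processor drives undefined values on the lanes this function does not return, and the selected device latches only the returned lanes.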
Figure 7-15. Long-Word Write Transfer Timing (clocks C1–C2)

Clock 1 (C1)
The write cycle starts in C1. During the first half of C1, the processor places valid values on the address bus and transfer attributes. For user and supervisor mode accesses, which the corresponding memory unit translates, the UPAx signals are driven with the values from the U1 and U0 bits for the area. The TTx and TMx signals identify the specific access type. The R/W signal is driven low for a write cycle. CIOUT is asserted if the access is identified as noncachable or if the access references an alternate address space. Refer to Section 3 Memory Management Unit (Except MC68EC040 and MC68EC040V) for information on the M68040 and MC68LC040 memory units and Appendix B MC68EC040 for information on the MC68EC040 memory unit.

The processor asserts TS during C1 to indicate the beginning of a bus cycle. If not already asserted from a previous bus cycle, the TIP signal is also asserted at this time to indicate that a bus cycle is active.
Clock 2 (C2)
During the first half of the clock after C1, the processor negates TS and drives the appropriate bytes of the data bus with the data to be written. All other bytes are driven with undefined values. The selected device uses R/W, SIZ1, SIZ0, A1, A0, and CIOUT to latch only the required information on the data bus. With the exception of R/W and CIOUT, these signals also select any or all of the bytes (D31–D24, D23–D16, D15–D8, and D7–D0). If the first clock after C1 is not a wait state, the selected peripheral device asserts the TA signal. At the end of the first clock cycle after C1, the processor samples the level of TA, terminating the bus cycle if TA is asserted. If TA is not recognized asserted at the end of the clock cycle, the processor inserts a wait state instead of terminating the transfer. The processor continues to sample TA on successive rising edges of BCLK until TA is recognized asserted. The data bus then three-states and the bus cycle ends.

When the processor recognizes TA at the clock edge and terminates the bus cycle, TIP remains asserted if the processor is ready to begin another bus cycle. Otherwise, the processor negates TIP during the first half of the next clock. The processor also three-states the data bus during the first half of the next clock following termination of the write transfer.

7.4.4 Line Write Transfers

The processor uses line write bus cycles to access a 16-byte operand for a MOVE16 instruction and to support cache line pushes. Both burst and burst-inhibited transfers are supported. Figures 7-16 and 7-17 illustrate a flowchart and functional timing diagram for a line write bus cycle.
Figure 7-16. Line Write Transfer Flowchart

Processor, address device:
1) Set R/W to write
2) Drive address on A31–A0
3) Drive user page attributes on UPA1, UPA0
4) Drive size on SIZ1, SIZ0 (line)
5) Drive transfer type on TT1, TT0
6) Drive transfer modifier on TM2–TM0
7) CIOUT becomes valid
8) Assert TS for one clock
9) Assert TIP
Processor, supply data:
1) Drive data on D31–D0
2) Sample TA
3) Sample TBI and TCI (for first transfer)
External device, accept data:
1) Decode address (first transfer only)
2) Latch data on D31–D0
3) Assert TA
External device, terminate cycle:
1) Negate TA (if necessary)
2) Increment address bits A3, A2 (if necessary)
(Repeat supply data, accept data, and terminate cycle until four long words are transferred.)
Processor, end of burst (when four long words are transferred):
1) Remove data from D31–D0
2) Negate TIP (if required)
Start next cycle.
Figure 7-17. Line Write Transfer Timing (clocks C1–C5; the selected device increments A3 and A2, for example A3, A2 = 10, 11, 00, 01)

Clock 1 (C1)
The line write cycle starts in C1. During the first half of C1, the processor places valid values on the address bus and transfer attributes. For user and supervisor mode accesses that are translated by the corresponding memory unit, the UPAx signals are driven with the values from the matching U1 and U0 bits. The TTx and TMx signals identify the specific access type. The R/W signal is driven low for a write cycle, and SIZ1 and SIZ0 indicate line size. CIOUT is asserted for a MOVE16 operand write if the access is identified as noncachable. Refer to Section 3 Memory Management Unit (Except MC68EC040 and MC68EC040V) for information on the M68040 and MC68LC040 memory units and Appendix B MC68EC040 for information on the MC68EC040 memory unit.

The processor asserts TS during C1 to indicate the beginning of a bus cycle. If not already asserted from a previous bus cycle, the TIP signal is also asserted at this time to indicate that a bus cycle is active.

Clock 2 (C2)
During the first half of the first clock after C1, the processor negates TS and drives the data bus with the data to be written. The selected device uses R/W, SIZ1, and SIZ0 to latch the data on the data bus. Concurrently, the selected device asserts TA and either negates or asserts TBI to indicate whether it can or cannot support a burst transfer. At the end of the first clock after C1, the processor samples the level of TA and TBI. If TA is asserted, the transfer terminates. If TA is not recognized asserted, the processor inserts wait states instead of terminating the transfer. The processor continues to sample TA and TBI on successive rising edges of BCLK until TA is recognized asserted.

If TBI was negated with TA, the processor continues the cycle with C3. Otherwise, if TBI was asserted, the line transfer is burst inhibited, and the processor writes the remaining three long words using long-word write bus cycles. Only in this case does the processor increment A3 and A2 for each write, and the new address is placed on the address bus for each bus cycle. Refer to 7.4.3 Byte, Word, and Long-Word Write Transfers for information on long-word writes. If no wait states are generated, a burst-inhibited line write completes in eight clocks instead of the five required for a burst write.

Clock 3 (C3)
The processor drives the second long word of data on the data bus and holds the address and transfer attribute signals constant during C3. The selected device increments A3 and A2 to reference the next long word, latches this data from the data bus, and asserts TA.
At the end of C3, the processor samples the level of TA; if TA is asserted, the transfer terminates. If TA is not recognized asserted at the end of C3, the processor inserts wait states instead of terminating the transfer. The processor continues to sample TA on successive rising edges of BCLK until TA is recognized asserted.

Clock 4 (C4)
This clock is identical to C3 except that the value driven on the data bus corresponds to the third long word of data for the burst.

Clock 5 (C5)
This clock is identical to C3 except that the value driven on the data bus corresponds to the fourth long word of data for the burst. After the processor recognizes the last TA assertion and terminates the line write bus cycle, TIP remains asserted if the processor is ready to begin another bus cycle. Otherwise, the processor negates TIP during the first half of the next clock. The processor also three-states the data bus during the first half of the next clock following termination of the write cycle.
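The difference between the two line write modes shows up in what the processor drives on the address and size signals. The following sketch lists, for each of the four long words, the address on A31–A0 and the size on SIZ1, SIZ0: a burst holds the starting address and line size, while a burst-inhibited write switches to long-word cycles at processor-incremented addresses. `line_write_transfers` is a hypothetical name, and the model covers only these two signal groups, not timing.

```python
# Illustrative sketch of address/size per transfer; the name is hypothetical.

def line_write_transfers(start: int, tbi_asserted: bool) -> list[tuple[int, str]]:
    """List (address on A31-A0, size on SIZ1/SIZ0) for each of the four transfers."""
    line_base = start & 0xFFFFFFF0          # A31-A4
    low_bits = start & 0x3                  # A1, A0
    first = (start >> 2) & 0x3              # A3, A2 of the first long word
    transfers = [(start, "line")]           # the first transfer is always line-sized
    for i in range(1, 4):
        if tbi_asserted:
            # Burst inhibited: the processor increments A3, A2 itself and runs a
            # separate long-word write cycle for each remaining long word.
            address = line_base | (((first + i) & 0x3) << 2) | low_bits
            transfers.append((address, "long word"))
        else:
            # Burst: the processor holds the address constant; the selected
            # device increments A3, A2 internally.
            transfers.append((start, "line"))
    return transfers


for address, size in line_write_transfers(0x2000, tbi_asserted=True):
    print(f"${address:08X} {size}")
```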
7.4.5 Read-Modify-Write Transfers (Locked Transfers)

The read-modify-write transfer performs a read, conditionally modifies the data in the processor, and writes the data out to memory. In the M68040, this operation can be indivisible, providing semaphore capabilities for multiprocessor systems. During the entire read-modify-write sequence, the M68040 asserts the LOCK signal to indicate that an indivisible operation is occurring and asserts the LOCKE signal for the last transfer to indicate completion of the locked sequence. The external arbiter can use the LOCK and LOCKE signals to prevent arbitration of the bus during locked processor sequences. External bus arbiters can use LOCKE to support bus arbitration between consecutive read-modify-write cycles. A read-modify-write operation is treated as noncachable. If the access hits in the data cache, it invalidates a matching valid entry and pushes a matching dirty entry. The read-modify-write transfer begins after the line push (if required) is complete; however, LOCK may assert during the line push bus cycle.

The TAS, CAS, and CAS2 instructions are the only M68040 instructions that use read-modify-write transfers. Some page descriptor updates during translation table searches also use read-modify-write transfers. Refer to Section 3 Memory Management Unit (Except MC68EC040 and MC68EC040V) for information about table searches.

The read-modify-write transfer for the CAS and CAS2 instructions in the M68040 differs from those used by previous members of the M68000 family. If the comparison fails for one of these instructions (the operands do not match), the M68040 still executes a single write transfer to terminate the locked sequence with LOCKE asserted. For the CAS instruction, the value read from memory is written back; for the CAS2 instruction, the second operand read is written back. Figure 7-18 illustrates a functional timing diagram for a TAS instruction read-modify-write bus transfer.
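The CAS behavior described above, in which a failed comparison still produces a locked write, can be sketched as follows. This is a behavioral model only: `Transfer` and `cas_bus_transfers` are hypothetical names, memory is a plain dictionary, the operand is assumed to need exactly one read and one write, and neither the data cache line push nor the compare-register update on a mismatch is modeled.

```python
# Behavioral sketch of single-operand CAS bus traffic; names are hypothetical.
from dataclasses import dataclass


@dataclass
class Transfer:
    kind: str      # "read" or "write"
    address: int
    value: int
    lock: bool     # LOCK asserted during this transfer
    locke: bool    # LOCKE asserted (last transfer of the locked sequence)


def cas_bus_transfers(memory: dict, address: int, compare: int, update: int):
    """Model the locked bus traffic of a single-operand CAS.

    Returns (matched, transfers). Whatever the comparison result, the locked
    sequence ends with exactly one write that carries LOCKE.
    """
    current = memory.get(address, 0)
    transfers = [Transfer("read", address, current, lock=True, locke=False)]
    matched = current == compare
    # On a match the update operand is written; on a mismatch the value that
    # was read is written back unchanged, only to close the locked sequence.
    written = update if matched else current
    memory[address] = written
    transfers.append(Transfer("write", address, written, lock=True, locke=True))
    return matched, transfers


memory = {0x100: 7}
matched, transfers = cas_bus_transfers(memory, 0x100, compare=5, update=9)
print(matched, [(t.kind, t.value, t.locke) for t in transfers])
# False [('read', 7, False), ('write', 7, True)]
```

Because the write always happens, an external arbiter can rely on LOCKE to mark the end of every locked sequence, whether or not the comparison succeeded.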
Clock 1 (C1)
The read cycle starts in C1. During the first half of C1, the processor places valid values on the address bus and transfer attributes. LOCK is asserted to identify a locked read-modify-write bus cycle. For user and supervisor mode accesses, which the corresponding memory unit translates, the UPAx signals are driven with the values from the matching U1 and U0 bits. The TTx and TMx signals identify the specific access type. R/W is driven high for a read cycle. CIOUT is asserted if the access is identified as noncachable. The processor asserts TS during C1 to indicate the beginning of a bus cycle. If not already asserted from a previous bus cycle, the TIP signal is also asserted at this time to indicate that a bus cycle is active. Refer to Section 3 Memory Management Unit (Except MC68EC040 and MC68EC040V) for information on the M68040 and MC68LC040 memory units and Appendix B MC68EC040 for information on the MC68EC040 memory unit.
Figure 7-18. Locked Transfer for TAS Instruction Timing (byte read in C1–C2, idle clock CI, byte write in C3–C4; signals include LOCK and LOCKE)
Clock 2 (C2)
During the first half of the first clock cycle after C1, the processor negates TS. The selected device uses R/W, SIZ1, SIZ0, A1, and A0 to place its information on the data bus. With the exception of R/W, these signals also select any or all of the bytes (D31–D24, D23–D16, D15–D8, and D7–D0). Concurrently, the selected device asserts TA. At the end of the first clock cycle after C1, the processor samples the level of TA and latches the current value on the data bus. If TA is asserted, the read transfer terminates, and the latched data is passed to the appropriate memory unit. If TA is not recognized asserted, the processor ignores the data and appends a wait state instead of terminating the transfer. The processor continues to sample TA on successive rising edges of BCLK until TA is recognized as asserted. The latched data is then passed to the appropriate memory unit. If more than one read cycle is required to read in the operand(s), C1 and C2 are repeated accordingly. When the processor recognizes TA at the end of the last read transfer for the locked bus cycle, it negates TIP during the first half of the next clock.

Clock Idle (CI)
The processor does not assert any new control signals during the idle clock states, but it may begin the modify portion of the cycle at this time. The R/W signal remains in read mode until C3 to prevent bus conflicts with the preceding read portion of the cycle; the data bus is not driven until C4.

Clock 3 (C3)
During the first half of C3, the processor places valid values on the address bus and transfer attributes and drives R/W low for a write cycle. The processor asserts TS to indicate the beginning of a bus cycle. The TIP signal is also asserted at this time to indicate that a bus cycle is active. LOCKE is asserted during C3 for the last write transfer of the locked sequence.
If multiple write transfers are required for misaligned operands or multiple operands, LOCKE is asserted only for the final write transfer. The external arbiter can use this indication to distinguish between two back-to-back locked bus cycles and allow arbitration between them.

Clock 4 (C4)
During the first half of C4, the processor negates TS and drives the appropriate bytes of the data bus with the data to be written. All other bytes are driven with undefined values. The selected device uses R/W, SIZ1, SIZ0, A1, and A0 to latch the information on the data bus. Any or all of the bytes (D31–D24, D23–D16, D15–D8, and D7–D0) are selected by SIZ1, SIZ0, A1, and A0. Concurrently, the selected device asserts TA. At the end of C4, the processor samples the level of TA; if TA is asserted, the bus cycle terminates. If TA is not recognized asserted at the end of C4, the processor appends a wait state instead of terminating the transfer. The processor continues to sample the TA signal on successive rising edges of BCLK until it is recognized asserted.
When the processor recognizes TA at the end of a clock, the bus cycle is terminated, but TIP remains asserted if the processor is ready to begin another bus cycle. Otherwise, the processor negates TIP during the first half of the next clock. The processor also three-states the data bus during the first half of the next clock following termination of the write cycle. When the last write transfer is terminated, LOCKE is negated. The processor also negates LOCK if the next bus cycle is not a read-modify-write.

7.5 Acknowledge Bus Cycles

Bus transfers with transfer type signals TT1 and TT0 = $3 are classified as acknowledge bus cycles. The following paragraphs describe interrupt acknowledge and breakpoint acknowledge bus cycles that use this encoding.

7.5.1 Interrupt Acknowledge Bus Cycles

When a peripheral device requires the services of the M68040 or is ready to send information that the processor requires, it can signal the processor to take an interrupt exception. The interrupt exception transfers control to a routine that responds appropriately. The peripheral device uses the active-low interrupt priority level signals (IPL2–IPL0) to signal an interrupt condition to the processor and to specify the priority level for the condition. Refer to Section 8 Exception Processing for a discussion of the IPL2–IPL0 levels and IPEND.

The status register (SR) of the M68040 contains an interrupt priority mask (I2–I0 bits). The value in the interrupt mask is the highest priority level that the processor ignores. When an interrupt request has a priority higher than the value in the mask, the processor makes the request a pending interrupt. IPL2–IPL0 must maintain the interrupt request level until the M68040 acknowledges the interrupt to guarantee that the interrupt is recognized. The M68040 continuously samples IPL2–IPL0 on consecutive rising edges of BCLK to synchronize and debounce these signals.
An interrupt request that is held constant for two consecutive clock periods is considered a valid input. Although the protocol requires that the request remain until the processor runs an interrupt acknowledge cycle for that interrupt value, an interrupt request that is held for as short a period as two clock cycles can be recognized. Figure 7-19 is a flowchart of the procedure for making an interrupt pending.
Figure 7-19. Interrupt Pending Procedure (after reset, sample and synchronize IPL2–IPL0; if the interrupt level is greater than I2–I0, or a transition occurs on level 7, assert IPEND; otherwise, continue sampling)

The M68040 asserts IPEND when an interrupt request is pending. Figure 7-20 illustrates the assertion of IPEND relative to the assertion of an interrupt level on the IPL2–IPL0 signals. IPEND signals external devices that an interrupt exception will be taken at an upcoming instruction boundary (following any higher priority exception). The IPEND signal negates after the processor recognizes the internal interrupt acknowledge and can precede the external interrupt acknowledge bus cycle.

Figure 7-20. Assertion of IPEND (signals BCLK, IPL2–IPL0, and IPEND; annotations: IPLs recognized, IPLs synchronized, compare request with mask in SR, assert IPEND)
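The pending decision in Figure 7-19 can be sketched as a per-clock sampler. Assumptions: `InterruptSampler` is a hypothetical name; a level counts as valid once the same value is seen on two consecutive rising edges of BCLK; and IPEND is modeled as staying asserted once set, because its negation on the internal interrupt acknowledge and the synchronizer latency are not modeled.

```python
# Hypothetical model of IPL sampling, not a description of the silicon.

class InterruptSampler:
    """Sketch of IPL2-IPL0 sampling and the interrupt-pending decision."""

    def __init__(self, mask: int):
        self.mask = mask            # I2-I0 field of the status register
        self.last_sample = 0        # level seen on the previous BCLK edge
        self.valid_level = 0        # last level held for two consecutive edges
        self.ipend = False

    def rising_edge(self, ipl_pins: int) -> bool:
        """Sample the active-low IPL2-IPL0 pins on one rising edge of BCLK."""
        level = ~ipl_pins & 0x7     # invert: all pins high means level 0
        if level == self.last_sample and level != self.valid_level:
            # A new level has now been held for two consecutive edges.
            self.valid_level = level
            # Levels 1-6 must exceed the mask. This branch runs only when the
            # valid level changes, so level 7 here is always a transition to 7,
            # which is recognized even when the mask is 7.
            if level > self.mask or level == 7:
                self.ipend = True
        self.last_sample = level
        return self.ipend


sampler = InterruptSampler(mask=3)
print([sampler.rising_edge(p) for p in (0b111, 0b010, 0b010)])
# [False, False, True]
```

The single sample of level 5 on the second edge does not set IPEND; only the repeated sample on the third edge does, which mirrors the two-clock validity rule.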
The M68040 takes an interrupt exception for a pending interrupt within one instruction boundary after processing any other pending exception with a higher priority. Thus, the M68040 executes at least one instruction in an interrupt exception handler before recognizing another interrupt request.

The following paragraphs describe the various kinds of interrupt acknowledge bus cycles that can be executed as part of interrupt exception processing. Table 7-4 provides a summary of the possible interrupt acknowledge terminations and the exception processing results.

Table 7-4. Interrupt Acknowledge Termination Summary

TA     TEA    AVEC         Termination Condition
High   High   Don't care   Insert waits
High   Low    Don't care   Take spurious interrupt exception
Low    High   High         Latch vector number on D7–D0 and take interrupt exception
Low    High   Low          Take autovectored interrupt exception
Low    Low    Don't care   Retry interrupt acknowledge cycle

7.5.1.1 Interrupt Acknowledge Bus Cycle (Terminated Normally)

When the M68040 processes an interrupt exception, it performs an interrupt acknowledge bus cycle to obtain the vector number that identifies the exception vector containing the starting location of the interrupt exception handler. Some interrupting devices have programmable vector registers that contain the interrupt vectors for the exception handlers they use. Other interrupting conditions or devices cannot supply a vector number and use the autovector bus cycle described in 7.5.1.2 Autovector Interrupt Acknowledge Bus Cycle.
The interrupt acknowledge bus cycle is a read transfer. It differs from a normal read cycle in the following respects:

1. TT1 and TT0 = $3 to indicate an acknowledge bus cycle.
2. Address signals A31–A0 are set to all ones ($FFFFFFFF).
3. TM2–TM0 are set to the interrupt request level (the inverted values of IPL2–IPL0).

The responding device places the vector number on the data bus during the interrupt acknowledge bus cycle, and the cycle is terminated normally with TA. Figures 7-21 and 7-22 illustrate a flowchart and functional timing diagram for an interrupt acknowledge cycle terminated with TA.

Figure 7-21. Interrupt Acknowledge Bus Cycle Flowchart

External device: Request interrupt.
Processor: Acknowledge interrupt.
  1) IPEND recognized, wait for instruction boundary
  2) Set R/W to read
  3) Drive A31–A0 to $FFFFFFFF
  4) Drive UPA1, UPA0 to $0
  5) Set size to byte
  6) Set transfer type on TT1, TT0 to $3
  7) Place interrupt level on TM2–TM0
  8) Negate CIOUT
  9) Assert TS for one clock
  10) Assert TIP
External device: Provide vector information.
  1) Place vector number on byte D7–D0
  2) Assert transfer acknowledge (TA)
Processor: Acquire data.
  1) Latch vector number
External device: Terminate cycle.
  1) Remove data from D7–D0
  2) Negate TA
Processor: Start next cycle.
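The three attribute differences listed for the interrupt acknowledge cycle can be written as a small constructor. This is a sketch under stated assumptions. `AckCycle` and `interrupt_ack_cycle` are hypothetical names. SIZ1/SIZ0 is shown symbolically rather than by its pin encoding. `ipl_pins` is the raw 3-bit value sampled on the active-low IPL2–IPL0 pins, and its bitwise inverse is the level placed on TM2–TM0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AckCycle:
    address: int  # A31-A0
    tt: int       # TT1-TT0
    tm: int       # TM2-TM0
    upa: int      # UPA1-UPA0
    size: str     # symbolic size; the SIZ1/SIZ0 pin encoding is not modeled
    read: bool    # R/W driven to read


def interrupt_ack_cycle(ipl_pins: int) -> AckCycle:
    """Return the attributes the processor drives for an interrupt acknowledge cycle."""
    if not 0 <= ipl_pins <= 0b111:
        raise ValueError("IPL2-IPL0 is a 3-bit field")
    level = ~ipl_pins & 0b111  # TM2-TM0 carries the inverted IPL2-IPL0 value
    if level == 0:
        raise ValueError("level 0 means no interrupt is pending")
    return AckCycle(address=0xFFFF_FFFF, tt=0b11, tm=level, upa=0b00, size="byte", read=True)


cycle = interrupt_ack_cycle(0b010)  # IPL2-IPL0 pins = 010 -> interrupt level 5
print(hex(cycle.address), cycle.tt, cycle.tm)  # 0xffffffff 3 5
```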
Figure 7-22. Interrupt Acknowledge Bus Cycle Timing (BCLK, A31–A0, SIZ1/SIZ0 = byte, TT1/TT0, TM2–TM0 = interrupt level, UPA1/UPA0, CIOUT, TS, TIP, TA, AVEC, R/W, D31–D8, and D7–D0 = vector number over clocks C1–C2, followed by the write stack cycle)

7.5.1.2 Autovector Interrupt Acknowledge Bus Cycle

When the interrupting device cannot supply a vector number, it requests an automatically generated vector (autovector). Instead of placing a vector number on the data bus and asserting TA, the device asserts the autovector (AVEC) signal with TA to terminate the cycle. AVEC is only sampled with TA asserted. AVEC can be grounded if all interrupt requests are autovectored.

The vector number supplied in an autovector operation is derived from the interrupt priority level of the current interrupt. When the AVEC signal is asserted with TA during an interrupt acknowledge bus cycle, the M68040 ignores the state of the data bus and internally
generates the vector number, which is the sum of the interrupt priority level plus 24 ($18). There are seven distinct autovectors (vector numbers 25 through 31) that can be used, corresponding to the seven levels of interrupts available with the IPL2–IPL0 signals. Figure 7-23 illustrates a functional timing diagram for an autovector operation.

Figure 7-23. Autovector Interrupt Acknowledge Bus Cycle Timing (autovectored interrupt acknowledge cycle over clocks C1–C2 with AVEC asserted with TA and the interrupt level on TM2–TM0, followed by the write stack cycle)

7.5.1.3 Spurious Interrupt Acknowledge Bus Cycle

When a device does not respond to an interrupt acknowledge bus cycle with TA, or with AVEC and TA, the external logic typically returns the transfer error acknowledge signal (TEA). In this case, the M68040 automatically generates the spurious interrupt vector number 24 ($18) instead of the interrupt vector number. If TA and TEA are both asserted, the processor retries the cycle.
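The three ways the M68040 obtains a vector number (7.5.1.1 through 7.5.1.3) can be summarized in one function. This sketch is illustrative, and `vector_number` and `vector_address` are hypothetical names. `vector_address` assumes the usual M68000-family convention, which this section does not state: each vector is a 4-byte entry in a table located by the vector base register (VBR).

```python
SPURIOUS_VECTOR = 24  # $18
AUTOVECTOR_BASE = 24  # autovector = interrupt level + 24 ($18)


def vector_number(termination: str, level: int, data_bus: int = 0) -> int:
    """Return the vector number for an interrupt acknowledge termination.

    termination: 'vectored' (TA only), 'autovector' (TA with AVEC),
    or 'spurious' (TEA only). level is the interrupt level, 1 through 7.
    """
    if not 1 <= level <= 7:
        raise ValueError("interrupt levels are 1 through 7")
    if termination == "vectored":
        return data_bus & 0xFF  # vector number is read from byte D7-D0
    if termination == "autovector":
        return AUTOVECTOR_BASE + level  # vectors 25 through 31
    if termination == "spurious":
        return SPURIOUS_VECTOR
    raise ValueError(f"unknown termination {termination!r}")


def vector_address(vbr: int, vector: int) -> int:
    """Address of the 4-byte exception vector entry (assumed VBR + 4 * vector)."""
    return (vbr + 4 * vector) & 0xFFFF_FFFF


print(vector_number("autovector", 3), hex(vector_address(0, 27)))  # 27 0x6c
```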
7.5.2 Breakpoint Interrupt Acknowledge Bus Cycle

The execution of a breakpoint instruction (BKPT) generates the breakpoint interrupt acknowledge bus cycle. An acknowledge access is indicated with TT1 and TT0 = $3, address A31–A0 = $00000000, and TM2–TM0 = $0. When the external device terminates the cycle with either TA or TEA, the processor takes an illegal instruction exception. Figures 7-24 and 7-25 illustrate a flowchart and functional timing diagram for a breakpoint interrupt acknowledge transfer.

Figure 7-24. Breakpoint Interrupt Acknowledge Bus Cycle Flowchart

Processor: Breakpoint acknowledge.
  1) Set R/W to read
  2) Drive A31–A0 to $00000000
  3) Drive UPA1, UPA0 to $0
  4) Set size to byte
  5) Set transfer type on TT1, TT0 to $3
  6) Set transfer modifier TM2–TM0 to $0
  8) Negate CIOUT
  9) Assert TS for one clock
  10) Assert TIP
External device: Terminate cycle (assert TA or TEA).
  1) Negate TA or TEA
Processor: Initiate illegal instruction exception processing.
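External logic that services both kinds of acknowledge cycle has to tell them apart using only the attributes given above. A minimal decoder sketch follows. `classify_ack` is a hypothetical helper. It checks only TT1–TT0, A31–A0, and TM2–TM0 and ignores the other signals in Figures 7-21 and 7-24.

```python
def classify_ack(tt: int, address: int, tm: int) -> str:
    """Classify a bus cycle from TT1-TT0, A31-A0, and TM2-TM0."""
    if tt != 0b11:
        return "not an acknowledge cycle"
    if address == 0xFFFF_FFFF and 1 <= tm <= 7:
        # Interrupt acknowledge: supply a vector with TA, or assert AVEC with TA.
        return f"interrupt acknowledge, level {tm}"
    if address == 0x0000_0000 and tm == 0:
        # Breakpoint acknowledge: TA or TEA both lead to an illegal instruction exception.
        return "breakpoint acknowledge"
    return "unrecognized acknowledge cycle"


for attrs in [(0b11, 0xFFFF_FFFF, 6), (0b11, 0x0, 0), (0b00, 0x1000, 2)]:
    print(classify_ack(*attrs))
```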
Figure 7-25. Breakpoint Interrupt Acknowledge Bus Cycle Timing (breakpoint acknowledge cycle over clocks C1–C2, followed by the write stack cycle)

7.6 Bus Exception Control Cycles

The M68040 bus architecture requires assertion of TA from an external device to signal that a bus cycle is complete. TA is not asserted in the following cases:
• The external device does not respond.
• No interrupt vector is provided.
• Various other application-dependent errors occur.

External circuitry can provide TEA when no device responds by asserting TA within an appropriate period of time after the processor begins the bus cycle. This allows the cycle to terminate and the processor to enter exception processing for the error condition. TEA can also be asserted in combination with TA to cause a retry of a bus cycle in error.
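The bus-timeout idea above is often built as a counter that starts with TS and is cleared by TA. The cycle-level sketch below is hypothetical. The class name, the `timeout_clocks` parameter, and the choice to stop timing at the first TA are all simplifying assumptions; because of that last choice, later transfers of a line access are not timed. The sketch asserts TEA alone, which Table 7-5 defines as a bus error rather than a retry.

```python
class BusWatchdog:
    """Counts BCLK edges from TS and requests TEA if TA does not arrive in time."""

    def __init__(self, timeout_clocks: int):
        if timeout_clocks < 1:
            raise ValueError("timeout must be at least one clock")
        self.timeout = timeout_clocks
        self.count = None  # None while no bus cycle is being timed

    def clock(self, ts: bool, ta: bool) -> bool:
        """Sample one rising BCLK edge; True means assert TEA on this edge.

        ts and ta are True when the corresponding active-low signal is asserted.
        """
        if ts:
            self.count = 0  # a new bus cycle starts
        if self.count is None:
            return False
        if ta:
            self.count = None  # a device responded in time
            return False
        self.count += 1
        if self.count >= self.timeout:
            self.count = None  # TEA terminates the cycle
            return True
        return False


wd = BusWatchdog(timeout_clocks=3)
print([wd.clock(ts, ta=False) for ts in (True, False, False)])  # [False, False, True]
```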
To properly control termination of a bus cycle for a bus error or retry condition, TA and TEA must be asserted and negated for the same rising edge of BCLK. Table 7-5 lists the control signal combinations and the resulting bus cycle terminations. Bus error and retry terminations during burst cycles operate as described in 7.4.2 Line Read Transfers and 7.4.4 Line Write Transfers.

Table 7-5. TA and TEA Assertion Results

Case No.  TA    TEA   Result
1         High  Low   Bus error: terminate and take bus error exception, possibly deferred
2         Low   Low   Retry operation: terminate and retry
3         Low   High  Normal cycle: terminate and continue
4         High  High  Insert wait states

7.6.1 Bus Errors

The system hardware can use the TEA signal to abort the current bus cycle when a fault is detected. A bus error is recognized during a bus cycle when TA is negated and TEA is asserted. When the processor recognizes a bus error condition for an access, the access is terminated immediately. A line access that has TEA asserted for one of the four long-word transfers aborts without completing the remaining transfers, regardless of whether the line transfer uses a burst or burst-inhibited access.

When TEA is asserted to terminate a bus cycle, the M68040 can enter access error exception processing immediately following the bus cycle, or it can defer processing the exception. The instruction prefetch mechanism requests instruction words from the instruction memory unit before it is ready to execute them. If a bus error occurs on an instruction fetch, the processor does not take the exception until it attempts to use the instruction. Should an intervening instruction cause a branch or should a task switch occur, the access error exception for the unused access does not occur.
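The deferral rule for prefetches can be shown with a toy queue. Each prefetched word carries a bus-error flag, and the flag is acted on only when the word is consumed. This is a behavioral sketch, not the M68040's internal design, and `PrefetchQueue` and `AccessError` are hypothetical names.

```python
from collections import deque


class AccessError(Exception):
    """Raised when the execution unit consumes a word whose fetch ended in TEA."""


class PrefetchQueue:
    def __init__(self):
        self.entries = deque()  # (address, word, bus_error) in fetch order

    def prefetch(self, address: int, word: int = 0, bus_error: bool = False):
        # The bus error is recorded, not raised: the word may never be used.
        self.entries.append((address, word, bus_error))

    def flush(self):
        # A taken branch or task switch discards unused prefetches, errors included.
        self.entries.clear()

    def consume(self) -> int:
        address, word, bus_error = self.entries.popleft()
        if bus_error:
            raise AccessError(address)  # exception taken only on use
        return word


q = PrefetchQueue()
q.prefetch(0x1000, 0x1234)
q.prefetch(0x1002, bus_error=True)
print(hex(q.consume()))  # 0x1234
q.flush()                # branch taken: the faulted prefetch never raises
```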
Similarly, if a bus error is detected on the second, third, or fourth long-word transfer for a line read access, an access error exception is taken only if the execution unit is specifically requesting that long word. Otherwise, the line is not placed in the cache, and the processor repeats the line access when another access references the line. If a misaligned operand spans two long words in a line, a bus error on either the first or second transfer for the line causes exception processing to begin immediately. A bus error termination for any write access, or for a read access that references data specifically requested by the execution unit, causes the processor to begin exception processing immediately. Refer to Section 8 Exception Processing for details of access error exception processing.

When a bus error terminates an access, the contents of the corresponding cache can be affected in different ways, depending on the type of access. For a cache line read to replace a valid instruction or data cache line, the cache line being filled is invalidated before the bus cycle begins and remains invalid if the replacement line access is terminated with a bus error. If a dirty data cache line is being replaced and a bus error occurs during the replacement line read, the dirty line is restored from an internal push
buffer into the cache to eliminate an unnecessary push access. If a bus error occurs during a data cache push, the corresponding cache line remains valid (with the new line data) if the line push follows a replacement line read, or is invalidated if a CPUSH instruction explicitly forces the push. Write accesses to memory pages specified as write-through by the data memory unit update the corresponding cache line before accessing memory. If a bus error occurs during such a memory access, the cache line remains valid with the new data.

Figure 7-26 illustrates a functional timing diagram of a bus error on a word write access causing an access error exception. Figure 7-27 illustrates a functional timing diagram of a bus error on a line read access that does not cause an access error exception.

A physical bus error during an FSAVE instruction results in corruption of the floating-point state frame. This is not a serious limitation since, prior to writing the stack frame, the M68040 ensures that the pages required for the floating-point state frame are resident. Therefore, only a physical bus error can cause an access error during the stacking of the state frame. In a normal application, writes caused by the processor should not result in a physical bus error since the logical address space has already been translated and allocated. Since there should be no parity errors caused by processor write accesses, only spurious assertions of the TEA pin can cause physical bus errors. Furthermore, because FSAVE instructions usually place the state frame on the system stack, the occurrence of a physical bus error when using the system stack indicates a serious hardware error.
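The cache outcomes described above depend only on which kind of access the bus error interrupted, so they fit in a lookup table. The sketch below restates the paragraph's cases. The enum member names are hypothetical labels, not Motorola terminology.

```python
from enum import Enum


class FaultedAccess(Enum):
    REPLACEMENT_READ = "line read replacing a valid line"
    REPLACEMENT_READ_DIRTY_VICTIM = "line read replacing a dirty data cache line"
    PUSH_AFTER_REPLACEMENT = "data cache push following a replacement line read"
    CPUSH_PUSH = "data cache push forced by CPUSH"
    WRITE_THROUGH_WRITE = "write to a write-through page"


# State of the affected cache line after the access ends with a bus error.
LINE_AFTER_BUS_ERROR = {
    FaultedAccess.REPLACEMENT_READ: "invalid (invalidated before the bus cycle began)",
    FaultedAccess.REPLACEMENT_READ_DIRTY_VICTIM: "dirty line restored from the push buffer",
    FaultedAccess.PUSH_AFTER_REPLACEMENT: "valid, holding the new line data",
    FaultedAccess.CPUSH_PUSH: "invalid",
    FaultedAccess.WRITE_THROUGH_WRITE: "valid, holding the new write data",
}


def line_after_bus_error(access: FaultedAccess) -> str:
    return LINE_AFTER_BUS_ERROR[access]


for access in FaultedAccess:
    print(f"{access.value}: {line_after_bus_error(access)}")
```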
Figure 7-26. Word Write Access Terminated with TEA Timing (word write cycle over clocks C1–C2 terminated with TEA, followed by the write stack cycle)
Figure 7-27. Line Read Access Terminated with TEA Timing (burst line read over clocks C1–C4 with TBI; TEA ends the burst and no exception is taken. Note: The selected device increments the value on A3 and A2.)
7.6.2 Retry Operation

When an external device asserts both the TA and TEA signals during a bus cycle, the processor enters the retry sequence. The processor terminates the bus cycle and immediately retries the cycle using the same access information (address and transfer attributes). There is one exception. If the bus cycle was a cache push operation, and the bus is arbitrated away from the M68040 before the retry and a snoop during that arbitration invalidates the cache push, the processor does not use the same access information. Figure 7-28 illustrates a functional timing diagram for a retry of a read bus transfer.

Figure 7-28. Retry Read Transfer Timing (long-word read cycle over clocks C1–C2; retry is signaled with TA and TEA, and the retry cycle follows)

The processor retries any read or write cycles of a read-modify-write transfer separately; LOCK remains asserted during the entire retry sequence. If the last bus cycle of a locked access is retried, LOCKE remains asserted through the retry of the write cycle.
On the initial cycle of a line transfer, a retry causes the processor to retry the bus cycle, as illustrated in Figure 7-29. However, the processor recognizes a retry signaled during the second, third, or fourth cycle of a line as a bus error and aborts the line transfer. A burst-inhibited line transfer can only be retried on the initial transfer. It aborts if a retry is signaled for any of the three remaining long-word transfers used to complete the line transfer.

Negating the bus grant (BG) signal to the M68040 while asserting both TA and TEA provides a relinquish and retry operation for any bus cycle that can be retried (see Figure 7-33).

Figure 7-29. Retry Operation on Line Write (line transfer with TBI; retry is signaled with TA and TEA on the first transfer and is followed by the retry cycle; clocks C1–C5)
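The retry rules in 7.6.2 can be summarized as a function of the transfer type and of the long-word position (beat) at which TA and TEA are asserted together. A second function, a loop, shows that a single-transfer cycle is retried for as long as the external hardware keeps requesting it. Both functions are hypothetical sketches. Burst and burst-inhibited line transfers are treated identically because the text gives them the same outcome. Wait states are simplified, and a real retry reruns the cycle with a new TS.

```python
def retry_outcome(is_line: bool, beat: int) -> str:
    """Response to TA and TEA asserted together on the given (1-based) transfer."""
    last_beat = 4 if is_line else 1
    if not 1 <= beat <= last_beat:
        raise ValueError(f"beat must be 1..{last_beat}")
    if beat == 1:
        return "retry the cycle with the same access information"
    # Retry on the 2nd, 3rd, or 4th long word of a line is treated as a bus error.
    return "abort the line transfer (treated as a bus error)"


def run_single_transfer(responses):
    """Apply (ta, tea) samples to one non-line cycle; return (attempts, result)."""
    attempts = 0
    for ta, tea in responses:
        if not ta and not tea:
            continue  # wait state within the current attempt
        attempts += 1
        if ta and tea:
            continue  # retry requested: run the same cycle again
        return attempts, "normal" if ta else "bus error"
    raise RuntimeError("cycle never terminated")


print(retry_outcome(is_line=True, beat=3))
print(run_single_transfer([(False, False), (True, True), (True, True), (True, False)]))  # (3, 'normal')
```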
7.6.3 Double Bus Fault

A double bus fault occurs when an access or address error occurs during the exception processing sequence. For example, the processor attempts to stack several words containing information about the state of the machine while processing an access error exception. If a bus error occurs during the stacking operation, the second error is considered a double bus fault. The M68040 indicates a double bus fault condition by continuously driving PST3–PST0 with an encoded value of $5 until the processor is reset. Only an external reset operation can restart a halted processor. While the processor is halted, negating BR and forcing all outputs to a high-impedance state releases the external bus.

A second access or address error that occurs during execution of an exception handler or later does not cause a double bus fault. A bus cycle that is retried does not constitute a bus error or contribute to a double bus fault. The processor continues to retry the same bus cycle as long as external hardware requests it.

7.7 Bus Synchronization

The M68040 integer unit generates access requests to the instruction and data memory units to support integer and floating-point operations. Both the fetch and write-back stages of the integer unit pipeline perform accesses to the data memory unit, with effective address fetches assigned a higher priority. This priority allows data read and write accesses to occur out of order. A memory write access can be delayed for many clocks while read accesses generated by later instructions complete. The processor detects a read access that references earlier data waiting to be written (an address collision) and allows the corresponding write access to complete. A given sequence of read accesses or write accesses is completed in order; reordering only occurs with writes relative to reads. Figure 2-1 in Section 2 Integer Unit illustrates the integer pipeline stages.
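Write deferral with read bypass and collision detection can be modeled as a queue of pending writes. Reads skip the queue unless they hit a queued address. This is a toy model under stated assumptions. `DataMemoryUnit` is a hypothetical name, and timing is ignored. On a collision, the model also drains every earlier pending write, on the assumption that writes must still complete in order.

```python
from collections import deque


class DataMemoryUnit:
    def __init__(self):
        self.memory = {}
        self.pending = deque()  # deferred writes, in program order
        self.bus_log = []       # external accesses in the order they run

    def write(self, addr: int, value: int):
        self.pending.append((addr, value))  # the write may wait many clocks

    def _complete_writes_through(self, addr: int):
        # Drain in order up to and including the last pending write to addr.
        last = max(i for i, (a, _) in enumerate(self.pending) if a == addr)
        for _ in range(last + 1):
            a, v = self.pending.popleft()
            self.memory[a] = v
            self.bus_log.append(("W", a))

    def read(self, addr: int) -> int:
        if any(a == addr for a, _ in self.pending):
            self._complete_writes_through(addr)  # address collision
        self.bus_log.append(("R", addr))  # otherwise the read bypasses queued writes
        return self.memory.get(addr, 0)  # unwritten locations read as 0 in this model


dmu = DataMemoryUnit()
dmu.write(0x100, 1)
dmu.write(0x200, 2)
dmu.read(0x300)          # no collision: the read goes out ahead of both writes
print(dmu.read(0x100))   # 1, after the colliding write completes
print(dmu.bus_log)       # [('R', 768), ('W', 256), ('R', 256)]
```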
Besides address collisions, the instruction restart model used for exception processing in the M68040 causes another potential problem. After the operand fetch for an instruction, an exception that causes the instruction to be aborted can occur, resulting in another access for the operand after the instruction restarts. For example, an exception could occur after a read access of an I/O device's status register. The exception causes the instruction to be aborted and the register to be read again. If the first read access clears the status bits, the status information is lost, and the instruction obtains incorrect data.

Designating the memory page containing the address of the device as serialized noncachable prevents multiple out-of-order accesses to devices sensitive to such accesses. When the data memory unit detects an attempt to read an operand from a page designated as serialized noncachable, it allows all pending write accesses to complete before beginning the external read access. The definition of a page as noncachable versus serialized noncachable only affects read accesses. When a write operation reaches the integer unit's write-back stage, all previous instructions have completed. When a read access to a serialized noncachable page begins, only a bus error exception
on the operand read itself can cause the instruction to be aborted, preventing multiple reads. Note that when memory accesses are serialized noncachable, FMOVE causes two identical writes to the same location if the next instruction prefetch receives a bus error.

Since write cycles can be deferred indefinitely, many subsequent instructions can be executed, resulting in seemingly nonsequential instruction execution. When this action is not desired and the system depends on sequential execution following bus activity, the NOP instruction can be used. The NOP instruction forces instruction and bus synchronization because it freezes instruction execution until all pending bus cycles have completed.

One situation where the NOP instruction can be used to prevent multiple executions is a write of control information to an external register, where the external hardware attempts to control program execution based on the written data by conditionally asserting TEA. If the data cache is enabled and the write cycle results in a hit in the data cache, the cache is updated. That data, in turn, may be used in a subsequent instruction before the external write cycle completes. Since the M68040 cannot process the bus error until the end of the bus cycle, the external hardware cannot successfully interrupt program execution. To prevent a subsequent instruction from executing until the external cycle completes, the NOP instruction can be inserted after the instruction causing the write. In this case, access error exception processing proceeds immediately after the write, before subsequent instructions are executed. This is an irregular situation, and most systems do not require the NOP instruction for this purpose. Note that the NOP instruction can also be used to force access serialization by placing NOP before the instruction that reads an I/O device.
This practice eliminates the need to specify the entire page as serialized noncachable but does not prevent the instruction from being aborted by an exception condition.

7.8 Bus Arbitration and Examples

The bus design of the M68040 provides for one bus master at a time, either the M68040 or an external device. More than one device having the capability to control the bus can be attached to the bus. An external arbiter prioritizes requests and determines which device is granted access to the bus. Bus arbitration is the protocol by which the processor or an external device becomes the bus master. When the M68040 is the bus master, it uses the bus to read instructions and data not contained in its internal caches from memory and to write data to memory. When an alternate bus master owns the bus, the M68040 is able to monitor the alternate bus master's transfer and intervene when necessary to maintain cache coherency. This capability is discussed in more detail in 7.9 Bus Snooping Operation.

Unlike earlier members of the M68000 family, the M68040 implements an arbitration method in which an external arbiter controls bus arbitration and the processor acts as a slave device requesting ownership of the bus from the arbiter. Since the user defines the functionality of the external arbiter, it can be configured to support any desired priority scheme. For systems in which the processor is the only possible bus master, the bus can
be continuously granted to the processor, and no arbiter is needed. Systems that include several devices that can become bus masters require an arbiter to assign priorities to these devices. Then, when two or more devices simultaneously attempt to become the bus master, the one having the highest priority becomes the bus master first.

7.8.1 Bus Arbitration

The M68040 bus controller generates bus requests to the external arbiter in response to internal requests from the instruction and data memory units. The M68040 performs bus arbitration using the bus request (BR), bus grant (BG), and bus busy (BB) signals. The arbitration protocol allows arbitration to overlap with bus activity. It requires a single idle clock to prevent bus contention when transferring bus ownership between bus masters. The bus arbitration unit in the M68040 operates synchronously and transitions between states on the rising edge of BCLK.

The M68040 requests the bus from the external bus arbiter by asserting BR whenever an internal bus request is pending. The processor continues to assert BR for as long as it requires the bus. The processor can negate BR at any time without regard to the status of BG and BB. If the bus is granted to the processor when an internal bus request is generated, BR is asserted simultaneously with transfer start (TS), allowing the access to begin immediately. The processor always drives BR, and BR cannot be wire-ORed with other devices.

The external arbiter asserts BG to indicate to the processor that it has been granted the bus. If BG is negated while a bus cycle is in progress, the processor relinquishes the bus at the completion of the bus cycle. To guarantee that the bus is relinquished, BG must be negated prior to the rising edge of BCLK in which the last TA or TEA is asserted.
Note that the bus controller considers the four bus transfers for a burst-inhibited line transfer to be a single bus cycle and does not relinquish the bus until completion of the fourth transfer. The read and write portions of a locked read-modify-write sequence are divisible in the M68040, allowing the bus to be arbitrated away during the locked sequence. For system applications that do not allow locked sequences to be broken, the arbiter can use LOCK to detect locked accesses and prevent the negation of BG to the processor during these sequences. The processor also provides the LOCKE signal to indicate the last write cycle of a locked sequence, allowing arbitration between back-to-back locked sequences. See 7.4.5 Read-Modify-Write Transfers (Locked Transfers) for a detailed description of read-modify-write transfers.

When the bus has been granted to the processor in response to the assertion of BR, one of two situations can occur. In the first situation, the processor monitors BB to determine when the bus cycle of the alternate bus master is complete. After the alternate bus master negates BB, the processor asserts BB to indicate explicit bus ownership and begins the bus cycle by asserting TS. The processor continues to assert BB until the external arbiter negates BG. BB is then negated at the completion of the bus cycle and afterward forced to a high-impedance state. As long as BG is asserted, BB remains asserted to indicate the bus is owned, and the processor continuously drives the bus signals. The processor negates BR when there are no pending accesses, which allows the external arbiter to grant the bus to the alternate bus master if necessary.
In the second situation, the processor samples BB until the external bus arbiter negates BB. The processor drives its output pins with undetermined values and three-states BB, but does not perform a bus cycle. This procedure, called implicit ownership of the bus, occurs when the processor is granted the bus but there are no pending bus cycles. If an internal access request is generated, the processor assumes explicit ownership of the bus and immediately begins an access, simultaneously asserting BB, BR, TIP, and TS. If the external arbiter keeps BG asserted after completion of the bus cycle, the processor keeps BB asserted and drives the bus with undefined values, causing the processor to park. In this case, because BB remains asserted until the external arbiter negates BG, the processor must assert BR, TIP, and TS simultaneously to enter an active bus cycle. When it completes the active bus cycle and the external arbiter has not negated BG, the processor goes back into park, negating BR, TIP, and TS. As long as BG is asserted, the processor oscillates between park and active bus cycles.

The M68040 can be in any one of five bus arbitration states during bus operation: idle, snoop, implicit ownership, park, and active bus cycle. Two characteristics distinguish these five states: whether the M68040's three-state logic lets it drive the bus, and how the M68040 drives BB. If neither the processor nor the external bus arbiter asserts BB, an external pullup resistor drives BB high to negate it. Note that the relationship between the internal BR and the external BR is best described as a synchronous delay off BCLK.

The idle state occurs when the M68040 does not have ownership of the bus and is not in the process of snooping an access. In the idle state, BB is negated and the M68040 does not drive the bus.
The snoop state is similar to the idle state in that the M68040 does not have ownership of the bus. The snoop state differs from the idle state in that the M68040 is ready to service snooped transfers; otherwise, the status of BB and the bus is identical.

The implicit ownership state indicates that the M68040 owns the bus but is not running a bus cycle. The M68040 explicitly owns the bus when it runs a bus cycle immediately after being granted the bus. If the processor has completed at least one bus cycle and no internal transfers are pending, the processor drives the bus with undefined values, entering the park state. In either case, BG remains asserted. The simultaneous assertion of BR, TIP, and TS allows the processor to leave the park state and enter the active bus cycle state. Figure 7-30 is a bus arbitration state diagram illustrating the relationship of these five states with an example of an external bus arbiter circuit. Table 7-6 lists the five states and the conditions that indicate them.
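The five states and the transitions described in the text can be approximated by a simplified next-state function. This sketch is not a transcription of Figure 7-30, and `ArbState` and `next_state` are hypothetical. The sketch assumes BG is sampled before the last TA or TEA, so it ignores the indeterminate condition that arises when BG is negated at or after that edge. It also enters the snoop state whenever another master asserts BB.

```python
from enum import Enum, auto


class ArbState(Enum):
    IDLE = auto()
    SNOOP = auto()
    IMPLICIT = auto()  # implicit ownership
    ACTIVE = auto()    # active bus cycle
    PARK = auto()


def next_state(state: ArbState, bg: bool, bb_other: bool, request: bool, cycle_ends: bool) -> ArbState:
    """One rising BCLK edge of a simplified processor-side arbitration model.

    bg: BG asserted to the processor; bb_other: BB asserted by another master;
    request: an internal access is pending; cycle_ends: the active cycle
    terminates on this edge (ignored outside ACTIVE).
    """
    if state in (ArbState.IDLE, ArbState.SNOOP):
        if bb_other:
            return ArbState.SNOOP  # another master's cycle: watch it, do not drive
        if bg:
            return ArbState.ACTIVE if request else ArbState.IMPLICIT
        return ArbState.IDLE
    if state is ArbState.ACTIVE:
        if not cycle_ends:
            return ArbState.ACTIVE
        if not bg:
            return ArbState.IDLE  # relinquish at the end of the cycle
        return ArbState.ACTIVE if request else ArbState.PARK
    # IMPLICIT or PARK: the processor owns the bus but runs no cycle.
    if not bg:
        return ArbState.IDLE
    return ArbState.ACTIVE if request else state


s = ArbState.IDLE
for bg, req, end in [(True, False, False), (True, True, False), (True, False, True), (False, False, False)]:
    s = next_state(s, bg, bb_other=False, request=req, cycle_ends=end)
    print(s.name)
# prints IMPLICIT, ACTIVE, PARK, IDLE on separate lines
```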
Figure 7-30. M68040 Internal Interpretation State Diagram and External Bus Arbiter Circuit

(State diagram of the idle, snoop, implicit ownership, park, and active bus cycle states, together with an external circuit built around a D flip-flop clocked by BCLK that relates IBR, BR, BB, BBI, and BBO. The individual transition conditions are not reproduced here.)

Legend:
IBR = internal bus request signal (see schematic).
BBI = bus busy driven by alternate bus master.
TSI = transfer start as an input, sampled by the MC68040.
ENDCYCLE = whatever terminates a bus transaction, whether it is normal, bus error, or retried. Note that false burst cycles are treated as a line transaction. False locked transactions are treated the same as any other bus cycle.
* = indicates the signal is asserted for that device.
Note: The 040 may or may not transition if an active bus cycle is terminated with a bus error and BG is asserted.
Table 7-6. M68040 Bus Arbitration States

BB        BG        State                     Conditions
Negated   Negated   Idle                      M68040 three-states BB; arbiter negates BG; bus is not driven.
Negated   Asserted  Implicit ownership        M68040 three-states BB; arbiter asserts BG; bus is driven with undefined values.
Asserted  Negated   Active bus cycle          M68040 asserts BB; arbiter asserts BG; bus is driven with defined values; TIP is asserted.
Asserted  Asserted  Park                      M68040 asserts BB; arbiter asserts BG; bus is driven with undefined values; TIP is negated.
Asserted  Asserted  Alternate bus master      M68040 three-states BB; arbiter asserts BG; M68040 does not drive the bus.
                    ownership and snooped

The M68040 can be in the active bus cycle, park, or implicit ownership states when BG is negated. Depending on the state the processor is in when BG is negated, uncertain conditions can occur. The only guaranteed time that the processor relinquishes the bus is when BG is negated prior to the rising edge of BCLK in which the last TA or TEA is asserted and the processor is in the active bus cycle state. In contrast, suppose the processor is in the active bus cycle, park, or implicit ownership state and BG is negated at the same time as or after the last TA or TEA is asserted. From the standpoint of the external bus arbiter, the next action that the processor takes is then undetermined, because the processor can internally decide to perform another active bus cycle (indeterminate condition). External bus arbiters must consider this indeterminate condition when negating BG. They must be designed to examine the state of BB immediately after negating BG to determine whether or not the processor will run another bus cycle.

A somewhat dangerous situation exists when the processor begins a locked transfer after the bus has been granted to the alternate bus master, causing the alternate bus master to perform a bus transfer during a locked sequence.
To correct this situation, the external bus arbiter must be able to recognize the possible indeterminate condition and reassert BG to the processor when the processor begins a locked sequence. The indeterminate condition is most significant when dealing with systems that cannot allow locked transfers to be broken. Figure 7-31 illustrates an example of an error condition that is a consequence of the interaction between the indeterminate condition and a locked transfer. External bus arbiters must be designed so that all bus grants to all bus masters are negated for at least one rising edge of BCLK between bus tenures, preventing bus conflicts resulting from the above conditions.
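The arbiter requirements above can be combined in a small fixed-priority arbiter. The arbiter re-evaluates requests every clock, checks BB after negating a grant, and leaves at least one BCLK edge with no grant asserted between tenures. This is a hypothetical sketch. The `Arbiter` class, the master names, the fixed priority, and parking the bus on the last owner are design assumptions, not requirements from this manual. Because the arbiter re-arbitrates on every edge instead of waiting for BB after a grant, it also tolerates a processor that negates BR without ever using the bus.

```python
class Arbiter:
    """Fixed-priority bus arbiter with a mandatory no-grant clock between tenures."""

    def __init__(self, masters):
        self.masters = list(masters)  # highest priority first
        self.owner = None             # master whose BG is asserted, if any

    def _highest(self, requests):
        return next(m for m in self.masters if m in requests)

    def clock(self, requests: set, bb: bool) -> set:
        """Evaluate one rising BCLK edge.

        requests: masters asserting BR; bb: True if BB is asserted by anyone.
        Returns the set of masters whose BG is asserted after this edge.
        """
        unknown = set(requests) - set(self.masters)
        if unknown:
            raise ValueError(f"unknown masters: {sorted(unknown)}")
        if self.owner is None:
            # Between tenures: grant only once BB shows that no cycle is running,
            # so a cycle started during the indeterminate condition can finish.
            if requests and not bb:
                self.owner = self._highest(requests)
        else:
            wanted = self._highest(requests) if requests else self.owner  # park on owner
            if wanted != self.owner:
                self.owner = None  # negate every grant for at least this edge
        return {self.owner} if self.owner is not None else set()


arb = Arbiter(["alt", "cpu"])              # "alt" has higher priority
print(arb.clock({"cpu"}, bb=False))        # {'cpu'}
print(arb.clock({"alt", "cpu"}, bb=True))  # set(): all grants negated
print(arb.clock({"alt"}, bb=True))         # set(): cpu still finishing its cycle
print(arb.clock({"alt"}, bb=False))        # {'alt'}
```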
Figure 7-31. Lock Violation Example (signals shown: 040_BG, AM_BG, 040_BB, AM_BB, 040_LOCK, 040_TS, 040_TA, and AM_TS, where AM indicates the alternate bus master. Annotations mark a possible indeterminate condition, the interval in which the 040 actively owns the bus, and the point at which the lock is violated.)

In addition to the indeterminate condition, the external arbiter's design needs to include the function of BR. For example, in certain cases associated with conditional branches, the M68040 can assert BR to request the bus from an alternate bus master, then negate BR without using the bus, regardless of whether or not the external arbiter eventually asserts BG. This situation happens when the M68040 attempts to prefetch an instruction for a conditional branch. To achieve maximum performance, the processor prefetches the instructions of both paths for a conditional branch. If the conditional branch results in a branch-not-taken, the previously issued branch-taken prefetch is then terminated since the prefetch is no longer needed. In an attempt to save time, the M68040 negates BR. If BG takes too long to assert, the M68040 enters a disregard request condition. The BR signal can be reasserted immediately for a different pending bus request, or it can stay negated indefinitely. If an external bus arbiter is designed to wait for the M68040 to assert BB before proceeding, then the system experiences an extended period of time in which bus arbitration is locked. Motorola recommends that an external bus arbiter not assume that there is a direct relationship between BR and BB or between BR and BG.

Figure 7-32 illustrates an example of the processor requesting the bus from the external bus arbiter. During C1, the M68040 asserts BR to request the bus from the arbiter. The arbiter negates the alternate bus master's BG signal and grants the bus to the processor by asserting BG during C3. During C3, the alternate bus master completes its current access and relinquishes the bus by three-stating all bus signals. Typically, the BB and TIP signals
f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
7- 50 m68040 user? manual motorola require a pullup resistor to maintain a logic-one level between bus master tenures. the alternate bus master should negate these signals before three-stating to minimize rise time of the signals and ensure that the processor recognizes the correct level on the next bclk rising edge. at the end of c3, the processor recognizes the bus grant and bus idle conditions ( bg asserted and bb negated) and assumes ownership of the bus by asserting bb and immediately beginning a bus cycle during c4. during c6, the processor begins the second bus cycle for the misaligned operand and negates br since no other accesses are pending. during c7, the external bus arbiter grants the bus back to the alternate bus master that is waiting for the processor to relinquish the bus. the processor negates bb and tip before three-stating these and all other bus signals during c8. finally, the alternate bus master recognizes the bus grant and idle conditions at the end of c8 and is able to resume bus activity during c9. a31ea0 bclk d31ed0 transfer
attributes ts tip ta alternate
master processor br bg bb am_br am_bg alternate
master c1 c2 c3 c4 c5 c8 c9 c6 c7 * * * am indicates the alternate bus master. figure 7-32. processor bus request timing f r e e s c a l e s e m i c o n d u c t o r , i freescale semiconductor, inc. f o r m o r e i n f o r m a t i o n o n t h i s p r o d u c t , g o t o : w w w . f r e e s c a l e . c o m n c . . .
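The warning about BR can be made concrete with a small cycle-level model. This is a minimal sketch under stated assumptions, not a Motorola-specified arbiter: each clock is reduced to three sampled booleans, LOCK and the indeterminate condition are ignored, and the names `Clock`, `naive_arbiter`, and `robust_arbiter` are hypothetical. It contrasts an arbiter that waits for BB after granting the processor with one that re-evaluates the current request levels on every idle clock.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clock:
    """Hypothetical levels sampled at one BCLK rising edge (True = asserted)."""
    cpu_br: bool  # M68040 BR
    am_br: bool   # alternate master's bus request (AM_BR)
    bb: bool      # BB: some master is actively using the bus


def naive_arbiter(trace: List[Clock]) -> List[str]:
    """After granting the processor, waits for it to assert BB before
    re-arbitrating. This is the design the text warns against: in a
    disregard request condition the processor may never assert BB."""
    owner, waiting_for_bb = "am", False
    grants = []
    for c in trace:
        if owner == "am" and c.cpu_br and not c.bb:
            owner, waiting_for_bb = "cpu", True
        elif owner == "cpu":
            if waiting_for_bb:
                waiting_for_bb = not c.bb  # only a BB assertion ends the wait
            elif not c.bb and c.am_br:
                owner = "am"
        grants.append(owner)
    return grants


def robust_arbiter(trace: List[Clock]) -> List[str]:
    """Re-arbitrates on every clock in which the bus is idle (BB negated),
    using only the current request levels; no BR-to-BB or BR-to-BG
    relationship is assumed."""
    owner = "am"
    grants = []
    for c in trace:
        if not c.bb:
            if c.cpu_br:
                owner = "cpu"
            elif c.am_br:
                owner = "am"
        grants.append(owner)
    return grants


if __name__ == "__main__":
    # The processor asserts BR for one clock, then negates it and never runs a cycle.
    trace = [Clock(True, True, False)] + [Clock(False, True, False)] * 4
    print(naive_arbiter(trace))   # stays parked on the processor
    print(robust_arbiter(trace))  # returns the bus to the alternate master
```

In the sample trace the naive arbiter keeps the grant on the processor while the alternate master waits, which is the extended arbitration lockup described above; the robust version hands the bus back on the next idle clock.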
Figure 7-33 illustrates a functional timing diagram for arbitration during a relinquish and retry operation. Figure 7-34 is a functional timing diagram for implicit ownership of the bus. In Figure 7-33, the processor read access that begins in C1 is terminated at the end of C2 with a retry request and BG negated, forcing the processor to relinquish the bus and allow the alternate master to access the bus. Note that the processor reasserts BR during C3 since the original access is pending again. After the alternate bus master's tenure, the bus is granted to the processor to allow it to retry the access beginning in C7.

[Figure 7-33. Arbitration During Relinquish and Retry Timing: A31–A0, BCLK, D31–D0, transfer attributes, TS, TIP, TA, TEA, R/W, BR, BG, BB, AM_BR, and AM_BG over clocks C1–C8, showing a processor tenure, an alternate master tenure, and a second processor tenure. AM indicates the alternate bus master.]
[Figure 7-34. Implicit Bus Ownership Arbitration Timing: A31–A0, BCLK, D31–D0, transfer attributes, TS, TIP, TA, BR, BG, BB, AM_BR, and AM_BG over clocks C1–C9, with bus states labeled "bus implicitly owned", "bus owned and active", and "bus owned and idle". AM indicates the alternate bus master.]

7.8.2 Bus Arbitration Examples

The following paragraphs illustrate the behavior of the M68040 bus arbitration scheme and provide examples of how an external bus arbiter can be designed to preserve the integrity of locked bus operations. The examples include the previously mentioned indeterminate and disregard request conditions.

7.8.2.1 Dual M68040 Fairness Arbitration. The following state diagram illustrates a fairness algorithm using two MC68040s and assigning the lowest priority to the processor that owns the bus. If both processors keep their respective BR signals asserted, bus ownership alternates between the two processors so that each processor can run at least one bus cycle during its tenure. Each processor is allowed to keep the bus without relinquishing it when necessary to maintain the integrity of locked transfers. This example also illustrates how the LOCKE signal can be used to end a locked sequence and to yield the bus one bus cycle earlier than is normally possible. Figure 7-35 illustrates the state diagram of a hypothetical external arbiter design.

[Figure 7-35. Dual M68040 Fairness Arbitration State Diagram: states A through D, with state outputs BG1 asserted/BG2 negated and BG1 negated/BG2 asserted, and transitions conditioned on BR1, BR2, BB, LOCK, and LOCKE. Notes: 1. Because this example uses two MC68040s, 1 and 2 refer to the processor and its signals. 2. * indicates the signal is asserted for that device.]

Assuming that processor 1 currently owns the bus, the external arbiter is in state A. If processor 2 asserts BR2, the arbiter behaves in one of three ways, depending on the activity of processor 1:

1. If processor 1 is currently in the middle of a nonlocked bus access, the external arbiter proceeds to state B, in which BG1 is negated and BG2 is asserted. The external arbiter then proceeds to state C only when BB is negated, signifying the end of the bus cycle.
2. If processor 1 is currently in the middle of a locked bus access, the external arbiter stays in state A until LOCKE is asserted. Once LOCKE is asserted, the external arbiter enters state B, in which BG1 is negated and BG2 is asserted. The external arbiter proceeds to state C once BB is negated, signifying the end of the bus cycle.
3. If processor 1 is in one of the three boundary conditions, the external arbiter proceeds to state B. During state B, the external arbiter checks for the possibility of a newly initiated locked bus access. If it detects a locked bus cycle, it returns the bus to processor 1 by entering state A. Note that even though processor 1 recognizes that BG1 is asserted, it does not take the bus because processor 1 asserts BB whenever the boundary condition results in processor 1 performing another bus cycle. The external arbiter stays in state A until LOCKE is asserted, then proceeds to state B to give the bus to processor 2. The arbiter remains in state B until BB is negated, signifying the end of the bus cycle.

Once state C is reached, processor 2 may or may not actively begin a bus cycle, depending on whether it asserts BR2 and then negates BR2 because of a disregard request condition. If no other bus requests are pending by the time state C is reached, processor 2 is in the implicit ownership state. If processor 1 asserts BR1, state C can persist for only one clock. In this case, processor 2 does not have a chance to run any active bus cycles. Even though this appears to waste bus arbitration overhead, a null bus cycle tenure is better than having the external bus arbiter wait for processor 2 to perform at least one bus cycle before returning bus ownership to processor 1. Once processor 2 enters the disregard request condition, it reasserts BR anywhere from one clock to an undetermined number of clocks later before running another bus cycle, so waiting for processor 2 to run a bus cycle can result in a temporary bus arbitration lockup.

This bus arbitration scheme is restricted if the system supports the relinquish and retry operation that can occur for the last write cycle of a locked transfer. In this case, LOCKE cannot be used; the arbiter must treat LOCKE as always negated, which removes LOCKE from an arbitration scheme similar to this example. The reason for this restriction is that the external bus arbiter gives up the bus to the other processor once LOCKE is asserted. If a relinquish and retry operation were to occur, the next bus cycle would be from the other processor, violating the integrity of the locked transfer.

7.8.2.2 Dual M68040 Prioritized Arbitration. This example is very similar to the dual M68040 fairness arbitration example, except that one processor is assigned higher priority than the other. Processor 2 can own the bus only if processor 1 has no pending requests. It is important to note that when the processor asserts the LOCK signal, it also asserts BR1. This implementation replaces LOCK with BR because BR is more demanding than LOCK. Only when processor 2 is in the middle of a locked operation does it have higher priority than processor 1. As in the fairness arbitration example, the restriction on using LOCKE applies to this example. Figure 7-36 illustrates the state diagram for dual M68040 prioritized arbitration.
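The fairness arbiter described in Section 7.8.2.1 (Figure 7-35) can be collected into a small state machine. This is a minimal sketch under stated assumptions, not a transcription of the figure: it follows the prose above, treats an open locked sequence as "LOCK asserted and LOCKE negated", assumes one shared BB, LOCK, and LOCKE sampled from whichever processor is on the bus, and ignores the relinquish and retry restriction. `State`, `FairnessArbiter`, and `step` are hypothetical names.

```python
from enum import Enum
from typing import Tuple


class State(Enum):
    A = "A"  # processor 1 owns the bus: BG1 asserted, BG2 negated
    B = "B"  # handing over to 2: BG1 negated, BG2 asserted, wait for BB negated
    C = "C"  # processor 2 owns the bus: BG2 asserted, BG1 negated
    D = "D"  # handing over to 1: BG1 asserted, BG2 negated, wait for BB negated


# (BG1, BG2) per state; True = asserted.
GRANTS = {
    State.A: (True, False),
    State.B: (False, True),
    State.C: (False, True),
    State.D: (True, False),
}


class FairnessArbiter:
    """Hypothetical two-processor fairness arbiter (inputs: True = asserted)."""

    def __init__(self) -> None:
        self.state = State.A  # assume processor 1 owns the bus initially

    def step(self, br1: bool, br2: bool, bb: bool, lock: bool, locke: bool) -> State:
        # A locked sequence is still open while LOCK is asserted and LOCKE is not.
        locked = lock and not locke
        s = self.state
        if s is State.A and br2 and not locked:
            s = State.B
        elif s is State.C and br1 and not locked:
            s = State.D
        elif s is State.B:
            if not bb:
                s = State.C          # processor 1's last cycle has ended
            elif locked:
                s = State.A          # processor 1 began a new locked access
        elif s is State.D:
            if not bb:
                s = State.A
            elif locked:
                s = State.C
        self.state = s
        return s

    def grants(self) -> Tuple[bool, bool]:
        return GRANTS[self.state]
```

Walking case 2 of the list through the sketch: with processor 1 locked and BR2 asserted the arbiter stays in A; when LOCKE asserts it moves to B; it reaches C only once BB negates. If BR1 is already asserted when C is reached, C lasts a single clock, which is the null tenure the text accepts.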
[Figure 7-36. Dual M68040 Prioritized Arbitration State Diagram: states A through D, with state outputs BG1 asserted/BG2 negated and BG1 negated/BG2 asserted, and transitions conditioned on BR1, BR2, BB, LOCK, and LOCKE. Notes: 1. Because this example uses two MC68040s, 1 or 2 refers to the processor and its signals. 2. * indicates the signal is asserted for that device.]

7.8.2.3 M68040 Synchronous DMA Arbitration. Figure 7-37 illustrates a system with an M68040 and a synchronous direct memory access (DMA) device that contains an M68040 interface. Figure 7-37(a) illustrates the DMA device owning the bus only when the M68040 has no pending requests, and Figure 7-37(b) illustrates the DMA device having higher priority than the M68040, causing the M68040 to yield the bus to the DMA device at any time except when the M68040 is performing a locked bus operation. In either case, the M68040 is the default bus master; if there are no pending requests from either device, the external arbiter gives the bus to the M68040. As in the fairness arbitration example, the restriction on using LOCKE applies to this example.
[Figure 7-37. M68040 Synchronous DMA Arbitration: (a) MC68040 high priority, default bus master; (b) MC68040 low priority, default bus master. Each panel shows states A through D with outputs AM_BG and 040_BG, and transitions conditioned on 040_BR, AM_BR, BB, LOCK, and LOCKE. * indicates the signal is asserted for that device.]
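The priority rules of Figure 7-37 can be summarized as a decision on who receives the bus next. This is a simplified sketch, assuming the decision is made only when the bus is free and ignoring the handover states that wait for BB; `next_owner` and its parameters are hypothetical names, not signals from the figure.

```python
def next_owner(cpu_high_priority: bool, cpu_br: bool, dma_br: bool,
               cpu_locked: bool) -> str:
    """Hypothetical owner selection for Figure 7-37 (a) and (b).

    cpu_high_priority -- True for panel (a), False for panel (b)
    cpu_locked        -- the M68040 is in a locked sequence
                         (LOCK asserted, LOCKE negated)
    """
    if cpu_locked:
        return "cpu"  # a locked sequence is never broken in either panel
    if cpu_high_priority:
        # (a): the DMA device owns the bus only when the M68040 has no request.
        return "dma" if dma_br and not cpu_br else "cpu"
    # (b): the DMA device wins whenever it requests.
    return "dma" if dma_br else "cpu"
```

With no requests from either device both panels return "cpu", reflecting that the M68040 is the default bus master.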
7.8.2.4 M68040 Asynchronous DMA Arbitration. Figure 7-38 illustrates a sample synchronizer circuit. Figure 7-39 illustrates how arbitration for an M68040 can be implemented to simulate an MC68030. The synchronizer circuit has an output indicating whether or not a signal has been asserted for at least two consecutive rising edges of BCLK. If the synchronizer circuit indicates that the input has not been stable for at least two clocks, the processor and alternate bus master stay in the current state. Figure 7-39(a) duplicates the MC68030 implementation of the bus arbitration circuitry, in which the M68040 is allowed to yield the bus only after the indeterminate condition has been eliminated. Figure 7-39(b) is similar to the MC68030 implementation except that the DMA device has lower priority and can only perform transfers when the M68040 is in the idle state. In either case, the M68040 is the default bus master; therefore, if there are no pending requests from either device, the external bus arbiter gives the bus to the M68040.

[Figure 7-38. Sample Synchronizer Circuit: synchronizers clocked by CLK that produce outputs R and RV from ABR, and outputs A and AV from ABGACK.]
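The synchronizer's "stable for two consecutive BCLK edges" test can be modeled behaviorally. This sketch assumes that R carries the sampled level and RV flags that the same level was seen on two consecutive edges; that mapping of R/RV (and A/AV) is an interpretation, and `TwoEdgeSynchronizer` is a hypothetical name. A boolean model cannot capture metastability, which is why real hardware uses flip-flop stages.

```python
from typing import List, Optional, Tuple


class TwoEdgeSynchronizer:
    """Hypothetical model of one Figure 7-38 channel (for example ABR -> R, RV)."""

    def __init__(self) -> None:
        self._prev: Optional[bool] = None  # level captured at the previous BCLK edge

    def clock(self, level: bool) -> Tuple[bool, bool]:
        """Sample `level` at a BCLK rising edge and return (value, stable).

        `stable` is True only if the same level was also captured at the
        previous edge, that is, the input held for two consecutive edges.
        """
        stable = self._prev is not None and self._prev == level
        self._prev = level
        return level, stable


def request_seen(samples: List[bool]) -> List[bool]:
    """Edges at which an arbiter would accept the request as asserted."""
    sync = TwoEdgeSynchronizer()
    return [value and stable for value, stable in (sync.clock(s) for s in samples)]
```

For example, a request that glitches high for single edges and then holds is accepted only from the second consecutive asserted edge onward; until then the arbiter stays in its current state, as the text requires.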
[Figure 7-39. M68040 Asynchronous DMA Arbitration: (a) MC68040 low priority, default bus master; (b) MC68040 high priority, default bus master. Each panel shows states S1 through S6 with outputs 040_BG and AM_BG, and transitions conditioned on the synchronizer outputs R, RV, A, and AV and on LOCK, LOCKE, and 040_BR. Notes: 1. It is assumed that the asynchronous device takes the bus only after TIP or the MC68040's BB is negated. 2. * indicates the signal is asserted for that device.]
7.9 Bus Snooping Operation

When required, the M68040 can monitor alternate bus master transfers and intervene in the access to maintain cache coherency. The encoding of the SCx signals generated by the alternate bus master for each bus cycle controls this process of bus monitoring and intervention, called snooping. Only byte, word, long-word, and line bus transfers can be snooped. Refer to Section 4 Instruction and Data Caches for SCx encodings.

When the M68040 recognizes that an alternate bus master has asserted TS, the processor latches the levels on the byte offset, SIZx, TMx, and R/W signals during the rising edge of BCLK for which TS is first asserted. The processor then evaluates the SCx and TTx signals to determine the type of access (TTx = $0 or $1), whether it is snoopable, and, if so, how it should be snooped. If snooping is enabled for the access, the processor inhibits memory from responding by continuing to assert the memory inhibit signal (MI) while checking the internal caches for matching lines. During the snooped bus cycle, the M68040 ignores all TA assertions while MI is asserted. Unless the data cache contains a dirty line corresponding to the access and the requested snoop operation indicates sink data for a write or source data for a read, MI is negated, and memory is allowed to respond and complete the access. Otherwise, the processor continues to intervene in the access by keeping MI asserted and responding to the alternate bus master as a slave device. The processor monitors the levels of TA, TEA, and TBI to detect normal, bus error, retry, and burst-inhibited terminations. Note that for alternate bus master burst-inhibited line transfers, the M68040 snoops each of the four resulting long-word transfers. If snooping is disabled, MI is negated, and the M68040 counts the appropriate number of TA or TEA assertions before proceeding. For example, if the SIZx signals are pulled high, the M68040 requires four TA assertions, one TEA assertion, or one retry termination before proceeding.

As a bus master, the M68040 can be configured to request snooping operations on a page-by-page basis. The UPAx signals are connected to the SCx inputs of the snooping processors. Appropriately programming the user attribute bits in the corresponding page descriptor selects the required snooping operation for a page. Refer to Section 3 Memory Management Unit (Except MC68EC040 and MC68EC040V) for details on configuring the caching mode and user attribute bits for each memory page for the MC68040 and MC68LC040, and refer to Appendix B MC68EC040 for the MC68EC040.

In a system with multiple bus masters, the memory unit must wait for each snooping bus master to negate MI before responding to an access. A termination signal asserted before the negation of MI leads to undefined operation and must be avoided. Also, if the system contains multiple caching masters, each master must access shared data using write-through pages that allow writes to the data to be snooped by other masters. The copyback caching mode is typically used for data local to a processor because in a multimaster caching system only one master at a time can access a given page of copyback data. The copyback caching mode also prevents multiple snooping processors from intervening in a specific access.
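The snoop decision described above can be summarized in a few lines. This is a hedged sketch, not the processor's logic: it reads the parenthetical "(TTx = $0 or $1)" as meaning only those access types are snoopable, treats SCx = $0 as inhibited and $1 or $2 as enabled (as Sections 7.9.1 and 7.9.2 state), and does not model which enabled encoding requests which operation; that detail is in Section 4, so the caller passes it in as `op_moves_data`. All names are hypothetical.

```python
from enum import Enum


class SnoopAction(Enum):
    NO_SNOOP = "no cache lookup; memory responds"
    MEMORY_RESPONDS = "MI held during the lookup, then negated; memory responds"
    INTERVENE = "MI kept asserted; the M68040 responds as a slave"


def snoop_action(sc: int, tt: int, dirty_hit: bool, op_moves_data: bool) -> SnoopAction:
    """sc, tt: 2-bit SC1-SC0 and TT1-TT0 values driven by the alternate master.
    dirty_hit: the data cache holds a dirty line for the address.
    op_moves_data: the selected snoop operation asks the M68040 to sink write
    data or source read data (assumed known from the Section 4 encodings)."""
    if not 0 <= sc <= 3 or not 0 <= tt <= 3:
        raise ValueError("SC and TT are 2-bit fields")
    if sc == 0:
        return SnoopAction.NO_SNOOP          # snoop inhibited
    if sc == 3:
        raise NotImplementedError("SCx = $3 is not covered in Section 7.9")
    if tt not in (0, 1):
        return SnoopAction.NO_SNOOP          # access type is not snoopable
    if dirty_hit and op_moves_data:
        return SnoopAction.INTERVENE
    return SnoopAction.MEMORY_RESPONDS


def ta_needed(siz: int) -> int:
    """TA assertions that end one error-free, non-retried access:
    four when SIZ1-SIZ0 are both high (line), otherwise one."""
    return 4 if siz == 0b11 else 1
```

Intervention therefore requires all three conditions at once: snooping enabled for a snoopable access type, a dirty hit, and a snoop operation that moves data.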
7.9.1 Snoop-Inhibited Cycle

For alternate bus master accesses in which the SCx signal encodings indicate that snooping is inhibited (SCx = $0), the M68040 immediately negates MI and allows memory to respond to the access. Snoop-inhibited alternate bus master accesses do not affect the performance of the processor since no cache lookups are required. Figure 7-40 illustrates an example of a snoop-inhibited operation in which an alternate bus master is granted the bus for an access.

Regardless of the values on the SCx and TTx signals, MI is asserted between bus cycles. Because MI is asserted while a cache lookup is performed, snooping inherently degrades system performance. MI is asserted from the last TA of the current bus cycle if the M68040 owns the bus and loses it (see Figure 7-40). If an alternate bus master has the bus and loses it, there are two possible cases. Usually, an idle clock occurs between the alternate bus master's cycle and the MC68040's cycle. If so, MI is asserted during the idle clock and negated from the same edge on which the M68040 asserts the TS signal (see Figure 7-40). If there is no idle clock, MI is not asserted. MI is asserted during and after reset until the first bus cycle of the M68040. Even though snooping is inhibited, all TA or TEA assertions while MI is asserted are ignored. If a line snoop is started, the M68040 still requires four TA assertions.
[Figure 7-40. Snoop-Inhibited Bus Cycle: A31–A0, BCLK, D31–D0, TS, TA, BR, BG, BB, AM_BR, AM_BG, SC1–SC0, SIZ1–SIZ0, TT1–TT0, R/W, and MI over an alternate master cycle (C1–C3) followed by a processor cycle (C1–C3). AM indicates the alternate bus master.]

7.9.2 Snoop-Enabled Cycle (No Intervention Required)

For alternate bus master accesses in which SCx = $1 or $2, indicating that snooping is enabled, the M68040 continues to assert MI while checking for a matching cache line. If intervention in the alternate bus master access is not required, MI is then negated, and memory is allowed to respond and complete the access. Figure 7-41 illustrates an example of snooping in which memory is allowed to respond. Best-case timing is illustrated, which results in a memory access having the equivalent of two wait states. Variations in the timing required by snooping logic to access the caches can delay the negation of MI by up to two additional clocks. External logic must ensure that the termination signals are negated at all rising BCLK edges at which MI is asserted. Otherwise, if one of the termination signals is asserted, either the M68040 ignores all termination signals, reading them as negated, or the M68040 exhibits improper operation.

[Figure 7-41. Snoop Access with Memory Response: A31–A0, BCLK, D31–D0, TS, TA, BR, BG, BB, AM_BR, AM_BG, SC1–SC0, SIZ1–SIZ0, TT1–TT0, R/W, and MI over clocks C1–C6. AM indicates the alternate bus master.]
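The rule that termination signals must stay negated at every rising BCLK edge at which MI is asserted lends itself to a simple trace check, for example against waveforms captured from a simulation of a memory controller. This sketch only checks TA and TEA (other termination-related inputs could be added the same way); `termination_violations` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple

# One entry per BCLK rising edge: (MI, TA, TEA), True = asserted.
Edge = Tuple[bool, bool, bool]


def termination_violations(edges: Iterable[Edge]) -> List[int]:
    """Return the indices of rising edges at which TA or TEA is asserted
    while MI is still asserted. A memory controller in a snooping system
    should keep this list empty."""
    return [i for i, (mi, ta, tea) in enumerate(edges) if mi and (ta or tea)]
```

Because snooping can delay MI negation by up to two clocks beyond the best case, a controller that asserts TA at a fixed delay after TS can pass a best-case trace and still fail this check when the snoop lookup is slower.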
7.9.3 Snoop Read Cycle (Intervention Required)

If snooping is enabled for a read access and the corresponding data cache line contains dirty data, the M68040 inhibits memory and responds to the access as a slave device to supply the requested read data. Intervention in a byte, word, or long-word access is independent of which long-word entry in the cache line is dirty. Figure 7-42 illustrates an alternate bus master line read that hits a dirty line in the M68040 data cache. The processor asserts TA to acknowledge the transfer of data to the alternate bus master and drives the data bus with the four long words of data for the line. The timing illustrated is for a best-case response time. Variations in the timing required by snooping logic to access the caches can delay the assertion of TA by up to two additional clocks.

7.9.4 Snoop Write Cycle (Intervention Required)

If snooping with sink data is enabled for a byte, word, or long-word write access and the corresponding data cache line contains dirty data, the M68040 inhibits memory and responds to the access as a slave device to read the data from the bus and update the data cache line. The dirty bit is set for the long word changed in the cache line. Figure 7-43 illustrates a long-word write by an alternate bus master that hits a dirty line in the M68040 data cache. The processor asserts TA to acknowledge the transfer of data from the alternate master and reads the value on the data bus. The timing illustrated is for a best-case response time. Variations in the timing required by snooping logic to access the caches can delay the assertion of TA by up to two additional clocks.
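The per-long-word dirty bookkeeping in Sections 7.9.3 and 7.9.4 can be sketched with a 16-byte line and four dirty bits. This is a behavioral illustration under stated assumptions: `CacheLine` and its methods are hypothetical, accesses are assumed aligned (as bus transfers are), and what the processor does with a clean line on a snooped write is deliberately not modeled; the method simply reports that no intervention occurs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CacheLine:
    """Hypothetical 16-byte data cache line with one dirty bit per long word."""
    data: bytearray = field(default_factory=lambda: bytearray(16))
    dirty: List[bool] = field(default_factory=lambda: [False] * 4)

    def is_dirty(self) -> bool:
        # Intervention does not depend on WHICH long word is dirty.
        return any(self.dirty)

    def snoop_source_read(self) -> bytes:
        """Data the processor drives for a snooped line read (four long words)."""
        return bytes(self.data)

    def snoop_sink_write(self, offset: int, value: bytes) -> bool:
        """Absorb a byte, word, or long-word write from the bus.

        Returns False (no intervention; memory takes the write) if the line
        holds no dirty data, True if the processor sank the data.
        """
        size = len(value)
        if size not in (1, 2, 4):
            raise ValueError("only byte, word, or long-word writes are sunk")
        if offset % size or not 0 <= offset <= 16 - size:
            raise ValueError("access must be aligned and inside the line")
        if not self.is_dirty():
            return False
        self.data[offset:offset + size] = value
        self.dirty[offset // 4] = True  # mark the long word that changed
        return True
```

A long-word write at offset 4 into a line whose only dirty long word is at offset 12 is sunk, and afterward both the second and fourth dirty bits are set.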
[Figure 7-42. Snooped Line Read, Memory Inhibited: A31–A0, BCLK, D31–D0, TS, TA, BR, BG, BB, AM_BR, AM_BG, SC1–SC0, SIZ1–SIZ0, TT1–TT0, R/W, and MI over clocks C1–C9 for an alternate master line read; memory is inhibited from responding, and TA and data are driven by the processor. AM indicates the alternate bus master.]
[Figure 7-43. Snooped Long-Word Write, Memory Inhibited: A31–A0, BCLK, D31–D0, TS, TA, BR, BG, BB, AM_BR, AM_BG, SC1–SC0, SIZ1–SIZ0, TT1–TT0, R/W, and MI over clocks C1–C6 for an alternate master long-word write; memory is inhibited from responding, TA is driven by the processor, and data is written by the alternate bus master. AM indicates the alternate bus master.]

7.10 Reset Operation

An external device asserts the reset input signal (RSTI) to reset the processor. When power is applied to the system, external circuitry should assert RSTI for a minimum of 10 BCLK cycles after VCC is within tolerance. Figure 7-44 is a functional timing diagram of the power-on reset operation, illustrating the relationships among VCC, RSTI, mode selects, and bus signals. The BCLK and PCLK clock signals are required to be stable by the time VCC reaches the minimum operating specification. The VIH levels of the clocks should not exceed VCC while it is ramping up. RSTI is internally synchronized for two BCLKs before being used and must meet the specified setup and hold times to BCLK (specifications #51 and #52 in Section 11 MC68040 Electrical and Thermal Characteristics) only if recognition by a specific BCLK rising edge is required. MI is asserted while the M68040 is in reset.

[Figure 7-44. Initial Power-On Reset Timing: BCLK, VCC (0 V to +5 V), RSTI, MI, CDIS, MDIS, IPL2–IPL0, BR, BG, BB, TS, TIP, and bus signals; the marked intervals are RSTI asserted for at least 10 clocks, 2 clocks, and 128 clocks.]

Once RSTI negates, the processor is internally held in reset for another 128 clock cycles. During the reset period, all signals that can be three-stated are three-stated, and the rest are driven to their inactive state. Once the internal reset signal negates, all bus signals continue to remain in a high-impedance state until the processor is granted the bus. Afterwards, the first bus cycle for reset exception processing begins. In Figure 7-44, the processor assumes implicit bus ownership before the first bus cycle begins. The levels on CDIS, MDIS, and IPL2–IPL0 are used to selectively enable the special modes of operation when RSTI is negated. These signals should be driven to their normal levels before the end of the 128-clock internal reset period.
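The reset timing numbers above can be turned into simple checks for a board-level simulation. This is a sketch under stated assumptions: `rsti_pulse_ok` only counts consecutive asserted BCLK samples and does not know when VCC entered tolerance, and `mode_select_deadline` assumes the two-clock synchronization delay and the 128-clock internal reset add sequentially, as the spacing in Figure 7-44 suggests; both helpers are hypothetical.

```python
from typing import Sequence

MIN_RSTI_CLOCKS = 10         # minimum RSTI assertion, in BCLK periods
SYNC_CLOCKS = 2              # RSTI internal synchronization delay
INTERNAL_RESET_CLOCKS = 128  # internal reset after RSTI negates


def rsti_pulse_ok(samples: Sequence[bool]) -> bool:
    """True if RSTI (True = asserted) is held for at least MIN_RSTI_CLOCKS
    consecutive BCLK rising edges somewhere in `samples`."""
    run = best = 0
    for asserted in samples:
        run = run + 1 if asserted else 0
        best = max(best, run)
    return best >= MIN_RSTI_CLOCKS


def mode_select_deadline(rsti_negation_edge: int) -> int:
    """Estimated BCLK edge by which CDIS, MDIS, and IPL2-IPL0 should be back
    at their normal levels (assumes the 2-clock sync and 128-clock internal
    reset periods are sequential)."""
    return rsti_negation_edge + SYNC_CLOCKS + INTERNAL_RESET_CLOCKS
```

For instance, an RSTI that negates at edge 20 gives an estimated deadline of edge 150 for returning the mode-select signals to their normal levels.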
For processor resets after the initial power-on reset, RSTI should be asserted for at least 10 clock periods. Figure 7-45 illustrates timings associated with a reset when the processor is executing bus cycles. Note that BB and TIP (and TA if driven during a snooped access) are negated before transitioning to a three-state level.

[Figure 7-45. Normal Reset Timing: BCLK, RSTI, MI, CDIS, MDIS, IPL2–IPL0, BR, BG, BB, TS, TIP, and bus signals; the marked intervals are RSTI asserted for at least 10 clocks, 2 clocks, and 128 clocks.]

Resetting the processor causes any bus cycle in progress to terminate as if TA or TEA had been asserted. In addition, the processor initializes registers appropriately for a reset exception. Section 8 Exception Processing describes exception processing.

When a RESET instruction is executed, the processor drives the reset out (RSTO) signal for 512 BCLK cycles. In this case, the processor resets the external devices of the system, and the internal registers of the processor are unaffected. The external devices connected to the RSTO signal are reset at the completion of the RESET instruction. An RSTI signal that is asserted to the processor during execution of a RESET instruction immediately resets the processor and causes the RSTO signal to negate. RSTO can be logically ANDed with the external signal driving RSTI to derive a system reset signal that is asserted for both an external processor reset and execution of a RESET instruction.
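Why a logical AND produces a reset that is asserted for either source follows from both signals being active low, which the phrase "logically ANDed" implies: the AND output is low whenever either input is low. The sketch below shows the truth table and a 512-clock RSTO pulse for an uninterrupted RESET instruction; `system_reset_n`, `rsto_wave`, and `combine` are hypothetical helpers, and 0 means asserted.

```python
from typing import List

RSTO_CLOCKS = 512  # RSTO assertion for a RESET instruction, in BCLK cycles


def system_reset_n(rsto_n: int, rsti_source_n: int) -> int:
    """Active-low system reset: low (asserted) whenever either input is low."""
    return rsto_n & rsti_source_n


def rsto_wave(total: int, start: int) -> List[int]:
    """RSTO level per BCLK (0 = asserted) for a RESET instruction that starts
    driving RSTO at clock `start` and is not interrupted by RSTI."""
    return [0 if start <= t < start + RSTO_CLOCKS else 1 for t in range(total)]


def combine(rsto: List[int], rsti_source: List[int]) -> List[int]:
    return [system_reset_n(a, b) for a, b in zip(rsto, rsti_source)]
```

The truth table `[system_reset_n(a, b) for a in (0, 1) for b in (0, 1)]` is `[0, 0, 0, 1]`: the system reset is negated only when neither source is asserting reset.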
7.11 Special Modes of Operation

The MC68LC040 and MC68EC040 do not support the following three modes of operation, which for the M68040 are selectively enabled during processor reset and remain in effect until the next processor reset. Refer to Appendix A MC68LC040 and Appendix B MC68EC040 for differences in the special modes of operation for the MC68LC040 and MC68EC040.

7.11.1 Output Buffer Impedance Selection

All output drivers in the M68040 can be configured to operate in either a large buffer mode (low-impedance driver) or a small buffer mode (high-impedance driver). Large buffers have a nominal output impedance of 6 Ω for both high and low drive, resulting in minimum output delays. Signal traces driven by large buffers usually require transmission line effects to be considered in their design, including the use of signal termination. Small buffers have a nominal impedance of 25 Ω for high and low drive, resulting in longer output delays and less critical board-design requirements. Refer to Section 11 MC68040 Electrical and Thermal Characteristics for further information on electrical specifications, buffer characteristics, and transmission line design examples. The output drivers are configured in three groups. Each group of signals is configured depending on the corresponding IPLx signal level during processor reset (see Table 5-5).

7.11.2 Multiplexed Bus Mode

The multiplexed bus mode changes the timing of the three-state control logic for the address and data buses to support generation of a multiplexed address/data bus. When the M68040 is operating in this mode, the address and data bus signals can be hardwired together to form a single 32-bit bus, with address and data information time-multiplexed on the bus. This configuration minimizes the number of pins required to interface to peripheral devices without requiring additional discrete multiplexing logic.
This mode is enabled during a processor reset by a logic zero on the CDIS signal. Figure 7-46 illustrates a line write with multiplexed bus mode enabled. The address bus drivers are enabled during C1 and disabled during C2. Later in C2, the data bus drivers are enabled to drive the data bus with the data to be written. The address bus is driven only for the BCLK rising edge at the start of each bus cycle.
[Figure 7-46. Multiplexed Address and Data Bus (Line Write): BCLK, A31–A0, D31–D0, SIZ1–SIZ0, TT1–TT0, TM2–TM0, TLN1–TLN0, UPA1–UPA0, CIOUT, R/W, TS, TIP, and TA over clocks C1–C5. Note: The selected device increments the value of A3 and A2.]

7.11.3 Data Latch Enable Mode

The data latch enable (DLE) mode allows read data to be latched by the assertion of the DLE signal instead of by the BCLK rising edge at the end of each transfer. In some applications, this mode can reduce the number of clocks required to perform line burst reads. A logic zero on MDIS during a processor reset enables this mode. Figure 7-47 illustrates a conceptual block diagram of the logic used to latch the read data bus in DLE mode. The DLE signal controls transparent latch A, which allows data to be latched before the rising edge of BCLK. Latch A operates transparently when DLE is negated and latches the level on the data bus when DLE is asserted. Note that the DLE signal only controls latching of the read data and does not affect termination of the bus transfer. Edge-triggered latch B is clocked by the rising edge of BCLK and latches the data from latch A for use by internal logic.

[Figure 7-47. DLE Mode Block Diagram: the external data bus feeds transparent latch A (gated by DLE), whose output feeds edge-triggered latch B (clocked by BCLK) to produce the latched read data; write data drives the external data bus, and TA, TEA, and TBI feed the termination control logic.]

Figure 7-48 illustrates the data read timing for both normal operation and DLE mode. During normal operation (i.e., DLE mode disabled), latch A is always transparent, and read data is latched by the rising edge of BCLK. Data must meet setup and hold time specifications #15 and #16 in this case. When DLE mode is enabled, the data can be latched by the rising edge of BCLK or by the falling edge of DLE, depending on the timing of DLE.
Figure 7-48. DLE versus Normal Data Read Timing (normal data bus timing with specifications #15 and #16; DLE mode data bus timing, cases 1 and 2, with specifications #31-#37)

Case 1: If DLE is negated and meets setup time specification #35 to the rising edge of BCLK when the bus read is terminated, latch A is transparent, and the read data must meet setup and hold time specifications #36 and #37 to the rising edge of BCLK. Read timing in this case is similar to normal timing.

Case 2: If DLE is asserted, the data bus levels are latched and held internally. D31-D0 must meet setup and hold time specifications #32 and #33 to the falling edge of DLE, and can transition to a new level once DLE is asserted. D31-D0 must still meet setup time specification #36 to BCLK, but not hold time specification #37, since the data is held valid internally as long as DLE remains asserted low.
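As an illustration of the two-latch path in Figure 7-47, the following Python sketch models latch A and latch B behaviorally. It assumes ideal signals sampled in discrete steps and ignores the setup and hold specifications (#15, #16, #32-#37); the class name `DleReadPath` and its `step` method are hypothetical and not part of any Motorola tool.

```python
class DleReadPath:
    """Behavioral model of the DLE-mode read path (Figure 7-47).

    Assumption: signals are ideal and sampled in discrete steps;
    setup and hold timing is not modeled.
    """

    def __init__(self) -> None:
        self.latch_a = 0          # output of transparent latch A
        self.latch_b = 0          # output of edge-triggered latch B (to internal logic)
        self._prev_bclk = False   # previous BCLK level, used for edge detection

    def step(self, data_bus: int, dle_asserted: bool, bclk: bool) -> int:
        # Latch A is transparent while DLE is negated and holds its
        # last value while DLE is asserted.
        if not dle_asserted:
            self.latch_a = data_bus & 0xFFFFFFFF
        # Latch B captures the output of latch A only on a rising edge of BCLK.
        if bclk and not self._prev_bclk:
            self.latch_b = self.latch_a
        self._prev_bclk = bclk
        return self.latch_b
```

For example, presenting data while DLE is negated, asserting DLE, and then driving new values on the bus before the BCLK rising edge still delivers the originally latched value, which corresponds to case 2 above. With DLE held negated, the model reduces to normal operation, where the value present at the BCLK rising edge is captured.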
Section 8 Exception Processing

Exception processing is the activity performed by the processor in preparing to execute a special routine for any condition that causes an exception. Exception processing does not include execution of the routine itself. This section describes the processing for each type of integer unit exception, exception priorities, the return from an exception, and bus fault recovery. It also describes the formats of the exception stack frames. For details on floating-point exceptions, refer to Section 9 Floating-Point Unit (MC68040 Only).

Note
For the MC68040V, MC68LC040, MC68EC040, and MC68EC040V, ignore all references to floating-point, including any instructions that begin with an "F." Also, for the MC68EC040 and MC68EC040V, ignore all references to the memory management unit (MMU) and the PFLUSH and PTEST instructions. In the MC68EC040 and MC68EC040V, the MC68040 transparent translation registers are replaced by the access control registers (ACRs). Refer to Appendix A MC68LC040 and Appendix B MC68EC040 for details.

8.1 Exception Processing Overview

Exception processing is the transition from the normal processing of a program to the processing required for any special internal or external condition that preempts normal processing. External conditions that cause exceptions are interrupts from external devices, bus errors, and resets. Internal conditions that cause exceptions are instructions, address errors, and tracing. For example, the TRAP, TRAPcc, FTRAPcc, CHK, RTE, DIV, and FDIV instructions can generate exceptions as part of their normal execution. In addition, illegal instructions, unimplemented floating-point instructions and data types, and privilege violations cause exceptions.

Exception processing uses an exception vector table and an exception stack frame. The following paragraphs describe the vector table and a generalized exception stack frame.
The M68040 uses a restart exception processing model to minimize interrupt and instruction latency and to reduce the size of the stack frame (compared to the frame required for a continuation model). Exceptions are recognized at each instruction boundary in the execute stage of the integer pipeline and force later instructions that have not yet reached the execute stage to be aborted. Instructions that cannot be interrupted, such as those that generate locked bus transfers or access serialized pages, are allowed to complete before exception processing begins.

Exception processing occurs in four functional steps. However, the individual bus cycles associated with exception processing (vector acquisition, stacking, etc.) are not guaranteed to occur in the order in which they are described in this section. Figure 8-1 illustrates a general flowchart for the steps taken by the processor during exception processing.

During the first step, the processor makes an internal copy of the status register (SR). The processor then changes to the supervisor mode by setting the S-bit and inhibits tracing of the exception handler by clearing the trace enable (T1 and T0) bits in the SR. For the reset and interrupt exceptions, the processor also updates the interrupt priority mask in the SR.

During the second step, the processor determines the vector number for the exception. For interrupts, the processor performs an interrupt acknowledge bus cycle to obtain the vector number. For all other exceptions, internal logic provides the vector number. This vector number is used in the last step to calculate the address of the exception vector. Throughout this section, vector numbers are given in decimal notation.
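The SR changes made during the first step can be expressed as bit manipulation on the 16-bit SR. The Python sketch below assumes the standard M68000 family SR layout (T1 = bit 15, T0 = bit 14, S = bit 13, interrupt mask I2-I0 = bits 10-8); the function name `enter_exception` is hypothetical. The optional `new_mask` argument stands for the mask update that is performed only for reset and interrupt exceptions.

```python
from typing import Optional, Tuple

SR_T1 = 1 << 15        # trace enable bit T1
SR_T0 = 1 << 14        # trace enable bit T0
SR_S = 1 << 13         # supervisor state bit
SR_MASK_SHIFT = 8      # interrupt priority mask I2-I0 occupies bits 10-8
SR_MASK = 0x7 << SR_MASK_SHIFT


def enter_exception(sr: int, new_mask: Optional[int] = None) -> Tuple[int, int]:
    """Model step 1 of exception processing.

    Returns (saved_sr, new_sr): the internal SR copy that is later stacked,
    and the SR value the exception handler starts with.
    """
    saved_sr = sr & 0xFFFF                          # internal copy of the SR
    new_sr = (saved_sr | SR_S) & ~(SR_T1 | SR_T0)   # supervisor mode, tracing off
    if new_mask is not None:                        # reset and interrupts only
        new_sr = (new_sr & ~SR_MASK) | ((new_mask & 0x7) << SR_MASK_SHIFT)
    return saved_sr, new_sr & 0xFFFF
```

For example, `enter_exception(0x8000)` (user mode with T1 set) gives a handler SR of $2000, while `enter_exception(0x8000, 6)` for a level 6 interrupt gives $2600. The unmodified copy is returned separately because it is the value that the third step places on the stack.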
Figure 8-1. General Exception Processing Flowchart (save internal copy of SR, set S, clear T1 and T0; fetch vector number; save contents to stack frame; prefetch four long words; begin instruction execution. A bus error or address error at these steps is a double bus fault and exits to the halted state, PST3-PST0 = $5.)
Note: The SR-copy and stack-frame blocks vary for reset and interrupt exceptions.
The third step is to save the current processor context for all exceptions other than reset. The processor creates one of five exception stack frame formats on the active supervisor stack and fills it with information appropriate for the type of exception. Other information can also be stacked, depending on which exception is being processed and the state of the processor prior to the exception. If the exception is an interrupt and the M-bit of the SR is set, the processor clears the M-bit and builds a second stack frame on the interrupt stack. Figure 8-2 illustrates the general form of the exception stack frame.

Figure 8-2. General Form of Exception Stack Frame (from the stack pointer upward: status register; program counter; format code in bits 15-12 and vector offset in bits 11-0; additional processor state information, 2 or 26 words, if needed)

The last step initiates execution of the exception handler. The processor multiplies the vector number by four to determine the exception vector offset. It adds the offset to the value stored in the vector base register (VBR) to obtain the memory address of the exception vector. Next, the processor loads the program counter (PC) (and the interrupt stack pointer (ISP) for the reset exception) from the exception vector table entry. After prefetching the first four long words to fill the instruction pipe, the processor resumes normal processing at the address in the PC. When the processor executes an RTE instruction, it examines the stack frame on top of the active supervisor stack to determine if it is a valid frame and what type of context restoration it requires.

All exception vectors are located in the supervisor address space and are accessed using data references. Only the initial reset vector is fixed in the processor's memory map; once initialization is complete, there are no fixed assignments. Since the VBR provides the base address of the exception vector table, the exception vector table can be located anywhere in memory; it can even be dynamically relocated for each task that an operating system executes. The M68040 supports a 1024-byte vector table containing 256 exception vectors (see Table 8-1). Motorola defines the first 64 vectors and reserves the other 192 vectors for user-defined interrupt vectors. External devices can use vectors reserved for internal purposes at the discretion of the system designer.

External devices can also supply vector numbers for some exceptions. External devices that cannot supply vector numbers use the autovector capability, which allows the M68040 to automatically generate a vector number.
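The vector arithmetic of the last step, together with the format/vector offset word shown in Figure 8-2, can be sketched in Python as follows. The function names are hypothetical, and the format code is treated as an opaque 4-bit value because the individual frame formats are not described in this part of the section.

```python
def vector_offset(vector_number: int) -> int:
    """Vector offset = vector number * 4 (one long word per vector)."""
    if not 0 <= vector_number <= 255:
        raise ValueError("the M68040 vector table holds vectors 0-255")
    return vector_number * 4


def vector_address(vbr: int, vector_number: int) -> int:
    """Address of the exception vector: VBR + offset, in a 32-bit space."""
    return (vbr + vector_offset(vector_number)) & 0xFFFFFFFF


def format_vector_word(format_code: int, vector_number: int) -> int:
    """Format/vector offset word of Figure 8-2:
    format code in bits 15-12, vector offset in bits 11-0."""
    return ((format_code & 0xF) << 12) | vector_offset(vector_number)
```

For example, TRAP #3 uses vector 32 + 3 = 35, so with the VBR at $00001000 the handler address is read from `vector_address(0x1000, 35)`, which is $0000108C. The largest offset, 255 * 4 = $3FC, still fits in the 12-bit vector offset field.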
Table 8-1. Exception Vector Assignments

Vector Number(s)  Vector Offset (Hex)  Assignment
0                 000                  Reset Initial Interrupt Stack Pointer
1                 004                  Reset Initial Program Counter
2                 008                  Access Fault
3                 00C                  Address Error
4                 010                  Illegal Instruction
5                 014                  Integer Divide by Zero
6                 018                  CHK, CHK2 Instruction
7                 01C                  FTRAPcc, TRAPcc, TRAPV Instructions
8                 020                  Privilege Violation
9                 024                  Trace
10                028                  Line 1010 Emulator (Unimplemented A-Line Opcode)
11                02C                  Line 1111 Emulator (Unimplemented F-Line Opcode)
12                030                  (Unassigned, Reserved)
13                034                  Defined for MC68020 and MC68030, not used by M68040
14                038                  Format Error
15                03C                  Uninitialized Interrupt
16-23             040-05C              (Unassigned, Reserved)
24                060                  Spurious Interrupt
25                064                  Level 1 Interrupt Autovector
26                068                  Level 2 Interrupt Autovector
27                06C                  Level 3 Interrupt Autovector
28                070                  Level 4 Interrupt Autovector
29                074                  Level 5 Interrupt Autovector
30                078                  Level 6 Interrupt Autovector
31                07C                  Level 7 Interrupt Autovector
32-47             080-0BC              TRAP #0-15 Instruction Vectors
48-55             0C0-0DC              Floating-Point Exception Vectors (see note)
56                0E0                  Defined for MC68030 and MC68851, not used by M68040
57                0E4                  Defined for MC68851, not used by M68040
58                0E8                  Defined for MC68851, not used by M68040
59-63             0EC-0FC              (Unassigned, Reserved)
64-255            100-3FC              User Defined Vectors (192)

Note: Refer to Section 9 Floating-Point Unit (MC68040 Only).

8.2 Integer Unit Exceptions

The following paragraphs describe the external interrupt exceptions and the different types of exceptions generated internally by the M68040 integer unit. The following exceptions are discussed:

• Access Fault
• Address Error
• Instruction Trap
• Illegal and Unimplemented Instructions
• Privilege Violation
• Trace
• Format Error
• Breakpoint Instruction
• Interrupt
• Reset

8.2.1 Access Fault Exception

An access fault exception occurs when a data access or an instruction prefetch faults due to either an external bus error or an internal access fault. Both types of access faults are treated identically, and a status bit in the access fault stack frame allows the access fault exception handler to distinguish them. An access fault exception may or may not be taken immediately, depending on whether the faulted access specifically references data required by the execution unit or whether there are any other exceptions that can occur, allowing the execution pipeline to idle.

An external access fault (bus error) occurs when external logic aborts a bus cycle and asserts the TEA input signal. A bus error on a data write access always results in an access fault exception, causing the processor to begin exception processing immediately. A bus error on a data read also causes exception processing to begin immediately if the access is a byte, word, or long-word access, or if the bus error occurs on the first transfer of a line read. Bus errors on the second, third, or fourth transfers of a data line read cause the transfer to be aborted, but result in a bus error exception only if the execution unit is specifically requesting the long word being transferred. For example, if a misaligned operand spans the first two long words in the line being read, a bus error on the second transfer causes an exception, but a bus error on the third or last transfer does not, unless the execution unit has generated another operand access that references data in these transfers.

Bus errors that occur during instruction prefetches are deferred until the processor attempts to use the information.
For instance, if a bus error occurs while prefetching other instructions after a change-of-flow instruction (BRA, JMP, JSR, TRAP #n, etc.), execution of the new instruction flow clears the exception condition. This also applies to the not-taken path of a conditional branch instruction, even though both sides of the branch are decoded.

Processor accesses for either data or instructions can result in internal access faults. Internal access faults must be corrected to complete execution of the current context. Four types of internal access faults can occur:

1. Push transfer faults occur when the execution unit is idle, the integer unit pipeline is frozen, the instruction and data cache requests are cancelled (however, writes are not lost), and pending writes are stacked.
2. Data access faults occur when the bus controller and the execution unit are idle. A data access fault freezes the pipeline and cancels any pending instruction cache accesses. Pending writes are stacked because the data cache is deadlocked until stacking transfers are initiated.
3. Instruction access faults occur when the PC section is deadlocked because of the faulted data or another prefetch is required, the copyback stage is empty, and the data cache and bus controller are idle. Since instruction access faults are reset, they can be ignored.
4. An internal access fault also occurs when the data or instruction MMU detects that a successful address translation is not possible because the page is write protected, supervisor only, or nonresident. Furthermore, when an address translation cache (ATC) miss occurs, the processor searches the translation tables in memory for the mapping and then retries the access. If a valid translation for the logical address is not available due to a problem encountered during the table search, an internal access fault occurs when the aborted access is retried. The problem encountered could be either an invalid descriptor or the assertion of the TEA signal during a bus cycle used to access the translation tables. A miss in the ATC causes the processor to automatically initiate a table search but does not cause an internal access fault unless one of the three previous conditions is encountered. However, this is not true if the memory management unit (MMU) is disabled.

When an exception is detected, all parts of the execution unit either remain idle or are forced to idle, at which time the highest priority exception is taken. Restarting the instruction or a user-defined supervisor cleanup exception handler routine regenerates lower priority exceptions on the return from exception handling.

Internal access faults and bus errors are reported after all other pending integer instructions complete execution. If an exception is generated during completion of the earlier instructions, the pending instruction fault is cleared, and the new exception is serviced first.
The processor restarts the pending prefetch after completing exception handling for the earlier instructions and takes a bus error exception if the access faults again. For data access faults, the processor aborts current instruction execution. If a data access fault is detected, the processor waits for the current instruction prefetch bus cycle to complete, then begins exception processing immediately.

As illustrated in Figure 8-1, the processor begins exception processing for an access fault by making an internal copy of the current SR. The processor then enters the supervisor mode and clears T1 and T0. The processor generates exception vector number 2 for the access fault vector. It saves the vector offset, PC, and internal copy of the SR on the stack. The saved PC value is the logical address of the instruction executing at the time the fault was detected. This instruction is not necessarily the one that initiated the bus cycle, since the processor overlaps execution of instructions. The processor also saves information to allow continuation after a fault during a MOVEM instruction and to support other pending exceptions. The faulted address and pending write-back information are saved. The information saved on the stack is sufficient to identify the cause of the bus error, complete pending write-backs, and recover from the error. The exception handler must complete the pending write-backs. Up to three write-backs can be pending for push errors and data access errors.

If a bus error occurs during the exception processing for an access fault, address error, or reset, or while the processor is loading internal state information from the stack during the execution of an RTE instruction, a double bus fault occurs, and the processor enters the halted state, as indicated by the PST3-PST0 encoding $5. In this case, the processor does not attempt to alter the current state of memory. Only an external reset can restart a processor halted by a double bus fault.

The supervisor stack has special requirements to ensure that exceptions can be stacked. The stack must be resident with correct protection in the direction of growth to ensure that exception stacking never causes a bus error or internal access fault. Memory pages allocated to the stack that are higher in memory than the current stack pointer can be nonresident, since an RTE or FRESTORE instruction can check for residency and trap before restoring the state.

A special case exists for systems that allow arbitration of the processor bus during locked transfer sequences. If the arbiter can signal a bus error on a locked translation table update due to an improperly broken lock, any pages touched by exception stack operations must have the U-bit set in the corresponding page descriptor to prevent the occurrence of the locked access during translation table searches.

8.2.2 Address Error Exception

An address error exception occurs when the processor attempts to prefetch an instruction from an odd address. This includes the case of a conditional branch instruction with an odd branch offset that is not taken. A prefetch bus cycle is not executed, and the processor begins exception processing after the currently executing instructions have completed. If the completion of these instructions generates another exception, the address error exception is deferred, and the new exception is serviced. After exception processing for the address error exception commences, the sequence is the same as for an access fault exception, except that the vector number is 3 and the vector offset in the stack frame refers to the address error vector. The stack frame is generated containing the address of the instruction that caused the address error and the address itself (A0 is cleared).
If an address error occurs during the exception processing for a bus error, address error, or reset, a double bus fault occurs.

8.2.3 Instruction Trap Exception

Certain instructions are used to explicitly cause trap exceptions. The TRAP #n instruction always forces an exception and is useful for implementing system calls in user programs. The TRAPcc, FTRAPcc, TRAPV, CHK, and CHK2 instructions force exceptions if the user program detects an error, which can be an arithmetic overflow or a subscript value that is out of bounds. The DIVS and DIVU instructions force exceptions if a division operation is attempted with a divisor of zero.

As illustrated in Figure 8-1, when a trap exception occurs, the processor internally copies the SR, enters the supervisor mode, and clears T1 and T0. The processor generates a vector number according to the instruction being executed: vector 5 is for DIVx, vector 6 is for CHK and CHK2, and vector 7 is for the FTRAPcc, TRAPcc, and TRAPV instructions. For the TRAP #n instruction, the vector number is 32 plus n. The stack frame saves the trap vector offset, the PC, and the internal copy of the SR on the supervisor stack. The saved value of the PC is the logical address of the instruction following the instruction that caused the trap. For all instruction traps other than TRAP #n, a pointer to the instruction that caused the trap is also saved. Instruction execution resumes at the address in the exception vector after the required instruction is prefetched.

8.2.4 Illegal Instruction and Unimplemented Instruction Exceptions

An illegal instruction exception corresponds to vector number 4 and occurs when the processor attempts to execute an illegal instruction. An illegal instruction is an instruction that contains any bit pattern that does not correspond to the bit pattern of a valid M68040 instruction. An illegal instruction exception is also taken after a breakpoint acknowledge bus cycle is terminated by the assertion of either the transfer acknowledge (TA) or the transfer error acknowledge (TEA) signal. A MOVEC instruction with an undefined register specification field in the first extension word also causes an illegal instruction exception.

Instruction word patterns with bits 15-12 equal to $A do not correspond to legal instructions for the M68040 and are treated as unimplemented instructions. These $A word patterns are referred to as unimplemented instructions with A-line opcodes. When the processor attempts to execute an unimplemented instruction with an A-line opcode, an exception is generated with vector number 10, permitting efficient emulation of unimplemented instructions. For instruction word patterns with bits 15-12 equal to $F, refer to Section 9 Floating-Point Unit (MC68040 Only).

Exception processing for illegal and unimplemented instructions is similar to that for instruction traps. When the processor has identified an illegal or unimplemented instruction, it initiates exception processing instead of attempting to execute the instruction. The processor copies the SR, enters the supervisor mode, and clears T1 and T0, disabling further tracing. The processor generates the vector number, either 4 or 10, according to the exception type. The illegal or unimplemented instruction vector offset, current PC, and copy of the SR are saved on the supervisor stack, with the saved value of the PC being the address of the illegal or unimplemented instruction. Instruction execution resumes at the address contained in the exception vector. It is the responsibility of the exception handling routine to adjust the stacked PC if the instruction is emulated in software or is to be skipped on return from the exception handler.

8.2.5 Privilege Violation Exception

To provide system security, some instructions are privileged. An attempt to execute one of the following privileged instructions while in the user mode causes a privilege violation exception:

ANDI to SR, CINV, CPUSH, EORI to SR, FRESTORE, FSAVE, MOVE from SR, MOVE to SR, MOVE USP, MOVEC, MOVES, ORI to SR, PFLUSH, PTEST, RESET, RTE, STOP

Exception processing for privilege violations is similar to that for illegal instructions. When the processor identifies a privilege violation, it begins exception processing before executing the instruction. As illustrated in Figure 8-1, the processor copies the SR, enters the supervisor mode, and clears the trace bits. The processor generates vector number 8 and saves the privilege violation vector offset, the current PC value, and the internal copy of the SR on the supervisor stack. The saved value of the PC is the logical address of the first word of the instruction that caused the privilege violation. Instruction execution resumes after the required prefetches from the address in the privilege violation exception vector.

8.2.6 Trace Exception

To aid in program development, the M68000 family includes an instruction-by-instruction tracing capability. The M68040 can be programmed to trace all instructions or only instructions that change program flow. In the trace mode, an instruction generates a trace exception after the instruction completes execution, allowing a debugging program to monitor execution of a program.

In general terms, a trace exception is an extension to the function of any traced instruction. The execution of a traced instruction is not complete until trace exception processing is complete. If an instruction does not complete due to an access fault or address error exception, trace exception processing is deferred until after execution of the suspended instruction is resumed. If an interrupt is pending at the completion of an instruction, trace exception processing occurs before interrupt exception processing starts. If an instruction forces an exception as part of its normal execution, the forced exception processing occurs before the trace exception is processed.

The T1 and T0 bits in the supervisor portion of the SR control tracing. The state of these bits when an instruction begins execution determines whether the instruction generates a trace exception after the instruction completes. Setting T1 = 0 and T0 = 1 causes an instruction that forces a change of flow to take a trace exception.
The following instructions cause a trace exception to be taken when trace on change of flow is enabled:

ANDI to SR, Bcc (taken), BRA, BSR, CAS, CAS2, CINV, CPUSH, DBcc (taken), EORI to SR, FBcc (taken), FDBcc (always), FMOVEM, FRESTORE, FSAVE, JMP, JSR, MOVE to SR, MOVE USP, MOVEC, MOVES, NOP, ORI to SR, PFLUSH, PTEST, RTD, RTE, RTR, RTS, STOP

Instructions that only increment the PC normally do not take the trace exception. This mode also includes SR manipulations, because the processor must prefetch instruction words again to fill the pipeline any time an instruction that modifies the SR is executed. Table 8-2 lists the different trace modes.
Table 8-2. Tracing Control

T1  T0  Tracing Function
0   0   No tracing
0   1   Trace on change of flow
1   0   Trace on instruction execution (any instruction)
1   1   Undefined, reserved

When the processor is in the trace mode and attempts to execute an illegal or unimplemented instruction, that instruction does not cause a trace exception, since the instruction is not executed. This is of particular importance to an instruction emulation routine that performs the instruction function, adjusts the stacked PC to skip the unimplemented instruction, and returns. Before returning, the routine should check the trace bits of the SR on the stack. If tracing is enabled, the routine should also emulate trace exception processing so that the trace exception handler accounts for the emulated instruction.

Trace exception processing starts at the end of normal processing for the traced instruction and before the start of the next instruction. As illustrated in Figure 8-1, the processor makes an internal copy of the SR and enters the supervisor mode. It also clears the T1 and T0 bits of the SR, disabling further tracing. The processor supplies vector number 9 for the trace exception and saves the trace exception vector offset, PC value, and the internal copy of the SR on the supervisor stack. The saved value of the PC is the logical address of the next instruction to be executed. Instruction execution resumes after the required prefetches from the address in the trace exception vector.

When the STOP instruction is traced, the processor never enters the stopped condition. A STOP instruction that begins execution with tracing enabled forces a trace exception after it loads the SR. Upon return from the trace exception handler, execution continues with the instruction following the STOP instruction.
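The decision described by Table 8-2 and the preceding paragraphs can be sketched in Python as below. This is a hypothetical model: the caller supplies whether an instruction appears on the change-of-flow list, and treating the undefined T1 = T0 = 1 encoding as "no tracing" is an assumption made only to keep the function total.

```python
from enum import Enum


class TraceMode(Enum):
    """T1:T0 encodings from Table 8-2."""
    NONE = 0b00
    CHANGE_OF_FLOW = 0b01
    ANY_INSTRUCTION = 0b10
    RESERVED = 0b11


def trace_mode(sr: int) -> TraceMode:
    """Extract T1 (bit 15) and T0 (bit 14) from the SR."""
    return TraceMode((sr >> 14) & 0b11)


def takes_trace_exception(sr_at_start: int, changes_flow: bool,
                          executed: bool = True) -> bool:
    """Decide whether an instruction is followed by a trace exception.

    sr_at_start: SR value when the instruction begins execution.
    changes_flow: True if the instruction is on the change-of-flow list.
    executed: False for illegal/unimplemented instructions, which are
              never executed and therefore never traced.
    """
    if not executed:
        return False
    mode = trace_mode(sr_at_start)
    if mode is TraceMode.ANY_INSTRUCTION:
        return True
    if mode is TraceMode.CHANGE_OF_FLOW:
        return changes_flow
    return False  # no tracing; the reserved encoding is treated as off (assumption)
```

Sampling the SR at the start of the instruction, rather than after it completes, mirrors the rule that the state of T1 and T0 when an instruction begins execution determines whether it is traced.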
8.2.7 Format Error Exception

Just as the processor checks for valid prefetched instructions, it also performs some checks of data values for control operations. The RTE instruction checks the validity of the stack format code. For floating-point unit (FPU) state frames, the FRESTORE instruction compares the internal version number of the processor to that contained in the state frame (refer to Section 9 Floating-Point Unit (MC68040 Only)). This check ensures that the processor can correctly interpret internal FPU state information from the state frame.

If any of these checks determine that the format of the data is improper, the instruction generates a format error exception. This exception saves a stack frame, generates exception vector number 14, and continues execution at the address in the format exception vector. The stacked PC value is the logical address of the instruction that detected the format error.
8.2.8 Breakpoint Instruction Exception

To use the M68040 in a hardware emulator, the processor must provide a means of inserting breakpoints in the emulator code and performing appropriate operations at each breakpoint. Inserting an illegal instruction at the breakpoint and detecting the illegal instruction exception from its vector location can achieve this. However, since the VBR allows arbitrary relocation of exception vectors, the exception address cannot reliably identify a breakpoint. Consequently, the processor provides a breakpoint capability with a set of breakpoint instructions (BKPT #0 through BKPT #7), opcodes $4848–$484F. When the M68040 executes a breakpoint instruction, it performs a breakpoint acknowledge cycle (read cycle) with an acknowledge transfer type and a transfer modifier value of $0. Refer to Section 7 Bus Operation for a description of the breakpoint acknowledge cycle. After external hardware terminates the bus cycle with either TA or TEA, the processor performs illegal instruction exception processing.

8.2.9 Interrupt Exception

When a peripheral device requires the services of the M68040 or is ready to send information that the processor requires, it can signal the processor to take an interrupt exception using the active-low IPL2–IPL0 signals. The three signals encode a value of 0–7 (IPL0 is the least significant bit). High levels on all three signals correspond to no interrupt requested (level 0). Values 1–7 specify one of seven levels of interrupts, with level 7 having the highest priority. Table 8-3 lists the interrupt levels, the states of IPL2–IPL0 that define each level, and the SR interrupt mask values that allow an interrupt at each level.

Table 8-3. Interrupt Levels and Mask Values

Requested   Control Line Status   Interrupt Mask Level
Level       IPL2  IPL1  IPL0      Required for Recognition
0           High  High  High      No interrupt requested
1           High  High  Low       0
2           High  Low   High      0–1
3           High  Low   Low       0–2
4           Low   High  High      0–3
5           Low   High  Low       0–4
6           Low   Low   High      0–5
7           Low   Low   Low       0–7

When an interrupt request has a priority higher than the value in the interrupt priority mask of the SR (bits 10–8), the processor makes the request a pending interrupt. Priority level 7, the nonmaskable interrupt, is a special case. Level 7 interrupts cannot be masked by the interrupt priority mask, and they are transition sensitive. The processor recognizes an interrupt request each time the external interrupt request level changes from some lower level to level 7, regardless of the value in the mask. Figure 8-3 shows two examples of interrupt recognition, one for level 6 and one for level 7.

When the M68040 processes a level 6 interrupt, the SR mask is automatically updated with a value of 6 before entering the handler routine so that subsequent level 6 and lower level interrupts are masked. Provided no instruction that lowers the mask value is executed, the external request can be lowered to level 3 and then raised back to level 6 without a second level 6 interrupt being processed. However, if the M68040 is handling a level 7 interrupt (SR mask set to level 7) and the external request is lowered to level 3 and then raised back to level 7, a second level 7 interrupt is processed, because the level 7 interrupt is transition sensitive. A level comparison also generates a level 7 interrupt if the request level and mask level are both 7 and the priority mask is then set to a lower level (with the MOVE to SR or RTE instruction, for example). The level 6 interrupt request and mask level example in Figure 8-3 applies to all interrupt levels except 7.
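As a minimal sketch of the decoding in Table 8-3, the Python functions below turn the three active-low pin states into a requested level and apply the level-comparison rule. Pins are modeled as booleans (True for electrically high); the function names are hypothetical, and the level 7 transition rule is deliberately left out.

```python
def requested_level(ipl2_high: bool, ipl1_high: bool, ipl0_high: bool) -> int:
    """Decode active-low IPL2-IPL0 pin states (True = electrically high) into a level 0-7."""
    # The pins are active low, so the level is the 3-bit complement of the pin pattern.
    pins = (int(ipl2_high) << 2) | (int(ipl1_high) << 1) | int(ipl0_high)
    return 7 - pins  # all pins high -> 0 (no request); all pins low -> 7


def passes_level_comparison(level: int, mask: int) -> bool:
    """Level comparison from Table 8-3: a request must exceed the SR mask (bits 10-8).

    The level 7 transition rule (recognized regardless of the mask) is not modeled here.
    """
    return 1 <= level <= 7 and level > mask


print(requested_level(False, True, True))   # 4 (IPL2 low, IPL1 high, IPL0 high)
print(passes_level_comparison(6, 5))        # True
print(passes_level_comparison(6, 6))        # False
```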
Figure 8-3. Interrupt Recognition Examples

Level 6 example (initial conditions: external IPL2–IPL0 = 100 ($3), interrupt priority mask I2–I0 = 101 ($5)):
- If IPL2–IPL0 = 001 ($6), then the mask becomes 110 ($6) and a level 6 interrupt occurs (level comparison).
- If IPL2–IPL0 = 100 ($3) and the mask is still 110 ($6), then no action.
- If IPL2–IPL0 = 001 ($6) and the mask is still 110 ($6), then no action.
- If IPL2–IPL0 is still 001 ($6) and an RTE sets the mask to 101 ($5), then a level 6 interrupt occurs (level comparison).

Level 7 example (initial conditions: external IPL2–IPL0 = 100 ($3), interrupt priority mask I2–I0 = 101 ($5)):
- If IPL2–IPL0 = 000 ($7), then the mask becomes 111 ($7) and a level 7 interrupt occurs (transition).
- If IPL2–IPL0 = 100 ($3) and the mask is still 111 ($7), then no action.
- If IPL2–IPL0 = 000 ($7) and the mask is still 111 ($7), then a level 7 interrupt occurs (transition).
- If IPL2–IPL0 is still 000 ($7) and an RTE sets the mask to 101 ($5), then a level 7 interrupt occurs (level comparison).

Note that a mask value of 6 and a mask value of 7 both inhibit request levels 1–6 from being recognized. In addition, neither masks a transition to an interrupt request level of 7. The only difference between mask values of 6 and 7 occurs when the interrupt request level is 7 and the mask value is 7: if the mask value is then lowered to 6, a second level 7 interrupt is recognized.

External circuitry can chain or otherwise merge signals from devices at each level, allowing an unlimited number of devices to interrupt the processor. When several devices are connected to the same interrupt level, each device should hold its interrupt priority level constant until its corresponding interrupt acknowledge bus cycle occurs; this ensures that all requests are processed. Refer to Section 7 Bus Operation for details on the interrupt acknowledge cycle.
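The recognition rules above can be replayed in a small state model. This is a hypothetical software model, not a description of the M68040's internal logic: it assumes an interrupt is taken as soon as it is recognized and that handler entry sets the mask to the serviced level. Replaying the Figure 8-3 sequences yields the interrupts listed there.

```python
from typing import List


class InterruptRecognizer:
    """Toy model: level comparison for every level, plus transition sensitivity at level 7."""

    def __init__(self, mask: int, request: int = 0) -> None:
        self.mask = mask            # SR interrupt priority mask (I2-I0)
        self.request = request      # external request level decoded from IPL2-IPL0
        self.taken: List[int] = []  # levels of the interrupts processed, in order

    def _take(self) -> None:
        self.taken.append(self.request)
        self.mask = self.request    # handler entry sets the mask to the serviced level

    def _evaluate(self, previous_request: int) -> None:
        if self.request == 7 and previous_request < 7:
            self._take()            # transition from a lower level to level 7
        elif self.request > self.mask:
            self._take()            # ordinary level comparison (also applies to level 7)

    def set_request(self, level: int) -> None:
        previous, self.request = self.request, level
        self._evaluate(previous)

    def set_mask(self, mask: int) -> None:
        """Model MOVE to SR or RTE changing the mask; the request itself does not change."""
        self.mask = mask
        self._evaluate(self.request)


level6 = InterruptRecognizer(mask=5, request=3)
for level in (6, 3, 6):
    level6.set_request(level)
level6.set_mask(5)      # RTE restores a mask of 5 while the request is still 6
print(level6.taken)     # [6, 6]

level7 = InterruptRecognizer(mask=5, request=3)
for level in (7, 3, 7):
    level7.set_request(level)
level7.set_mask(5)      # RTE restores a mask of 5 while the request is still 7
print(level7.taken)     # [7, 7, 7]
```

The level 7 run records three interrupts: the first transition, the second transition after the request drops to level 3, and the level comparison after the RTE lowers the mask to 5.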
Figure 8-4 illustrates a flowchart for interrupt exception processing. When processing an interrupt exception, the processor first makes an internal copy of the SR, sets the mode to supervisor, suppresses tracing, and sets the processor interrupt mask level to the level of the interrupt being serviced. The processor attempts to obtain a vector number from the interrupting device using an interrupt acknowledge bus cycle with the interrupt level number output on the transfer modifier signals. For a device that cannot supply an interrupt vector, the autovector signal (AVEC) must be asserted. In this case, the M68040 uses an internally generated autovector, one of vector numbers 25–31, that corresponds to the interrupt level number (see Table 8-1). If external logic indicates a bus error during the interrupt acknowledge cycle, the interrupt is considered spurious, and the processor generates the spurious interrupt vector number, 24.

Once the vector number is obtained, the processor saves the exception vector offset, the PC value, and the internal copy of the SR on the active supervisor stack. The saved value of the PC is the logical address of the instruction that would have been executed had the interrupt not occurred.

If the M-bit of the SR is set, the processor clears the M-bit and creates a throwaway exception stack frame on top of the interrupt stack as part of interrupt exception processing. This second frame contains the same PC value and vector offset as the frame created on top of the master stack, but has a format number of $1. The copy of the SR saved on the throwaway frame has the S-bit set, the M-bit clear, and the interrupt mask level set to the new interrupt level. The S-bit may or may not be set in the copy saved on the master stack. The resulting SR (after exception processing) has the S-bit set and the M-bit cleared.

The processor loads the address in the exception vector into the PC, and normal instruction execution resumes after the required prefetches for the interrupt handler routine.

Most M68000 family peripherals use programmable interrupt vector numbers as part of the interrupt acknowledge operation for the system. If this vector number is not initialized after reset and the peripheral must acknowledge an interrupt request, the peripheral usually returns the vector number for the uninitialized interrupt vector, 15.
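The vector selection described above can be summarized in a few lines. This sketch uses string tags for how the acknowledge cycle ended (hypothetical names chosen for illustration) and the standard M68000 family rule that each vector is one long word located at VBR + 4 * vector number.

```python
from typing import Optional

SPURIOUS_INTERRUPT_VECTOR = 24
UNINITIALIZED_INTERRUPT_VECTOR = 15


def interrupt_vector(level: int, termination: str, device_vector: Optional[int] = None) -> int:
    """Pick the vector number for an interrupt, given how the acknowledge cycle ended.

    termination is a hypothetical tag: "vector" (the device drove a vector number),
    "avec" (AVEC asserted), or "bus_error" (the cycle was terminated with a bus error).
    """
    if not 1 <= level <= 7:
        raise ValueError("interrupt level must be 1-7")
    if termination == "avec":
        return 24 + level                 # autovectors 25-31 track the level number
    if termination == "bus_error":
        return SPURIOUS_INTERRUPT_VECTOR  # the interrupt is treated as spurious
    if termination == "vector":
        if device_vector is None or not 0 <= device_vector <= 0xFF:
            raise ValueError("device must supply an 8-bit vector number")
        return device_vector
    raise ValueError(f"unknown termination {termination!r}")


def vector_address(vbr: int, vector: int) -> int:
    """Each vector is one long word, so it is located at VBR + 4 * vector number."""
    return (vbr + 4 * vector) & 0xFFFFFFFF


v = interrupt_vector(3, "avec")
print(v, hex(vector_address(0x0, v)))   # 27 0x6c
```

A peripheral that was never programmed would typically answer with UNINITIALIZED_INTERRUPT_VECTOR, so its handler address would come from VBR + $3C.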
Figure 8-4. Interrupt Exception Processing Flowchart
8.2.10 Reset Exception

Asserting the reset in (RSTI) input signal causes a reset exception. The reset exception has the highest priority of any exception; it provides for system initialization and recovery from catastrophic failure. Reset also aborts any processing in progress when RSTI is recognized; that processing cannot be recovered. Figure 8-5 is a flowchart of the reset exception processing.

The reset exception places the processor in the interrupt state of the supervisor privilege level by setting the S-bit and clearing the M-bit, and it disables tracing by clearing the T1 and T0 bits in the SR. This exception also sets the processor's interrupt priority mask in the SR to the highest level, level 7. Next, the VBR is initialized to zero ($00000000), and the enable bits in the cache control register (CACR) for the on-chip caches are cleared. The reset exception also clears the enable bit in the translation control register but does not affect the page size. It clears the enable bit in each of the four transparent translation registers. An interrupt acknowledge bus cycle is begun to generate a vector number. This vector number references the reset exception vector (two long words, vector numbers 0 and 1) at offset zero in the supervisor address space. The first long word is loaded into the interrupt stack pointer, and the second long word is loaded into the PC. Reset exception processing concludes with the prefetch of the first four long words beginning at the memory location pointed to by the PC.
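A minimal sketch of the SR update and vector fetch performed by reset exception processing follows. It models supervisor space as a Python bytes object and relies on the M68040 being big-endian; because the VBR has just been cleared, vectors 0 and 1 are read from offset zero. Carrying the remaining SR bits through unchanged is a simplification made by this sketch, not documented behavior.

```python
import struct
from typing import Tuple

SR_T1 = 1 << 15
SR_T0 = 1 << 14
SR_S = 1 << 13
SR_M = 1 << 12
SR_IPM = 0b111 << 8  # interrupt priority mask, bits 10-8


def sr_after_reset(sr: int) -> int:
    """Apply the SR updates of reset exception processing to a 16-bit SR value.

    S is set, M, T1, and T0 are cleared, and the mask is set to 7. Other bits
    are carried through unchanged only to keep the sketch simple.
    """
    sr = (sr | SR_S) & ~(SR_M | SR_T1 | SR_T0)
    sr = (sr & ~SR_IPM) | (7 << 8)
    return sr & 0xFFFF


def fetch_reset_vector(supervisor_space: bytes) -> Tuple[int, int]:
    """Return (initial ISP, initial PC) from vectors 0 and 1 at offset zero (big-endian)."""
    isp, pc = struct.unpack_from(">II", supervisor_space, 0)
    return isp, pc


rom = bytes.fromhex("00008000" "00000400") + bytes(8)
isp, pc = fetch_reset_vector(rom)
print(hex(sr_after_reset(0x0000)), hex(isp), hex(pc))   # 0x2700 0x8000 0x400
```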
Figure 8-5. Reset Exception Processing Flowchart

After the initial instruction is prefetched, program execution begins at the address in the PC. The reset exception does not flush the ATCs or invalidate entries in the instruction or data caches, and it does not save the value of either the PC or the SR. If an access fault or address error occurs during the exception processing sequence for a reset, a double bus fault is generated: the processor halts, and the processor status signals (PST3–PST0) indicate $5. Execution of the RESET instruction does not cause a reset exception or affect any internal registers, but it does cause the M68040 to assert the reset out (RSTO) signal, resetting all external devices.

8.3 Exception Priorities

When several exceptions occur simultaneously, they are processed according to a fixed priority. Table 8-4 lists the exceptions, grouped by characteristics. Each group has a priority from 0 to 8, with 0 as the highest priority.

Table 8-4. Exception Priority Groups

Group     Exception and Relative Priority              Characteristics
0         Reset                                        Aborts all processing (instruction or exception) and does not save old context.
1         Data access error (ATC fault or bus error)   Aborts current instruction; can have pending trace, floating-point post-instruction, or unimplemented floating-point instruction exceptions.
2         Floating-point pre-instruction*              Exception processing begins before the current floating-point instruction is executed. The instruction is restarted on return from exception.
3         BKPT #n, CHK, CHK2, divide by zero,          Exception processing is part of instruction execution.
          FTRAPcc, RTE, TRAP #n, TRAPV
          Illegal instruction, unimplemented A-line    Exception processing begins before the instruction is executed.
          and F-line, privilege violation
          Unimplemented floating-point instruction*    Exception processing begins after memory operands are fetched and before the instruction is executed.
4         Floating-point post-instruction*             Only reported for FMOVE to memory. Exception processing begins when the FMOVE instruction and previous exception processing have completed.
5         Address error                                Reported after all previous instructions and associated exceptions have completed.
6         Trace                                        Exception processing begins when the current instruction or previous exception processing has completed.
7         Instruction access error (ATC fault or       Reported after all previous instructions and associated exceptions have completed.
          bus error)
8         Interrupt                                    Exception processing begins when the current instruction or previous exception processing has completed.
* Refer to Section 9 Floating-Point Unit (MC68040 Only) for details concerning floating-point instructions.

The method used to process exceptions in the M68040 is significantly different from that used in earlier members of the M68000 family because of the restart exception model. In general, when multiple exceptions are pending, the exception with the highest priority is processed first, and the remaining exceptions are regenerated when the current instruction restarts. Note that the reset operation clears all other exceptions. The general rule does not apply in the following circumstances.

As soon as the M68040 has completed exception processing for a condition when an interrupt exception is pending, it begins exception processing for the interrupt exception instead of executing the exception handler for the original exception condition. For example, if simultaneous interrupt and trap exceptions are pending, the exception processing for the trap exception occurs first, followed immediately by exception processing for the interrupt. When the processor resumes normal instruction execution, it is in the interrupt handler, which returns to the trap exception handler.

Exception processing for access error exceptions creates a format $7 stack frame that contains status information that can indicate a pending trace, floating-point post-instruction, or unimplemented floating-point instruction exception. The RTE instruction used to return from the access error exception handler checks the status bits for one of these pending exceptions. If one is indicated, the RTE changes the access error stack frame to match the pending exception and fetches the vector for that exception. Instruction execution then resumes in the new exception handler.

If an access error, a trace, and one of the two (mutually exclusive) floating-point exceptions occur simultaneously, the pending floating-point exception is indicated in the access error stack frame and the trace exception flag is undefined. The exception handler for the floating-point exception must check the trace bits on the stack and call the trace handler directly (after adjusting the stack frame to match the format for the trace exception).

If a trace exception is pending at the same time as an exception priority level 3 or floating-point post-instruction exception, the trace exception is not reported, and the exception handler for the other exception condition must check for the trace condition.

8.4 Return from Exceptions

After the processor has completed exception processing for all pending exceptions, it resumes normal instruction execution at the address in the processor's vector table for the last exception processed.
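Returning briefly to Table 8-4: the ordering rule, together with the point that execution resumes in the handler for the last exception processed, can be sketched as a sort by group number. The exception labels are hypothetical, and the model is deliberately simplified: it ignores regeneration of exceptions on restart and the trace-related special cases listed in Section 8.3.

```python
from typing import List

# Group numbers from Table 8-4 (0 = highest priority). The keys are hypothetical labels.
PRIORITY_GROUP = {
    "reset": 0,
    "data_access_error": 1,
    "fp_pre_instruction": 2,
    "trap": 3,                 # BKPT #n, CHK, CHK2, divide by zero, FTRAPcc, RTE, TRAP #n, TRAPV
    "illegal_instruction": 3,  # also unimplemented A-/F-line and privilege violation
    "unimplemented_fp": 3,
    "fp_post_instruction": 4,
    "address_error": 5,
    "trace": 6,
    "instruction_access_error": 7,
    "interrupt": 8,
}


def processing_order(pending: List[str]) -> List[str]:
    """Order simultaneous exceptions by priority group, highest priority first."""
    if "reset" in pending:
        return ["reset"]  # reset aborts processing and clears all other exceptions
    # sorted() is stable, so exceptions in the same group keep their listed order.
    return sorted(pending, key=lambda name: PRIORITY_GROUP[name])


order = processing_order(["interrupt", "trap"])
print(order)   # ['trap', 'interrupt']
```

In the trap-plus-interrupt case the interrupt is processed last, so execution resumes in the interrupt handler, which later returns to the trap handler.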
Once the exception handler has completed execution, it should, if possible, use the RTE instruction to return the system to the context that existed prior to the exception. (If the internal data of the exception stack frames is manipulated, the M68040 may enter an undefined state; this applies specifically to the SSW on the access error stack frame.) When the processor executes an RTE instruction, it examines the stack frame on top of the active supervisor stack to determine whether it is a valid frame and what type of context restoration it requires. If, during restoration, a stack frame has an odd-address PC and an SR that indicates user trace mode enabled, an address error is taken. The SR stacked for the address error has the S-bit set; for previous members of the M68000 family, the S-bit is clear.

When the M68040 writes or reads a stack frame, it uses long-word operand transfers wherever possible. Using a long-word-aligned stack pointer greatly enhances exception processing performance. The processor does not necessarily read or write the stack frame data in sequential order.

System software should not depend on a particular exception generating a particular stack frame. For compatibility with future devices, the software should be able to handle any format of stack frame for any type of exception. The following paragraphs discuss each stack frame format in detail.
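Because RTE dispatches on the format/vector-offset word at SP + $06, a sketch of that decode is shown below. The bit split (format code in bits 15–12, vector offset in bits 11–0) follows the frame diagrams in this section, and the frame sizes come from the stack pointer increments given for each format; the function names are hypothetical.

```python
from typing import Tuple

# Bytes popped by RTE for each frame format described in this section.
FRAME_SIZE = {0x0: 8, 0x1: 8, 0x2: 12, 0x3: 12, 0x4: 16, 0x7: 60}


def decode_format_word(word: int) -> Tuple[int, int, int]:
    """Split the word at SP+$06 into (format code, vector offset, vector number)."""
    fmt = (word >> 12) & 0xF      # bits 15-12: frame format
    offset = word & 0x0FFF        # bits 11-0: vector offset (vector number * 4)
    return fmt, offset, offset >> 2


def frame_size(word: int, mc68040: bool = True) -> int:
    """Size of the frame in bytes; an unrecognized format models a format error."""
    fmt, _, _ = decode_format_word(word)
    # Format $4 is used only by the MC68040V, MC68LC040, MC68EC040, and MC68EC040V.
    if fmt not in FRAME_SIZE or (mc68040 and fmt == 0x4):
        raise ValueError(f"format error: format ${fmt:X}")
    return FRAME_SIZE[fmt]


print(decode_format_word(0x2018))   # (2, 24, 6)
print(frame_size(0x7008))           # 60
```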
8.4.1 Four-Word Stack Frame (Format $0)

If a four-word stack frame is on the active stack and an RTE instruction is encountered, the processor updates the SR and PC with the data read from the stack, increments the stack pointer by eight, and resumes normal instruction execution.

Four-Word Stack Frame, Format $0
  SP + $00  Status register
  SP + $02  Program counter
  SP + $06  0000 (format) | Vector offset

  Exception Type                   Stacked PC Points To
  Interrupt                        Next instruction
  Format error                     RTE or restore instruction
  TRAP #n                          Next instruction
  Illegal instruction              Illegal instruction
  A-line instruction               A-line instruction
  F-line instruction               F-line instruction
  Privilege violation              First word of instruction causing privilege violation
  Floating-point pre-instruction   Floating-point instruction causing the pre-instruction exception

8.4.2 Four-Word Throwaway Stack Frame (Format $1)

If a four-word throwaway stack frame is on the active stack and an RTE instruction is encountered, the processor increments the active stack pointer by eight, updates the SR with the value read from the stack, and then begins RTE processing again, as illustrated in Figure 8-6. The processor reads a new format word from the stack frame on top of the active stack (which may or may not be the same stack used for the previous operation) and performs the operations corresponding to that format. In most cases, the throwaway frame is on the interrupt stack, and when the SR value is read from the stack, the S-bit and M-bit are set. In that case, there is a normal four-word frame on the master stack. However, the second frame can be any format (even another throwaway frame) and can reside on any of the three system stacks.

Throwaway Four-Word Stack Frame, Format $1
  SP + $00  Status register
  SP + $02  Program counter
  SP + $06  0001 (format) | Vector offset

  Exception type: created on the interrupt stack during interrupt exception processing when a transition from the master state to the interrupt state occurs.
  Stacked PC points to: next instruction (same as on the master stack).
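The RTE loop of Figure 8-6 can be modeled in a few lines. In this hypothetical model each stack is a Python list of frame objects rather than memory, so the eight-byte stack pointer increments are implicit in popping a frame; the active stack is chosen from the S and M bits of the current SR. The example follows the common case described above: a throwaway frame on the interrupt stack whose SR has S and M set, and a normal four-word frame on the master stack.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SR_S = 1 << 13
SR_M = 1 << 12


@dataclass
class Frame:
    fmt: int  # format code from the format word
    sr: int   # stacked status register
    pc: int   # stacked program counter


@dataclass
class Cpu:
    sr: int
    pc: int = 0
    # Each stack is a list of frames; the last element is the top of the stack.
    stacks: Dict[str, List[Frame]] = field(
        default_factory=lambda: {"usp": [], "isp": [], "msp": []})

    def active_stack(self) -> List[Frame]:
        if not self.sr & SR_S:
            return self.stacks["usp"]
        return self.stacks["msp" if self.sr & SR_M else "isp"]


def rte(cpu: Cpu) -> None:
    """RTE for format $0 and $1 frames, following the loop in Figure 8-6."""
    while True:
        stack = cpu.active_stack()
        frame = stack[-1]
        if frame.fmt not in (0x0, 0x1):
            # Leave the illegal frame intact, as the processor does on a format error.
            raise ValueError(f"format error: format ${frame.fmt:X}")
        stack.pop()
        cpu.sr = frame.sr
        if frame.fmt == 0x1:
            continue  # throwaway: the new SR may select another stack, so start over
        cpu.pc = frame.pc
        return


cpu = Cpu(sr=0x2600)                                        # S=1, M=0: interrupt state
cpu.stacks["msp"].append(Frame(0x0, sr=0x3000, pc=0x1234))  # normal frame on master stack
cpu.stacks["isp"].append(Frame(0x1, sr=0x3600, pc=0x1234))  # throwaway frame, S and M set
rte(cpu)
print(hex(cpu.sr), hex(cpu.pc))                             # 0x3000 0x1234
```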
Figure 8-6. Flowchart of RTE Instruction for Throwaway Four-Word Frame

8.4.3 Six-Word Stack Frame (Format $2)

If a six-word stack frame is on the active stack and an RTE instruction is encountered, the processor restores the SR and PC values from the stack, increments the active supervisor stack pointer by $C, and resumes normal instruction execution.

Six-Word Stack Frame, Format $2
  SP + $00  Status register
  SP + $02  Program counter
  SP + $06  0010 (format) | Vector offset
  SP + $08  Address

  CHK, CHK2, TRAPcc, FTRAPcc, TRAPV, trace, or zero divide:
    the stacked PC points to the next instruction; the address field holds the address of the instruction that caused the exception.
  Unimplemented floating-point instruction:
    the stacked PC points to the next instruction; the address field holds the calculated effective address for the floating-point instruction.
  Address error:
    the stacked PC points to the instruction that caused the address error; the address field holds the referenced address minus 1.
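A format $2 frame occupies 12 bytes, which matches the $C stack pointer increment. The sketch below packs and parses the frame with Python's struct module in big-endian order; the vector offset $18 in the example is vector 6 (the CHK vector), and the other field values are made up for illustration.

```python
import struct
from typing import NamedTuple

_FORMAT2 = struct.Struct(">HIHI")  # SR, PC, format/vector word, address: 12 bytes, big-endian


class SixWordFrame(NamedTuple):
    sr: int
    pc: int
    vector_offset: int
    address: int


def pack_format2(frame: SixWordFrame) -> bytes:
    """Build the 12-byte image of a format $2 frame, lowest address (SP+$00) first."""
    word = (0x2 << 12) | (frame.vector_offset & 0x0FFF)
    return _FORMAT2.pack(frame.sr, frame.pc, word, frame.address)


def unpack_format2(image: bytes) -> SixWordFrame:
    """Parse a format $2 frame; RTE would then add $C to the stack pointer."""
    sr, pc, word, address = _FORMAT2.unpack_from(image, 0)
    if word >> 12 != 0x2:
        raise ValueError("not a format $2 frame")
    return SixWordFrame(sr, pc, word & 0x0FFF, address)


image = pack_format2(SixWordFrame(sr=0x2700, pc=0x1004, vector_offset=0x18, address=0x1000))
print(image.hex())   # 270000001004201800001000
```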
8.4.4 Floating-Point Post-Instruction Stack Frame (Format $3)

The processor restores the SR and PC values from the stack and increments the active supervisor stack pointer by $C. If another floating-point post-instruction exception is pending, exception processing begins immediately for the new exception; otherwise, the processor resumes normal instruction execution.

Floating-Point Post-Instruction Stack Frame, Format $3
  SP + $00  Status register
  SP + $02  Program counter
  SP + $06  0011 (format) | Vector offset
  SP + $08  Effective address

  Floating-point post-instruction:
    the stacked PC points to the next instruction; the effective address field holds the calculated effective address for the floating-point instruction.

8.4.5 Eight-Word Stack Frame (Format $4)

The MC68040V, MC68LC040, MC68EC040, and MC68EC040V use this stack frame for unimplemented floating-point instructions. The MC68040 does not generate or recognize this stack frame format. Refer to Appendix A MC68LC040 and Appendix B MC68EC040 for further details about this stack frame.
8.4.6 Access Error Stack Frame (Format $7)

A 30-word access error stack frame is created for data and instruction access faults other than instruction address errors. In addition to information about the current processor status and the faulted access, the stack frame also contains pending write-backs that the access error exception handler must complete. The following paragraphs describe in detail the format for this frame and how the processor uses it when returning from exception processing.

Access Error Stack Frame (30 Words), Format $7
  SP + $00  Status register
  SP + $02  Program counter
  SP + $06  0111 (format) | Vector offset
  SP + $08  Effective address (EA)
  SP + $0C  Special status word (SSW)
  SP + $0E  $00 | Write-back 3 status (WB3S)
  SP + $10  $00 | Write-back 2 status (WB2S)
  SP + $12  $00 | Write-back 1 status (WB1S)
  SP + $14  Fault address (FA)
  SP + $18  Write-back 3 address (WB3A)
  SP + $1C  Write-back 3 data (WB3D)
  SP + $20  Write-back 2 address (WB2A)
  SP + $24  Write-back 2 data (WB2D)
  SP + $28  Write-back 1 address (WB1A)
  SP + $2C  Write-back 1 data/push data LW0 (WB1D/PD0)
  SP + $30  Push data LW1 (PD1)
  SP + $34  Push data LW2 (PD2)
  SP + $38  Push data LW3 (PD3)

  Data or instruction access fault (ATC fault or bus error):
    the stacked PC points to the next instruction.

8.4.6.1 Effective Address. The effective address field contains address information when one of the continuation flags CM, CT, CU, or CP in the SSW is set.

8.4.6.2 Special Status Word (SSW). The SSW information indicates whether an access to the instruction stream or the data stream (or both) caused the fault and contains status information for the faulted access. Figure 8-7 illustrates the SSW format.

Figure 8-7. Special Status Word Format
  Bits 15–0: CP (15), CU (14), CT (13), CM (12), MA (11), ATC (10), LK (9), RW (8), X (7), SIZE (6–5), TT (4–3), TM (2–0)
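The SSW bit layout in Figure 8-7 lends itself to a table-driven decoder. This sketch names only the TM values that this section itself assigns (cache push, user and supervisor data, user and supervisor code); any other value is shown numerically, and the helper names are hypothetical.

```python
from typing import Dict

# (field, lowest bit, width) for the SSW, per Figure 8-7.
SSW_FIELDS = [
    ("CP", 15, 1), ("CU", 14, 1), ("CT", 13, 1), ("CM", 12, 1),
    ("MA", 11, 1), ("ATC", 10, 1), ("LK", 9, 1), ("RW", 8, 1),
    ("X", 7, 1), ("SIZE", 5, 2), ("TT", 3, 2), ("TM", 0, 3),
]

# Only the TM values this section names for normal (TT = 0) accesses.
TM_NAMES = {0: "cache push", 1: "user data", 2: "user code",
            5: "supervisor data", 6: "supervisor code"}


def decode_ssw(ssw: int) -> Dict[str, int]:
    """Extract every SSW field as an integer."""
    return {name: (ssw >> low) & ((1 << width) - 1) for name, low, width in SSW_FIELDS}


def describe_fault(ssw: int) -> str:
    """Summarize the fault type, direction, and access space recorded in an SSW."""
    f = decode_ssw(ssw)
    kind = "ATC fault" if f["ATC"] else "bus error"
    direction = "read" if f["RW"] else "write"
    if f["TT"] == 0:
        space = TM_NAMES.get(f["TM"], "TM=$%X" % f["TM"])
    else:
        space = "TT=$%X" % f["TT"]
    return f"{kind} on {direction} ({space})"


print(describe_fault(0x0501))   # ATC fault on read (user data)
print(describe_fault(0x0106))   # bus error on read (supervisor code)
```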
CP: Continuation of Floating-Point Post-Instruction Exception Pending
CP is set for an access error with a floating-point post-instruction exception pending. All pending accesses are allowed to complete after a trace condition is recognized. If any of these accesses fault, the resulting stack frame has the CT bit set, and the effective address field contains the address of the instruction being traced. The RTE fetches the appropriate floating-point post-instruction exception vector. When a post-instruction exception occurs during tracing, the post-instruction exception takes precedence: CP is set and CT = 0, so the kernel must check for a trace condition using the stacked SR. The effective address field contains the calculated effective address determined by the effective address field of the floating-point instruction that caused the post-instruction exception.

CU: Continuation of Unimplemented Floating-Point Instruction Exception Pending
CU is set for an access error with a pending exception for an unimplemented floating-point instruction. Operation is the same as for the CP flag, except that the RTE fetches the F-line exception vector. The effective address field contains the calculated effective address determined by the effective address field of the unimplemented instruction. When an unimplemented floating-point instruction is traced, the unimplemented exception takes precedence, CU is set, and CT = 0. The kernel must check for a trace condition using the stacked SR. If this condition is true, the kernel must create the required stack frame and jump directly to the trace handler.

CT: Continuation of Trace Exception Pending
CT is set for an access error with a pending trace exception. Operation is the same as for the CP flag. When RTE is executed with CT set, the M68040 moves the words at offsets $00–$0B from the current SP to offsets $30–$3B, adjusting the stack pointer by +$30. The M68040 changes the stack frame format to $2 before fetching the trace exception vector and jumping directly to trace exception handling. This stack adjustment creates the stack frame that normally would have been created for the trace exception had the pending access not encountered a bus error.

CM: Continuation of MOVEM Instruction Execution Pending
CM is set if a data access encounters a bus error for a MOVEM. Since the MOVEM operation can write over the memory location or registers used to calculate the effective address, the M68040 internally saves the effective address after calculation. When MOVEM encounters a bus error, a stack frame is created with CM set, and the effective address field contains the calculated effective address for the instruction. When RTE is executed, MOVEM restarts using the effective address on the stack (instead of repeating the effective address calculation) if the addressing mode is PC relative (mode = 111, register = 010 or 011) or indirect with index (mode = 110).

MA: Misaligned Access
MA is set if an ATC fault occurs for the second-page access of a misaligned access that spans two pages in memory.
ATC: ATC Fault
This bit is set for an ATC fault due to a nonresident entry (bus error during table search or invalid descriptor encountered) or a privilege violation (write protected or supervisor only). It is cleared for a bus-errored instruction, data, or cache line-push access.

LK: Locked Transfer (Read-Modify-Write)
This bit is set if a fault occurred on a locked transfer; it is cleared otherwise.

RW: Read/Write
This bit is set if a fault occurred on a read transfer; it is cleared otherwise.

X: Undefined

SIZE: Transfer Size
The SIZE field corresponds to the original access size. If a data cache line read results from a read miss and the line read encounters a bus error, the SIZE field in the resulting stack frame indicates the size of the original read generated by the execution unit.

TT: Transfer Type
This field defines the TT1–TT0 signal encodings for the faulted transfer.

TM: Transfer Modifier
This field defines the TM2–TM0 signal encodings for the faulted transfer.

8.4.6.3 Write-Back Status. These fields contain status information for the three possible write-backs that could be pending after the faulted access (see Figure 8-8). For a data cache line-push fault or a MOVE16 write fault, WB1S is zero (invalid).

Figure 8-8. Write-Back Status Format
  Bits 7–0: V (7), SIZE (6–5), TT (4–3), TM (2–0)
  V = valid write (write-back pending if set); SIZE = transfer size; TT = transfer type; TM = transfer modifier

8.4.6.4 Fault Address. The fault address (FA) is the initial address for the access that faulted. The FA is a physical address only for cache pushes and a logical address in all other cases. For a misaligned access that faults, the FA field contains the address of the first byte of the transfer, regardless of which of the two or three bus transfers for the misaligned access faulted. For a push fault, the WB1A and FA addresses are the same.

8.4.6.5 Write-Back Address and Write-Back Data.
Write-back addresses (WB3A, WB2A, and WB1A) are memory pointers that indicate where to place the write-back data (WB3D, WB2D, and WB1D). WB3A and WB3D correspond to the temporary holding register in the integer unit (WB3). WB2A and WB2D correspond to the temporary holding register in the data memory unit (WB2) prior to address translation. WB1A and WB1D correspond to the temporary holding register in the bus controller (WB1), which determines the external address and data bus bit patterns. Refer to Section 2 Integer Unit for details on the operation of the integer unit pipeline.

The write-back data in WB3D and WB2D is register aligned, with byte and word data contained in the least significant byte and word, respectively, of the field. Write-back data in WB1D is memory aligned and resides in the byte positions corresponding to the data bus lanes used in writing each byte to memory. Table 8-5 lists the data alignment for each combination of data format and A1 and A0.

Table 8-5. Write-Back Data Alignment

Data Format   A1 A0   WB1D             WB2D, WB3D
Byte          0  0    31–24            7–0
              0  1    23–16            7–0
              1  0    15–8             7–0
              1  1    7–0              7–0
Word          0  0    31–16            15–0
              0  1    23–8             15–0
              1  0    15–0             15–0
              1  1    7–0, 31–24       15–0
Long word     0  0    31–0             31–0
              0  1    23–0, 31–24      31–0
              1  0    15–0, 31–16      31–0
              1  1    7–0, 31–8        31–0

Note: For a line transfer fault, the four long words of data in PD3–PD0 are already aligned with memory. Bits 31–0 of each field correspond to bits 31–0 of the memory location to be written to, regardless of the value of the address bits A1 and A0 for the write-back address.

8.4.6.6 Push Data. The push data field contains an image of the cache line that needs to be pushed to memory.

8.4.6.7 Access Error Stack Frame Return from Exception. For the access error stack frame (format $7), the processor restores the SR and PC values from the stack and checks the four continuation status bits in the SSW on the stack. If none of these bits is set, the processor increments the active supervisor stack pointer by 30 words and resumes normal instruction execution.
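Table 8-5 follows a regular pattern: left-justify the operand in 32 bits, then rotate it right by eight bits per byte of address offset. This rule is an interpretation of the table rather than a documented algorithm, but it reproduces every row, including split entries such as 7–0, 31–24 for a word at A1 A0 = 11. A minimal Python sketch, with hypothetical function names:

```python
OPERAND_BITS = {"byte": 8, "word": 16, "long": 32}


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by the given number of bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def register_to_wb1d(data: int, size: str, address: int) -> int:
    """Convert register-aligned data (WB2D/WB3D layout) to the memory-aligned WB1D layout.

    The operand is left-justified in 32 bits, then rotated right by 8 bits for each
    byte of offset given by A1-A0, which reproduces the bit positions in Table 8-5.
    """
    bits = OPERAND_BITS[size]
    operand = data & ((1 << bits) - 1)
    return ror32(operand << (32 - bits), 8 * (address & 0b11))


print(hex(register_to_wb1d(0xAB, "byte", 0x1001)))        # 0xab0000
print(hex(register_to_wb1d(0x1234, "word", 0x1003)))      # 0x34000012
print(hex(register_to_wb1d(0x11223344, "long", 0x1002)))  # 0x33441122
```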
If the MOVEM continuation bit is set, the processor restores the calculated effective address from the stack frame, increments the active supervisor stack pointer by 30 words, and restarts the MOVEM instruction at a point after the effective address calculation. All operand accesses for the MOVEM that occurred before the faulted access are repeated. If a continuation bit is set for a pending trace, unimplemented floating-point instruction, or floating-point post-instruction exception, the processor restores the calculated effective address from the stack frame, increments the active supervisor stack pointer by 30 words, and immediately begins exception processing for the pending exception. The processor sets only one of the continuation bits when the access error stack frame is created. If the access error exception handler sets multiple bits, operation of the RTE instruction is undefined.

If the frame format field in the stack frame contains an illegal format code, a format exception occurs. If a format error or access fault exception occurs during the frame validation sequence of the RTE instruction, the processor creates a normal four-word or an access error stack frame below the frame that it was attempting to use. The illegal stack frame remains intact so that the exception handler can examine or repair it. In a multiprocessor system, the illegal frame can be left so that, when appropriate, another processor of a different type can use it.

The bus error exception handler can identify bus error exceptions due to instruction faults by examining the TM field in the SSW of the access error stack frame. For user and supervisor instruction faults, the TM field contains $2 and $6, respectively (see Figure 8-7). Since the processor allows all pending accesses to complete before reporting an instruction fault, the stack frame for an instruction fault does not contain any pending write-backs. The ATC bit of the SSW distinguishes between ATC faults and physical bus errors, and the FA field contains the logical address of the instruction prefetch. For ATC faults, the exception handler can execute a PTEST instruction (using the FA field and the TM field of the SSW) to determine the specific cause of the address translation failure. After the handler corrects the cause of the fault, it executes an RTE instruction to restart execution of the instruction that contained the faulted prefetch.

For an address error fault, the processor saves a format $2 exception stack frame on the stack.
This stack frame contains the PC pointing to the instruction that caused the address error, as well as the actual address referenced by the instruction. Note that bit 0 of the referenced address is cleared in the stack frame. Address error faults must be repaired in software.

For a fault due to a data ATC fault or bus error, pending write-backs are also saved on the access error stack frame and must be completed by the exception handler. For the faulted access, the fault address in the FA field, combined with the transfer attribute information from the SSW, can be used to identify the cause of the fault. When identifying the fault, the system programmer should be aware that the data memory unit treats the read portion of read-modify-write transfers (for TAS, CAS, CAS2, and some translation table updates) as a write. As a result, neither the read nor the write access occurs unless all pages touched by the instruction or table update are write enabled. All accesses other than instruction prefetches go through the data memory unit, and the M68040 treats the instruction and data address spaces as a single merged address space (the exception is the presence of separate transparent translation registers). Some accesses use function codes $2 and $6, the user and supervisor instruction spaces in the MC68000; examples are PC-relative operand addressing and MOVES transfers. These accesses are converted to data references so that they go through the data memory unit, and they appear in the TM field of the access error stack frame as data references.
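A handler that checks page protection can follow the rule above by treating a locked read-modify-write read as a write. The Python sketch below assumes the Figure 8-7 SSW layout, with RW in bit 8 (1 = read, which matches SSW_RW = 1 for read faults later in this section) and LK (locked transfer) in bit 9. Neither bit position is stated in this section, so treat both as assumptions. The helper name is hypothetical.

```python
RW_BIT = 1 << 8  # assumed: 1 = read, 0 = write
LK_BIT = 1 << 9  # assumed: locked (read-modify-write) transfer


def needs_write_permission(ssw: int) -> bool:
    """Return True if the faulted access should be checked against write protection.

    The data memory unit treats the read half of TAS/CAS/CAS2 and similar
    locked transfers as a write, so a locked read is handled like a write.
    """
    is_read = bool(ssw & RW_BIT)
    is_locked = bool(ssw & LK_BIT)
    return (not is_read) or is_locked
```

An ordinary read fault (`0x0100`) needs only read permission. A locked read (`0x0300`) needs write permission, as a plain write does.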
After the fault is corrected, any pending write-backs on the stack frame must be completed. The handler should check the write-back status fields for possible write-backs and complete them in the following order: write-back 1, write-back 2, and write-back 3. For a push fault, the handler must complete the push first, followed by up to two possible write-backs. Completing write-back 1 should not generate another access error, because this write-back corresponds to the faulted access that the handler has already corrected. However, write-backs 2 and 3 can cause another bus error exception when the handler attempts to write to memory. If the operating system requires that exceptions not nest, the handler should check these write-backs before attempting them.

The following general bus fault examples indicate the resulting contents of the access error stack frame fields:

1. All read access errors (SSW_RW = 1, TT = 0, TM = 1 or 5). The FA field contains the logical address of the fault. The WB1S and WB2S fields are zero, and only WB3S can indicate an additional write-back.

2. Cache push physical bus error (SSW_RW = 0, TT = 0, TM = 0). The assertion of TEA causes this error when a cache push bus cycle is in progress. The FA field contains the physical address of the fault, and the WB1S field is ignored. All four long words of the push data are contained in LW3–LW0, regardless of the size of the transfer. The SIZE field of the SSW indicates the size of the transfer, which can be either a line or a long word. If a line is indicated, all four long words need to be pushed out. If a long word is indicated, the handler can either write out all four long words or evaluate bits 3 and 2 of the FA field to determine which long word to write to memory ($3, $2, $1, and $0 indicate LW3, LW2, LW1, and LW0, respectively). The WB2S and WB3S fields indicate up to two additional write-backs. If WB2S is valid and indicates a MOVE16 instruction, no data should be written out for that write-back slot.

3. Normal write physical bus error (SSW_RW = 0, TT = 0, TM = 1 or 5). The assertion of TEA causes this error when a normal write bus cycle is in progress. The FA field contains the logical address of the fault, and the WB1S field indicates that it is valid. The FA and WB1A fields are equivalent. The WB2S and WB3S fields indicate up to two additional write-backs.

4. MOVE16 write physical bus error (SSW_RW = 0, TT = 1). The assertion of TEA causes this error during the write portion of a MOVE16 instruction. The FA field contains the logical address of the fault, and the WB1S field indicates that it is valid. All four long words are contained in LW3–LW0 and must be written out before FA is used. If regular MOVE instructions are used to write out to the destination, software must ensure that address bits 1 and 0 are both clear.

5. Page fault (SSW_RW = 0, WB1S V-bit = 0). The FA field contains the logical address of the faulted instruction, WB1S = 0, and WB2S indicates that it is valid. Only WB3S can indicate an additional write-back. Suppose WB2S indicates a MOVE16 instruction and that MOVE16 reads from a peripheral that cannot tolerate double reads. In that case, software must write the data contained in PD3–PD0 out to memory and increment the stacked PC past the MOVE16 instruction that caused the page fault. If the MOVE16 instruction is allowed to restart instead, another read from the peripheral occurs. If double reads can be tolerated, perform no write-backs and allow the instruction to restart. This is the only case in which the required action depends on whether a double read can be tolerated.
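The completion order above, together with the rule that a MOVE16 reported in WB2 is not written, can be sketched in Python. The sketch assumes that each write-back status byte (WBnS) holds its V-bit in bit 7 and the TT field in bits 4–3, with TT = 01 marking a MOVE16. It models the frame as a hypothetical dictionary of status bytes rather than the real stack layout, and it leaves the bus writes themselves to the caller.

```python
V_BIT = 0x80      # assumed: write-back valid bit in each WBnS byte
TT_MASK = 0x18    # assumed: transfer type field, bits 4-3
TT_MOVE16 = 0x08  # TT = 01 identifies a MOVE16 transfer


def is_move16(status: int) -> bool:
    return (status & TT_MASK) == TT_MOVE16


def pending_writebacks(status_bytes: dict, is_push: bool = False) -> list:
    """Return the write-backs to complete, in the required order.

    status_bytes maps 1, 2, 3 to the WB1S, WB2S, WB3S bytes (hypothetical model).
    For a push fault the push comes first and WB1S is ignored.
    """
    order = ["PUSH"] if is_push else []
    for n in (1, 2, 3):
        if n == 1 and is_push:
            continue  # WB1S is ignored for a cache push physical bus error
        status = status_bytes.get(n, 0)
        if not (status & V_BIT):
            continue  # slot not valid
        if n == 2 and is_move16(status):
            continue  # no data is written out for a MOVE16 in WB2
        order.append(f"WB{n}")
    return order
```

The function only decides which slots to write and in what order. The data source for each slot (WBnD or PD3–PD0) depends on the case, as Table 8-6 shows.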
Table 8-6 lists the possible combinations of write-backs and the proper way to handle them. The SSW_RW column indicates a read or write cycle. The SSW_PUSH column indicates whether the fault is for a push (TT = 00 and TM = 000). The 1V, 2V, and 3V columns give the V-bit of the WB1S, WB2S, and WB3S fields, respectively, and the 1M16 and 2M16 columns indicate a MOVE16 transfer type (TT = 01). The Easy Cleanup Data Written column lists the stack frame fields to write out to memory if the user is not concerned about retouching peripherals. The Hard Cleanup Action column lists the action to take if the peripherals cannot be retouched by MOVE16 (when it differs from the easy cleanup).

Note that if a push access error is reported and the size is long word, all four long words, PD0–PD3, are still valid for the line. The exception handler can either write PD0–PD3 using the fault address with bits 3–0 cleared, or write only the PD field that corresponds to bits 3–2 of the address (e.g., address $0000000C corresponds to PD3). Note also that a MOVE16 is never reported in WB3S, and the SIZE field of WB3S never indicates a line.

After the bus error exception handler completes all pending operations, it executes an RTE to return. The RTE reads only the stack information at offsets $0–$D of the access error stack frame. For a pending trace exception, unimplemented floating-point instruction exception, or floating-point post-instruction exception, the RTE adjusts the stack to match the pending exception and immediately begins exception processing, without requiring the exception to reoccur.
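The address arithmetic for a long-word push fault can be shown directly. Bits 3–2 of FA select the push data long word, and clearing bits 3–0 gives the start of the 16-byte line. The Python sketch below assumes that PDn belongs at line base + 4 × n, which matches the example of address $0000000C selecting PD3. The function names are hypothetical.

```python
def push_longword_index(fa: int) -> int:
    """Bits 3-2 of FA select the push long word: 3 -> PD3, ..., 0 -> PD0."""
    return (fa >> 2) & 0x3


def push_line_base(fa: int) -> int:
    """Fault address with bits 3-0 cleared (start of the 16-byte line)."""
    return fa & ~0xF & 0xFFFFFFFF


def push_writes(fa: int, pd: list, size_is_line: bool) -> list:
    """Return (address, long word) pairs to write for a push fault.

    pd holds [PD0, PD1, PD2, PD3]. For a long-word push this writes only the
    selected long word, although writing all four is also permitted.
    """
    base = push_line_base(fa)
    if size_is_line:
        return [(base + 4 * i, pd[i]) for i in range(4)]
    i = push_longword_index(fa)
    return [(base + 4 * i, pd[i])]
```

For example, `push_writes(0x1000000C, pd, False)` returns a single write of PD3 to $1000000C.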
Table 8-6. Access Error Stack Frame Combinations

Main Case                          SSW_RW  SSW_PUSH  1V  1M16  2V  2M16  3V  Easy Cleanup Data Written   Hard Cleanup Action
All read access errors             1 (a)   No        0   X     0   X     0   None
                                   1 (a)   No        0   X     0   X     1   WB3D                        (note b)
(All other read cases are not possible.)
Cache push physical bus error (c)  0       Yes       0   X     0   X     0   PD3–PD0
                                   0       Yes       0   X     0   X     1   PD3–PD0, WB3D
                                   0       Yes       0   X     1   0     0   PD3–PD0, WB2D
                                   0       Yes       0   X     1   0     1   PD3–PD0, WB2D, WB3D
                                   0       Yes       0   X     1   1     0   PD3–PD0, ~WB2D (d)          (note b)
Normal write physical bus error    0       No        1   0     0   X     0   WB1D
                                   0       No        1   0     0   X     1   WB1D, WB3D
                                   0       No        1   0     1   0     0   WB1D, WB2D
                                   0       No        1   0     1   0     1   WB1D, WB2D, WB3D
                                   0       No        1   0     1   1     0   WB1D, ~WB2D (d)             (note b)
MOVE16 write physical bus error    0       No        1   1     0   X     1   PD3–PD0, WB3D
                                   0       No        1   1     0   X     0   PD3–PD0
                                   0       No        1   1     1   0     0   PD3–PD0, WB2D
                                   0       No        1   1     1   0     1   PD3–PD0, WB2D, WB3D
                                   0       No        1   1     1   1     0   PD3–PD0, ~WB2D (d)          (note b)
Write page fault                   0       No        0   X     1   0     0   WB2D
                                   0       No        0   X     1   0     1   WB2D, WB3D
                                   0       No        0   X     1   1     0   ~WB2D (d)                   Write PD3–PD0 and skip (e)
Impossible write cases             0       Yes       1   X     X   X     X   (note f)
                                   0       X         X   X     X   1     1   (note g)

X = don't care.

Notes:
a. The data memory unit stage is tied up until the bus controller passes the read back through the data memory unit to the execution stage in the integer unit. Therefore, no pending write is possible in WB1 or WB2. WB3 can hold a pending write that was deferred because of the operand read or that was generated after the read.
b. If any kind of access error is reported while a MOVE16 write is pending in the WB2 stage, then the read portion of that MOVE16 must have hit in the cache. The MOVE16 can therefore be safely restarted, because it has not caused bus cycles that could retouch peripherals.
c. A cache push physical bus error is normally considered a fatal error. For these cases, the FA field contains a physical address, not a logical address as in the other cases.
d. The data should not be written even though its V-bit is set (WB2 corresponds to a MOVE16 write).
e. The exception handler must alter the stacked PC to point past the MOVE16 and must perform the predecrement and postincrement of the address registers in software.
f. 1V must be 0 for push exceptions.
g. The execution stage does not post a write until the MOVE16 is in the integer unit.
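The Easy Cleanup column of Table 8-6 follows a regular pattern. The first field depends on the main case, WB2D is added unless WB2 holds a MOVE16, and WB3D is added when WB3 is valid. The Python sketch below encodes that pattern and rejects the combinations that notes a, f, and g rule out. The case labels are hypothetical, and the hard cleanup for the page-fault MOVE16 row (write PD3–PD0 and skip) is deliberately not modeled.

```python
FIRST_FIELDS = {  # data written for the faulted access itself, per main case
    "push": ["PD3-PD0"],
    "write": ["WB1D"],
    "move16": ["PD3-PD0"],
    "page": [],
}


def easy_cleanup(case: str, wb1v: int, wb2v: int, wb2m16: int, wb3v: int) -> list:
    """Return the stack frame fields to write for the Easy Cleanup column of Table 8-6.

    case is one of 'read', 'push', 'write', 'move16', 'page' (hypothetical labels).
    """
    if case == "read":
        if wb1v or wb2v:
            raise ValueError("no pending write in WB1 or WB2 on a read fault (note a)")
        return ["WB3D"] if wb3v else []
    if case not in FIRST_FIELDS:
        raise ValueError(f"unknown case {case!r}")
    if case == "push" and wb1v:
        raise ValueError("1V must be 0 for push exceptions (note f)")
    if wb2v and wb2m16 and wb3v:
        raise ValueError("WB3 cannot be valid with a MOVE16 in WB2 (note g)")
    fields = list(FIRST_FIELDS[case])
    if wb2v and not wb2m16:
        fields.append("WB2D")  # a MOVE16 in WB2 is never written (note d)
    if wb3v:
        fields.append("WB3D")
    return fields
```

Encoding the pattern rather than listing all 20 rows keeps the sketch short. Because of that, it is worth spot-checking it against the table, as the tests do for a few rows.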
SECTION 9
FLOATING-POINT UNIT (MC68040 ONLY)

NOTE
This section does not apply to the MC68040V, MC68LC040, MC68EC040, or MC68EC040V. Refer to Appendix A MC68LC040 and Appendix B MC68EC040 for details.

Floating-point math refers to numeric calculations in which the location of the radix point is variable. It is distinguished from integer math, which deals only with whole numbers and fixed radix point locations. Historically, general-purpose microprocessors have had to depend on add-on coprocessors and accelerators such as the MC68881/MC68882 for fast floating-point capabilities. The MC68040 features a built-in floating-point unit (FPU). Consolidating this important function on chip speeds up overall processing and eliminates some of the interfacing overhead required for external accelerators.

The MC68040 FPU operates in parallel with the integer unit (IU). The FPU performs the numeric calculation while the IU moves on to other tasks. Like the IU, the FPU has its own three-stage pipeline, which overlaps operations such as integer-to-floating-point conversion, instruction execution, and write-back. When used with the M68040FPSP, the MC68040 FPU is fully compliant with the IEEE 754 floating-point standard.

9.1 FLOATING-POINT UNIT PIPELINE
Integer data from memory (memory to register) requires a pass through the FPU pipeline that converts the data to the extended-precision format used by the FPU. The result of this conversion is presented to the conversion stage of the FPU pipeline, where the desired operation begins, starting a second pass through the pipeline. The IU is released to execute other instructions once the data has been transferred to the FPU. Floating-point data to memory (register to memory) requires a complete pass through the FPU pipeline, which converts the data from the extended-precision format to an integer data format.
Register-to-memory instructions are normally handled entirely by the conversion stage of the pipeline, where the move of the data to memory completes. The IU is not released until it has received the converted data (during the last conversion unit cycle).

Like the IU, the FPU has been optimized for the most frequently used instructions and data types to provide the highest possible performance. To boost performance further, the FMOVE instruction executes concurrently with arithmetic calculations, completely transparently to the user. Instructions can execute nonsequentially as long as there are no register dependencies. Refer to Section 10 Instruction Timings for details on floating-point timings.
The MC68040 FPU is compatible with the MC68881/MC68882. The MC68040 performs basic math functions, such as floating-point addition and multiplication, directly on dedicated circuitry. It performs transcendental functions, such as sine and cosine, by means of software routines. Motorola offers the M68040FPSP, a software package that provides these routines. The software functions are compatible with the MC68881/MC68882; refer to Appendix E Floating-Point Emulation (M68040FPSP).

9.2 FLOATING-POINT USER PROGRAMMING MODEL
Figure 9-1 illustrates the floating-point portion of the user programming model. The following paragraphs describe the FPU portion of the user programming model for the MC68040. The model is identical to the programming model for the MC68881/MC68882 floating-point coprocessors and consists of the following registers:
- Eight 80-bit floating-point data registers (FP7–FP0)
- 16-bit floating-point control register (FPCR)
- 32-bit floating-point status register (FPSR)
- 32-bit floating-point instruction address register (FPIAR)

Figure 9-1. Floating-Point User Programming Model (FP0–FP7, bits 79–0; FPCR: mode control and exception enable bytes; FPSR: condition code, quotient, exception status, and accrued exception bytes; FPIAR: bits 31–0)

9.2.1 Floating-Point Data Registers (FP7–FP0)
The floating-point data registers are analogous to the integer data registers of the M68000 family. The floating-point data registers always contain extended-precision numbers. All external operands, regardless of the data format, are converted to extended-precision values before being used in any calculation or stored in a floating-point data register. A reset or a restore operation of the null state sets FP7–FP0 to positive, nonsignaling not-a-numbers (NANs).

9.2.2 Floating-Point Control Register (FPCR)
The FPCR (see Figure 9-2) contains an exception enable (ENABLE) byte and a mode control (MODE) byte. The ENABLE byte enables or disables traps for each class of floating-point exceptions, and the MODE byte sets the user-selectable modes. The user can read or write to the FPCR. Motorola reserves bits 31–16 for future definition; these bits are always read as zero and are ignored during write operations. The reset function or a restore operation of the null state clears the FPCR. When cleared, this register provides the IEEE 754 standard defaults.

9.2.2.1 Exception Enable Byte. Each bit of the ENABLE byte (see Figure 9-2) corresponds to a floating-point exception class. The user can separately enable traps for each class of floating-point exceptions.

9.2.2.2 Mode Control Byte. The MODE byte (see Figure 9-2) controls the user-selectable rounding modes and precisions. Zeros in this byte select the IEEE 754 standard defaults. The rounding mode (RND) specifies how inexact results are rounded, and the rounding precision (PREC) selects the boundary for rounding the mantissa. The processor supports the four rounding modes specified by the IEEE 754 standard: round to nearest (RN), round toward zero (RZ), round toward plus infinity (RP), and round toward minus infinity (RM). The RP and RM modes are directed rounding modes that are useful in interval arithmetic. Rounding is applied to the intermediate result. Single-precision results are rounded to a 24-bit boundary, double-precision results to a 53-bit boundary, and extended-precision results to a 64-bit boundary. Table 9-1 lists the encodings for the FPCR.

Table 9-1. Floating-Point Control Register Encodings

Rounding Mode (RND Field)     Encoding   Rounding Precision (PREC Field)
To Nearest (RN)               0 0        Extend (X)
Toward Zero (RZ)              0 1        Single (S)
Toward Minus Infinity (RM)    1 0        Double (D)
Toward Plus Infinity (RP)     1 1        Undefined
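Table 9-1 and the bit positions of Figure 9-2 (PREC in bits 7–6, RND in bits 5–4, and the ENABLE byte in bits 15–8) are enough to build or decode an FPCR image. The Python sketch below does both. It is a host-side model only: it does not show the FMOVE that would load the value into the FPCR, and the function names are hypothetical.

```python
RND = {"RN": 0b00, "RZ": 0b01, "RM": 0b10, "RP": 0b11}
PREC = {"X": 0b00, "S": 0b01, "D": 0b10}  # encoding 0b11 is undefined

ENABLE_BITS = {  # exception enable byte, bits 15-8 (Figure 9-2)
    "BSUN": 15, "SNAN": 14, "OPERR": 13, "OVFL": 12,
    "UNFL": 11, "DZ": 10, "INEX2": 9, "INEX1": 8,
}


def make_fpcr(rnd: str = "RN", prec: str = "X", enable=()) -> int:
    """Build an FPCR value; the defaults give 0, the IEEE 754 defaults."""
    value = (PREC[prec] << 6) | (RND[rnd] << 4)
    for name in enable:
        value |= 1 << ENABLE_BITS[name]
    return value


def decode_mode(fpcr: int) -> tuple:
    """Return (rounding mode, rounding precision) from the MODE byte."""
    rnd = {v: k for k, v in RND.items()}[(fpcr >> 4) & 0b11]
    prec = {v: k for k, v in PREC.items()}.get((fpcr >> 6) & 0b11, "undefined")
    return rnd, prec
```

For example, `make_fpcr("RZ", "D")` yields $0090 (round toward zero, double precision, all traps disabled).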
Figure 9-2. Floating-Point Control Register (exception enable byte, bits 15–8: BSUN, SNAN, OPERR, OVFL, UNFL, DZ, INEX2, INEX1; mode control byte, bits 7–0: PREC in bits 7–6, RND in bits 5–4, bits 3–0 zero)

9.2.3 Floating-Point Status Register (FPSR)
The FPSR (see Figure 9-1) contains a floating-point condition code (FPCC) byte, a quotient byte, a floating-point exception status (EXC) byte, and a floating-point accrued exception (AEXC) byte. The user can read or write to all bits in the FPSR. Execution of most floating-point instructions modifies this register. The reset function or a restore operation of the null state clears the FPSR. Floating-point conditional operations are not guaranteed if the FPSR is written directly, because the FPSR is valid only as the result of a floating-point instruction.

9.2.3.1 Floating-Point Condition Code Byte. The FPCC byte (see Figure 9-3) contains four condition code bits that are set at the end of all arithmetic instructions involving the floating-point data registers. These bits are sign of mantissa (N), zero (Z), infinity (I), and not-a-number (NAN). The FMOVE FPm,<ea>, FMOVEM FPm, and FMOVE FPCR instructions do not affect the FPCC.

Figure 9-3. FPSR Condition Code Byte (bits 31–28: 0; bit 27: N, negative; bit 26: Z, zero; bit 25: I, infinity; bit 24: NAN, not-a-number or unordered)

To aid programmers of floating-point subroutine libraries, the MC68040 implements the four FPCC bits in hardware instead of implementing only the four IEEE conditions. An instruction derives the IEEE conditions when needed. For example, the programmers of a complex arithmetic multiply subroutine usually prefer to handle special data types, such as zeros, infinities, or NANs, separately from normal data types. The floating-point condition codes allow users to detect and handle these special values efficiently.

9.2.3.2 Quotient Byte. The quotient byte (see Figure 9-4) provides compatibility with the MC68881/MC68882. This byte contains the seven least significant bits of the unsigned quotient as well as the sign of the entire quotient. The quotient bits can be used in argument reduction for transcendentals and other functions. For example, seven bits are more than enough to determine the quadrant of a circle in which an operand resides. The quotient field (bits 22–16) remains set until the user clears it.

Figure 9-4. FPSR Quotient Byte (bit 23: S, sign of quotient; bits 22–16: seven least significant bits of quotient)

9.2.3.3 Exception Status Byte. The EXC byte (see Figure 9-5) contains a bit for each floating-point exception that can occur during the most recent arithmetic instruction or move operation. The start of most operations clears this byte; however, operations that cannot generate floating-point exceptions do not clear it. An exception handler can use this byte to determine which floating-point exception(s) caused a trap.

Figure 9-5. FPSR Exception Status Byte (bit 15: BSUN, branch/set on unordered; bit 14: SNAN, signaling not-a-number; bit 13: OPERR, operand error; bit 12: OVFL, overflow; bit 11: UNFL, underflow; bit 10: DZ, divide by zero; bit 9: INEX2, inexact operation; bit 8: INEX1, inexact decimal input)

9.2.3.4 Accrued Exception (AEXC) Byte. The AEXC byte contains five exception bits (see Figure 9-6) that the IEEE 754 standard requires for exception-disabled operations. These exceptions are logical combinations of the bits in the EXC byte. The AEXC byte contains the history of all floating-point exceptions that have occurred since the user last cleared it. In normal operation, only the user clears this byte, by writing to the FPSR; however, a reset or a restore operation of the null state can also clear the AEXC byte.
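The FPSR layout of Figures 9-3 through 9-6 can be modeled on a 32-bit FPSR image. The Python sketch below decodes each byte, and it also applies the AEXC accrual equations given in the next paragraphs (IOP from SNAN or OPERR, UNFL only together with INEX2, and INEX from INEX1, INEX2, or OVFL). The helpers are hypothetical host-side functions, not an interface of the M68040FPSP.

```python
# Bit positions from Figures 9-3 through 9-6.
FPCC_BITS = {"N": 27, "Z": 26, "I": 25, "NAN": 24}
EXC_BITS = {"BSUN": 15, "SNAN": 14, "OPERR": 13, "OVFL": 12,
            "UNFL": 11, "DZ": 10, "INEX2": 9, "INEX1": 8}
AEXC_BITS = {"IOP": 7, "OVFL": 6, "UNFL": 5, "DZ": 4, "INEX": 3}


def _set_names(value: int, table: dict) -> set:
    return {name for name, bit in table.items() if (value >> bit) & 1}


def decode_fpsr(fpsr: int) -> dict:
    return {
        "fpcc": _set_names(fpsr, FPCC_BITS),
        "quotient_sign": (fpsr >> 23) & 1,
        "quotient_bits": (fpsr >> 16) & 0x7F,  # seven LSBs of the quotient
        "exc": _set_names(fpsr, EXC_BITS),
        "aexc": _set_names(fpsr, AEXC_BITS),
    }


def accrue(fpsr: int) -> int:
    """Return fpsr with its AEXC byte updated from its EXC byte."""
    exc = _set_names(fpsr, EXC_BITS)
    new_bits = {
        "IOP": bool(exc & {"SNAN", "OPERR"}),
        "OVFL": "OVFL" in exc,
        "UNFL": {"UNFL", "INEX2"} <= exc,  # underflow accrues only if inexact
        "DZ": "DZ" in exc,
        "INEX": bool(exc & {"INEX1", "INEX2", "OVFL"}),
    }
    for name, bit in AEXC_BITS.items():
        if new_bits[name]:
            fpsr |= 1 << bit  # sticky: bits are only ever ORed in
    return fpsr
```

Because `accrue` only ORs bits in, calling it after each operation and polling the AEXC byte once at the end mirrors the sticky-bit usage described next.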
Many users elect to disable traps for all or part of the floating-point exception classes. The AEXC byte makes it unnecessary to poll the EXC byte after each floating-point instruction. At the end of most operations (FMOVEM and FMOVE excluded), the bits in the EXC byte are logically combined to form an AEXC value that is logically ORed into the existing AEXC byte. This operation creates sticky floating-point exception bits in the AEXC byte that the user needs to poll only once (i.e., at the end of a series of floating-point operations). A sticky bit is one that remains set until the user clears it.

Figure 9-6. FPSR Accrued Exception Byte (bit 7: IOP, invalid operation; bit 6: OVFL, overflow; bit 5: UNFL, underflow; bit 4: DZ, divide by zero; bit 3: INEX, inexact; bits 2–0: 0)

Setting or clearing the AEXC bits neither causes nor prevents an exception. The following equations show the relationship between the EXC byte and the AEXC byte. The new value of each AEXC bit is the logical OR of its old value with a combination of bits in the EXC byte. These equations apply at the end of each operation that affects the AEXC byte (\(\lor\) denotes logical OR and \(\land\) logical AND):

\[
\begin{aligned}
\text{new AEXC bit} &= \text{old AEXC bit} \lor \text{EXC bits}\\
\mathrm{IOP} &= \mathrm{IOP} \lor (\mathrm{SNAN} \lor \mathrm{OPERR})\\
\mathrm{OVFL} &= \mathrm{OVFL} \lor (\mathrm{OVFL})\\
\mathrm{UNFL} &= \mathrm{UNFL} \lor (\mathrm{UNFL} \land \mathrm{INEX2})\\
\mathrm{DZ} &= \mathrm{DZ} \lor (\mathrm{DZ})\\
\mathrm{INEX} &= \mathrm{INEX} \lor (\mathrm{INEX1} \lor \mathrm{INEX2} \lor \mathrm{OVFL})
\end{aligned}
\]

9.2.4 Floating-Point Instruction Address Register (FPIAR)
For the subset of floating-point instructions that can generate exception traps, the FPU loads the 32-bit FPIAR with the logical address of the instruction before executing it. The IU can execute instructions while the FPU executes floating-point instructions, and the FPU can execute two floating-point instructions concurrently. For these reasons, the PC value that the MC68040 stacks in response to a floating-point exception does not necessarily point to the offending instruction.
Therefore, a floating-point exception handler uses the address in the FPIAR to locate the floating-point instruction that caused an exception. The FMOVE to/from the FPCR, FPSR, or FPIAR and the FMOVEM instructions cannot generate floating-point exceptions, so these instructions do not modify the FPIAR. However, they can be used to read the FPIAR in an exception handler without changing its previous value. A reset or a restore operation of the null state clears the FPIAR.

9.3 FLOATING-POINT DATA FORMATS AND DATA TYPES
The M68000 floating-point model (MC68881, MC68882, MC68040) supports the following data formats: single precision, double precision, extended precision, and packed decimal. It supports the following data types: normalized numbers, zeros, infinities, denormalized numbers, and NANs. The MC68040 supports part of the M68000 floating-point model in hardware. Table 9-2 lists the data formats and data types supported by the MC68040. Tables 9-3 through 9-6 summarize the details of the floating-point data formats and data types. For further information on the data formats and data types, refer to the M68000PM/AD, M68000 Family Programmer's Reference Manual.

Table 9-2. MC68040 FPU Data Formats and Data Types

Number Type    Single-   Double-   Extended-  Packed-    Byte     Word     Long-Word
               Precision Precision Precision  Decimal    Integer  Integer  Integer
               Real      Real      Real       Real
Normalized     *         *         *          †          *        *        *
Zero           *         *         *          †          *        *        *
Infinity       *         *         *          †
NAN            *         *         *          †
Denormalized   †         †         †          †
Unnormalized                       †          †

* Data format/type supported by the on-chip MC68040 FPU hardware.
† Data format/type supported by software (M68040FPSP).
Table 9-3. Single-Precision Real Format Summary

Data format: S (bit 31) | E (bits 30–23) | F (bits 22–0)
Field size in bits: sign (S) 1; biased exponent (E) 8; fraction (F) 23; total 32
Interpretation of sign: positive fraction, S = 0; negative fraction, S = 1
Normalized numbers:
  Bias of biased exponent: +127 ($7F)
  Range of biased exponent: 0 < E < 255 ($FF)
  Range of fraction: zero or nonzero
  Fraction: 1.F
  Relation to representation of real numbers: \((-1)^{S} \times 2^{E-127} \times 1.F\)
Denormalized numbers:
  Biased exponent format minimum: 0 ($00)
  Bias of biased exponent: +126 ($7E)
  Range of fraction: nonzero
  Fraction: 0.F
  Relation to representation of real numbers: \((-1)^{S} \times 2^{-126} \times 0.F\)
Signed zeros:
  Biased exponent format minimum: 0 ($00)
  Fraction: 0.F = 0.0
Signed infinities:
  Biased exponent format maximum: 255 ($FF)
  Fraction: 0.F = 0.0
NANs:
  Sign: don't care
  Biased exponent format maximum: 255 ($FF)
  Fraction: nonzero
  Representation of fraction: nonsignaling 1xxxx...xxx; signaling 0xxxx...xxx
  Nonzero bit pattern created by user: xxxxx...xxx
  Fraction when created by FPCP: 11111...111
Approximate ranges:
  Maximum positive normalized: \(3.4 \times 10^{38}\)
  Minimum positive normalized: \(1.2 \times 10^{-38}\)
  Minimum positive denormalized: \(1.4 \times 10^{-45}\)
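The rules in Table 9-3 can be turned into a small decoder. The Python sketch below classifies a 32-bit pattern and returns its exact value. It uses `fractions.Fraction` so that denormalized values are not rounded by the host, and it follows the table's convention that a fraction MSB of 1 marks a nonsignaling NAN. The function names are hypothetical, and `float32_bits` relies on the host's IEEE 754 single-precision packing in `struct`.

```python
import struct
from fractions import Fraction


def float32_bits(x: float) -> int:
    """Bit pattern of x rounded to single precision by the host."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def decode_single(bits: int):
    """Classify a single-precision pattern and return its exact value (Table 9-3)."""
    s = (bits >> 31) & 1
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    sign = -1 if s else 1
    if e == 0xFF:  # biased exponent format maximum
        if f == 0:
            return ("infinity", sign)
        # fraction MSB: 1 = nonsignaling, 0 = signaling
        return ("nonsignaling NAN" if f >> 22 else "signaling NAN", None)
    if e == 0:  # biased exponent format minimum
        if f == 0:
            return ("zero", sign)
        # denormalized: (-1)^S x 2^-126 x 0.F
        return ("denormalized", sign * Fraction(f, 1 << 23) * Fraction(1, 1 << 126))
    # normalized: (-1)^S x 2^(E-127) x 1.F
    return ("normalized", sign * (1 + Fraction(f, 1 << 23)) * Fraction(2) ** (e - 127))
```

For instance, `decode_single(float32_bits(-2.5))` returns `("normalized", Fraction(-5, 2))`. The smallest denormalized pattern, $00000001, decodes to exactly \(2^{-149}\), about \(1.4 \times 10^{-45}\), as the table states.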
Table 9-4. Double-Precision Real Format Summary

Data format: S (bit 63) | E (bits 62–52) | F (bits 51–0)
Field size in bits: sign (S) 1; biased exponent (E) 11; fraction (F) 52; total 64
Interpretation of sign: positive fraction, S = 0; negative fraction, S = 1
Normalized numbers:
  Bias of biased exponent: +1023 ($3FF)
  Range of biased exponent: 0 < E < 2047 ($7FF)
  Range of fraction: zero or nonzero
  Fraction: 1.F
  Relation to representation of real numbers: \((-1)^{S} \times 2^{E-1023} \times 1.F\)
Denormalized numbers:
  Biased exponent format minimum: 0 ($000)
  Bias of biased exponent: +1022 ($3FE)
  Range of fraction: nonzero
  Fraction: 0.F
  Relation to representation of real numbers: \((-1)^{S} \times 2^{-1022} \times 0.F\)
Signed zeros:
  Biased exponent format minimum: 0 ($000)
  Fraction (mantissa/significand): 0.F = 0.0
Signed infinities:
  Biased exponent format maximum: 2047 ($7FF)
  Fraction: 0.F = 0.0
NANs:
  Sign: 0 or 1
  Biased exponent format maximum: 2047 ($7FF)
  Fraction: nonzero
  Representation of fraction: nonsignaling 1xxxx...xxx; signaling 0xxxx...xxx
  Nonzero bit pattern created by user: xxxxx...xxx
  Fraction when created by FPCP: 11111...111
Approximate ranges:
  Maximum positive normalized: \(1.8 \times 10^{308}\)
  Minimum positive normalized: \(2.2 \times 10^{-308}\)
  Minimum positive denormalized: \(4.9 \times 10^{-324}\)
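The approximate ranges at the bottom of Table 9-4 follow directly from the field sizes and biases. The Python check below recomputes them exactly with `Fraction` and compares them with the host's double-precision limits in `sys.float_info`. It assumes that the host `float` is IEEE 754 binary64, as it is for CPython on mainstream platforms. The constant and function names are hypothetical.

```python
import sys
from fractions import Fraction

# Double-precision parameters from Table 9-4.
BIAS = 1023
FRACTION_BITS = 52
MAX_NORMAL_EXPONENT = 2046  # normalized numbers have 0 < E < 2047

# Largest normalized value: 1.111...1 (52 ones) x 2^(2046 - 1023).
MAX_NORMALIZED = (2 - Fraction(1, 2 ** FRACTION_BITS)) * Fraction(2) ** (MAX_NORMAL_EXPONENT - BIAS)
# Smallest normalized value: E = 1, F = 0.
MIN_NORMALIZED = Fraction(2) ** (1 - BIAS)
# Smallest denormalized value: 2^-1022 x 0.000...01 (only the last fraction bit set).
MIN_DENORMALIZED = Fraction(2) ** (-(BIAS - 1)) * Fraction(1, 2 ** FRACTION_BITS)


def matches_host_double() -> bool:
    """Compare the exact values with the host's binary64 limits."""
    return (float(MAX_NORMALIZED) == sys.float_info.max
            and float(MIN_NORMALIZED) == sys.float_info.min
            and float(MIN_DENORMALIZED) == 5e-324)


for name, value in [("max normalized", MAX_NORMALIZED),
                    ("min normalized", MIN_NORMALIZED),
                    ("min denormalized", MIN_DENORMALIZED)]:
    print(f"{name}: {float(value):.1e}")
```

Rounded to two significant digits, the printed values are 1.8e+308, 2.2e-308, and 4.9e-324, which agree with the table.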
Table 9-5. Extended-Precision Real Format Summary

Data format: S (bit 95) | E (bits 94–80) | U (bits 79–64) | J (bit 63) | F (bits 62–0)
Field size in bits: sign (S) 1; biased exponent (E) 15; zero, reserved (U) 16; explicit integer bit (J) 1; mantissa (F) 63; total 96
Interpretation of unused bits: input, don't care; output, all zeros
Interpretation of sign: positive mantissa, S = 0; negative mantissa, S = 1
Normalized numbers:
  Bias of biased exponent: +16383 ($3FFF)
  Range of biased exponent: 0 <= E < 32767 ($7FFF)
  Explicit integer bit: 1
  Range of mantissa: zero or nonzero
  Mantissa (explicit integer bit and fraction): 1.F
  Relation to representation of real numbers: \((-1)^{S} \times 2^{E-16383} \times 1.F\)
Denormalized numbers:
  Biased exponent format minimum: 0 ($0000)
  Bias of biased exponent: +16383 ($3FFF)
  Explicit integer bit: 0
  Range of mantissa: nonzero
  Mantissa (explicit integer bit and fraction): 0.F
  Relation to representation of real numbers: \((-1)^{S} \times 2^{-16383} \times 0.F\)
Signed zeros:
  Biased exponent format minimum: 0 ($0000)
  Mantissa (explicit integer bit and fraction): 0.0
Signed infinities:
  Biased exponent format maximum: 32767 ($7FFF)
  Explicit integer bit: don't care
  Mantissa (explicit integer bit and fraction): x.000...000
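The 96-bit layout in Table 9-5 can be decoded in the same way. The Python sketch below reads a 12-byte big-endian image and applies the table's relations: \(2^{E-16383} \times 1.F\) for normalized numbers and \(2^{-16383} \times 0.F\) for denormalized numbers (E = 0, J = 0). Both reduce to \(2^{E-16383} \times J.F\). The sketch also labels values with E ≠ 0 and J = 0 as unnormalized, the data type listed in Table 9-2. The byte order and the function name are assumptions made for illustration.

```python
from fractions import Fraction


def decode_extended(raw: bytes):
    """Decode a 12-byte (96-bit) extended-precision image laid out as in Table 9-5."""
    if len(raw) != 12:
        raise ValueError("extended-precision values are 96 bits (12 bytes)")
    word = int.from_bytes(raw, "big")
    s = (word >> 95) & 1
    e = (word >> 80) & 0x7FFF
    # Bits 79-64 are the zero/reserved field: don't care on input.
    mantissa = word & ((1 << 64) - 1)    # explicit integer bit J (bit 63) plus fraction
    fraction = mantissa & ((1 << 63) - 1)
    sign = -1 if s else 1
    if e == 0x7FFF:
        if fraction == 0:
            return ("infinity", sign)    # J is don't care
        return ("nonsignaling NAN" if fraction >> 62 else "signaling NAN", None)
    if mantissa == 0:
        return ("zero", sign)
    value = sign * Fraction(mantissa, 1 << 63) * Fraction(2) ** (e - 16383)
    if mantissa >> 63:
        kind = "normalized"
    elif e == 0:
        kind = "denormalized"
    else:
        kind = "unnormalized"
    return (kind, value)
```

The smallest denormalized image (only bit 0 set) decodes to \(2^{-16446}\), about \(3.7 \times 10^{-4951}\), which matches the approximate range given in the continuation of Table 9-5.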
Table 9-5. Extended-Precision Real Format Summary (Continued)

NANs:
  Sign: don't care
  Explicit integer bit: don't care
  Biased exponent format maximum: 32767 ($7FFF)
  Mantissa: nonzero
  Representation of mantissa: nonsignaling x.1xxxx...xxx; signaling x.0xxxx...xxx
  Nonzero bit pattern created by user: x.xxxxx...xxx
  Mantissa when created by FPCP: 1.11111...111
Approximate ranges:
  Maximum positive normalized: \(1.2 \times 10^{4932}\)
  Minimum positive normalized: \(1.7 \times 10^{-4932}\)
  Minimum positive denormalized: \(3.7 \times 10^{-4951}\)

Table 9-6. Packed Decimal Real Format Summary

Data Type     SM    SE    Y   Y   3-Digit Exponent   1-Digit Integer   16-Digit Fraction
Infinity      0/1   1     1   1   $FFF               $XXXX             $00...0
NAN           0/1   1     1   1   $FFF               $XXXX             Nonzero
SNAN          0/1   1     1   1   $FFF               $XXXX             Nonzero
+Zero         0     0/1   X   X   $000–$999          $XXX0             $00...0
−Zero         1     0/1   X   X   $000–$999          $XXX0             $00...0
+In-Range     0     0/1   X   X   $000–$999          $XXX0–$XXX9       $00...01–$99...99
−In-Range     1     0/1   X   X   $000–$999          $XXX0–$XXX9       $00...01–$99...99

9.4 COMPUTATIONAL ACCURACY
Whenever a real number is represented in a binary format of finite precision, the number might not be represented exactly. The resulting error is commonly referred to as round-off error. Furthermore, when two inexact numbers are used in a calculation, the error present in each number is reflected, and possibly aggravated, in the result.

All FPU calculations use an intermediate result. When the MC68040 performs an operation, it carries out the calculation using extended-precision inputs and computes the intermediate result as if to infinite precision. After the calculation is complete, the intermediate result is rounded to the selected precision and stored in the destination.

The FPCR encodings provide emulation for devices that support only single and double precision. The execution speed of all instructions is the same whether single- or double-precision rounding is used. When using these two forced rounding precisions, the MC68040 produces the same results as any other device that conforms to the IEEE 754 standard but does not support extended precision. The results are the same as when the same operation is performed in extended precision and the results are stored in single- or double-precision format.

The FPU performs all internal floating-point operations in extended precision. It supports mixed-mode arithmetic by converting single- and double-precision operands to extended-precision values before performing the specified operation. The FPU converts all memory data formats to extended precision before using them in a floating-point operation or loading them into a floating-point data register. The FPU also converts extended-precision data in a floating-point data register to any data format and stores it either in a memory destination or in an integer data register.

If an external operand is a denormalized number, the number is normalized before an operation is performed. However, an external denormalized number moved into a floating-point data register is stored as a denormalized number. If an external operand is an unnormalized number, the number is normalized before it is used in an arithmetic operation. If the external operand is an unnormalized zero (i.e., with a mantissa of all zeros), the number is converted to a normalized zero before the specified operation is performed. Regular use of unnormalized inputs not only defeats the purpose of the IEEE 754 standard but can also produce gross inaccuracies in the results.

9.4.1 Intermediate Result
Figure 9-7 illustrates the intermediate result format. For some dyadic operations (e.g., multiply and divide), the exponent of the intermediate result can easily overflow or underflow the 15-bit exponent of the destination floating-point register. To simplify overflow and underflow detection, intermediate results in the FPU maintain a 16-bit, two's-complement integer exponent.
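The conversion to extended precision can be illustrated for single-precision operands, including the normalization of a denormalized input. The Python sketch below maps a single-precision bit pattern to extended-precision (S, E, mantissa) fields, with the mantissa shown as a 64-bit integer that includes the explicit integer bit. It models only the arithmetic relations of Tables 9-3 and 9-5, not the FPU's circuitry or the FMOVE case described above, and it omits infinities and NANs. The function name is hypothetical.

```python
def single_to_extended(bits: int) -> tuple:
    """Convert a single-precision pattern to extended-precision (S, E, mantissa) fields.

    A denormalized single is normalized (leading 1 moved to the explicit
    integer bit), as happens before an operand is used in an operation.
    """
    s = (bits >> 31) & 1
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 0xFF:
        raise ValueError("infinities and NANs are not handled in this sketch")
    if e == 0:
        if f == 0:
            return (s, 0, 0)  # signed zero
        # Denormalized value is f x 2^-149; shift the leading 1 of f up to bit 63
        # and solve f x 2^-149 = (f << shift) x 2^(E - 16383 - 63) for E.
        shift = 63 - (f.bit_length() - 1)
        return (s, 16383 + 63 - 149 - shift, f << shift)
    # Normalized: rebias the exponent and make the implicit integer bit explicit.
    return (s, e - 127 + 16383, (1 << 63) | (f << 40))
```

The smallest single-precision denormal, $00000001 (\(2^{-149}\)), becomes a normalized extended value with biased exponent 16234 and only the explicit integer bit set. That is comfortably inside the extended range.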
Detection of an overflow or underflow intermediate result always converts the 16-bit exponent into a 15-bit biased exponent before the result is stored in a floating-point data register. The FPU internally maintains a 67-bit mantissa for rounding purposes. The mantissa is always rounded to 64 bits (or fewer, depending on the selected rounding precision) before it is stored in a floating-point data register.

Figure 9-7. Intermediate Result Format (16-bit exponent; overflow bit; integer bit; 63-bit fraction with its LSB; guard, round, and sticky bits)

If the destination is a floating-point data register, the result is in the extended-precision format and is rounded to the precision specified by the FPCR PREC bits before being stored. All mantissa bits beyond the selected precision are zero. If the single- or double-precision mode is selected, the exponent value is in the correct range even if it is stored in extended-precision format. If the destination is a memory location, the FPCR PREC bits are ignored. In this case, a number in the extended-precision format is taken from the source floating-point data register, rounded to the destination format precision, and then written to memory.

Depending on the selected rounding precision or destination data format, the location of the least significant bit of the mantissa and the locations of the guard, round, and sticky bits in the 67-bit intermediate result mantissa vary. The guard and round bits are always calculated exactly. The sticky bit is used to create the illusion of an infinitely wide intermediate result. As the arrow in Figure 9-7 illustrates, the sticky bit is the logical OR of all the bits in the infinitely precise result to the right of the round bit. During the calculation stage of an arithmetic operation, any nonzero bits generated to the right of the round bit set the sticky bit to one. Because of the sticky bit, the rounded intermediate result for all required IEEE arithmetic operations in the RN mode is in error by no more than one-half unit in the last place.

9.4.2 Rounding the Result

Range control is the process of rounding the mantissa of the intermediate result to the specified precision and checking the 16-bit intermediate exponent to ensure that it is within the representable range of the selected rounding-precision format. Range control ensures correct emulation of a device that only supports single- or double-precision arithmetic. If the intermediate result's exponent exceeds the range of the selected precision, the exponent value appropriate for an underflow or overflow is stored as the result in the 16-bit extended-precision format exponent.
For example, if the rounding precision is single and the rounding mode is RM, and the result of an arithmetic operation overflows the magnitude of the single-precision format in the positive direction, the largest normalized single-precision value is stored as an extended-precision number in the destination floating-point data register (i.e., an unbiased exponent of $007F (+127), which is $407E in the 15-bit biased exponent field, and a mantissa of $FFFFFF0000000000). If an infinity is the appropriate result for an overflow, the infinity value for the destination data format is stored as the result (i.e., an exponent with the maximum value and a mantissa of zero).

Figure 9-8 illustrates the algorithm used to round an intermediate result to the selected rounding precision and destination data format. If the destination is a floating-point data register, the rounding boundary is determined either by the rounding precision selected in the FPCR PREC bits or by the instruction itself. For example, FSADD and FDADD specify single- and double-precision rounding, respectively, regardless of the precision specified in the FPCR PREC bits. If the destination is external memory or an integer data register, the destination data format determines the rounding boundary. If the rounded result of an operation is not exact, the INEX2 bit is set in the FPSR EXC byte.
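To check the exponent and mantissa quoted in the overflow example, the following sketch computes the extended-precision fields of the largest normalized single-precision value. It assumes a Python float (an IEEE double) holds the single-precision value exactly, which is true here, and `to_extended_fields` is a hypothetical helper that handles only finite, nonzero, normalized inputs.

```python
import math
import struct

EXT_BIAS = 16383  # extended-precision exponent bias ($3FFF)

def to_extended_fields(x: float):
    """Hypothetical helper: return (sign, biased 15-bit exponent, 64-bit mantissa
    with an explicit integer bit) for a finite, nonzero, normalized value x."""
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    m, e = math.frexp(abs(x))          # abs(x) == m * 2**e with 0.5 <= m < 1
    mantissa = int(math.ldexp(m, 64))  # scale so the integer bit lands in bit 63
    unbiased = e - 1                   # 1.fff... == 2*m, so the exponent drops by one
    return sign, unbiased + EXT_BIAS, mantissa

# Largest normalized single-precision value: bit pattern $7F7FFFFF.
flt_max = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]
sign, biased, mant = to_extended_fields(flt_max)
print(f"unbiased exp = {biased - EXT_BIAS:#06x}, biased = {biased:#06x}, mantissa = {mant:#018x}")
```

Only the top 24 mantissa bits are set because the single-precision format carries 24 significant bits, including the implicit integer bit.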
Figure 9-8. Rounding Algorithm Flowchart

The flowchart proceeds as follows:
1. On entry, if the guard, round, and sticky bits are all 0, the result is exact; exit.
2. Otherwise, set INEX2 = 1 and select the rounding mode:
   RN: add 1 to the LSB if guard = 1 and LSB = 1 with round and sticky = 0, or if guard = 1 and round or sticky = 1.
   RZ: do not add.
   RM: add 1 to the LSB if the intermediate result is negative.
   RP: add 1 to the LSB if the intermediate result is positive.
3. If adding 1 overflows the mantissa (intermediate result overflow = 1), shift the mantissa right 1 bit and add 1 to the exponent.
4. Chop the guard, round, and sticky bits; exit.

The three additional bits beyond the extended-precision format (the difference between the intermediate result's 67-bit mantissa and the stored result's 64-bit mantissa) allow the FPU to perform all calculations as though it had infinite precision. Before rounding, the result is always correct for the specified destination's data format (unless an overflow or underflow occurs). The specified rounding operation then produces a number, representable in the selected precision, that is as close as the rounding mode allows to the infinitely precise intermediate value.
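The rounding steps of Figure 9-8 can be sketched in software as follows. This is an illustrative model, not the FPU's circuitry: `round_intermediate` is a hypothetical function, the kept mantissa bits are a Python int whose most significant bit is the integer bit, and `precision` (the number of kept bits) is a parameter so that small widths can be tried.

```python
def round_intermediate(sign, exponent, mantissa, guard, rnd, sticky, mode, precision=64):
    """Round a `precision`-bit mantissa (integer bit = MSB) using its guard,
    round, and sticky bits, following the steps of Figure 9-8.

    sign: 0 = positive, 1 = negative; mode: "RN", "RZ", "RM", or "RP".
    Returns (exponent, mantissa, inex2).
    """
    if not (guard or rnd or sticky):
        return exponent, mantissa, False          # exact result: nothing to round

    lsb = mantissa & 1
    if mode == "RN":
        # Nearest; a tie (G=1, R=S=0) increments only when the LSB is 1 (ties to even).
        add_one = bool(guard and (lsb or rnd or sticky))
    elif mode == "RZ":
        add_one = False                           # toward zero: simply chop
    elif mode == "RM":
        add_one = sign == 1                       # toward minus infinity
    elif mode == "RP":
        add_one = sign == 0                       # toward plus infinity
    else:
        raise ValueError(f"unknown rounding mode {mode!r}")

    if add_one:
        mantissa += 1
        if mantissa >> precision:                 # carry out of the integer bit
            mantissa >>= 1                        # shift mantissa right 1 bit
            exponent += 1                         # and add 1 to the exponent
    # Guard, round, and sticky are chopped: only the kept bits are returned.
    return exponent, mantissa, True
```

For example, `round_intermediate(0, 0, 0b1000, 1, 0, 0, "RN", precision=4)` leaves the mantissa at `0b1000` (the tie case with an even LSB) but still reports the result as inexact.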
The following tie-case example illustrates how the 67-bit mantissa allows the FPU to meet the error bound of the IEEE specification:

Result               Integer  63-Bit Fraction  Guard  Round  Sticky
Intermediate         x        xxx…00           1      0      0
Rounded to nearest   x        xxx…00           0      0      0

The least significant bit of the rounded result does not increment even though the guard bit is set in the intermediate result. The IEEE 754 standard specifies that tie cases be handled in this manner. If the destination data format is extended and there is a difference between the infinitely precise intermediate result and the round-to-nearest result, the relative difference is $2^{-64}$ (the value of the guard bit). This error is equal to one-half of the least significant bit's value and is the worst-case error that can be introduced when using the RN mode. Thus, the term one-half unit in the last place correctly identifies the error bound for this operation. This error specification is the relative error present in the result; the absolute error bound is $2^{\text{exponent}} \times 2^{-64}$.

The following example illustrates the error bound for the other rounding modes (here, round toward zero):

Result               Integer  63-Bit Fraction  Guard  Round  Sticky
Intermediate         x        xxx…00           1      1      1
Rounded toward zero  x        xxx…00           0      0      0

The difference between the infinitely precise result and the rounded result is $2^{-64} + 2^{-65} + 2^{-66}$ (counting the sticky bit at its own bit weight), which is less than $2^{-63}$ (the value of the least significant bit). Thus, the error bound for this operation is not more than one unit in the last place. For all arithmetic operations, the FPU meets these error bounds, providing accurate and repeatable results.

9.5 Postprocessing Operation

Most operations end with a postprocessing step. The FPU provides two steps in postprocessing. First, the condition code bits in the FPSR are set or cleared at the end of each arithmetic operation or move operation to a single floating-point data register.
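Written out for an extended-precision destination, with $x$ the infinitely precise result, $e$ its unbiased exponent (so the integer bit has weight $2^{e}$ and the LSB has weight $2^{e-63}$), the two bounds above are:

$$
\left|x - \operatorname{RN}(x)\right| \le \tfrac{1}{2}\cdot 2^{e-63} = 2^{e}\cdot 2^{-64},
\qquad
\left|x - \operatorname{RZ}(x)\right| < 2^{e-63} = 2^{e}\cdot 2^{-63}.
$$

In the tie case, $x = m + 2^{e-64}$ where $m$ is the truncated value with an even LSB, so $\operatorname{RN}(x) = m$ and the error is exactly $2^{e-64}$, meeting the first bound with equality.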
The condition code bits are consistently set based on the result of the operation. Second, the FPU supports 32 conditional tests that allow floating-point conditional instructions to test floating-point conditions in exactly the same way as the integer conditional instructions test the integer condition codes. The combination of consistently set condition code bits and the simple programming of conditional instructions gives the MC68040 a flexible, high-performance method of altering program flow based on floating-point results. When reading the summary for each instruction, assume that the instruction performs postprocessing unless the summary specifically states otherwise. The following paragraphs describe postprocessing in detail.
9.5.1 Underflow, Round, Overflow

During the calculation of an arithmetic result, the FPU arithmetic logic unit (ALU) has more precision and range than the 80-bit extended-precision format. However, the final result of these operations is an extended-precision floating-point value. In some cases, an intermediate result becomes either smaller or larger than can be represented in extended precision. Also, the operation can generate a larger exponent or more bits of precision than can be represented in the chosen rounding precision. For these reasons, every arithmetic instruction ends by rounding the result and checking for overflow and underflow.

At the completion of an arithmetic operation, the intermediate result is checked to see if it is too small to be represented as a normalized number in the selected precision. If so, the UNFL bit is set in the FPSR EXC byte. The MC68040 then takes a nonmaskable underflow exception and executes the M68040FPSP underflow exception handler, which denormalizes the result. Denormalizing a number causes a loss of accuracy, but a zero is not returned unless absolutely necessary. If a number has grossly underflowed, the M68040FPSP returns a zero or the smallest denormalized number with the correct sign, depending on the rounding mode in effect.

If no underflow occurs, the intermediate result is rounded according to the user-selected rounding precision and rounding mode. After rounding, the INEX2 bit of the FPSR EXC byte is set accordingly. Finally, the magnitude of the result is checked to see if it is too large to be represented in the current rounding precision. If so, the OVFL bit of the FPSR EXC byte is set. The M68040FPSP returns a correctly signed infinity or a correctly signed largest normalized number, depending on the rounding mode in effect.
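The choice between infinity and the largest normalized number on overflow, and between zero and the smallest denormalized number on gross underflow, follows the IEEE 754 directed-rounding rules. The sketch below models that choice with two hypothetical functions, using IEEE double precision as a stand-in for the selected rounding precision; it also assumes that "grossly underflowed" means a magnitude below half the smallest denormal, so RN always yields zero.

```python
import math
import sys

# IEEE double precision stands in for the selected rounding precision;
# the same rules apply to single, double, or extended results.
LARGEST_NORMAL = sys.float_info.max
SMALLEST_DENORMAL = math.ldexp(1.0, -1074)

def overflow_result(sign: int, mode: str) -> float:
    """Default result after an overflow: signed infinity or signed largest normal."""
    toward_infinity = {"RN": True, "RZ": False, "RM": sign == 1, "RP": sign == 0}[mode]
    magnitude = math.inf if toward_infinity else LARGEST_NORMAL
    return -magnitude if sign else magnitude

def gross_underflow_result(sign: int, mode: str) -> float:
    """Default result after a gross underflow (magnitude assumed below half the
    smallest denormal): signed zero or signed smallest denormal."""
    away_from_zero = {"RN": False, "RZ": False, "RM": sign == 1, "RP": sign == 0}[mode]
    magnitude = SMALLEST_DENORMAL if away_from_zero else 0.0
    return -magnitude if sign else magnitude
```

Only the directed modes RM and RP can produce a nonzero result for a gross underflow, and only in the direction they round toward.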
9.5.2 Conditional Testing

Unlike the integer arithmetic condition codes, an instruction either always sets the floating-point condition codes in the same way or does not change them at all. Therefore, the instruction descriptions do not include floating-point condition code settings. The following paragraphs describe how floating-point condition codes are set for all instructions that modify condition codes. Refer to 9.2.3.1 Floating-Point Condition Code Byte for a description of the FPCC byte.

The condition code bits differ slightly from the integer condition codes. Unlike the integer condition codes, whose meaning depends on the operation type, the floating-point condition codes are set or cleared solely by examining the result at the end of the operation. The M68000 family integer condition code bits N and Z have this characteristic, but the V and C bits are set differently for different instructions. The data type of the operation's result determines how the four condition code bits are set. Table 9-7 lists the condition code bit settings for each data type. The MC68040 generates only eight of the 16 possible combinations. Loading the FPCC with one of the other combinations and executing a conditional instruction can produce an unexpected branch condition.
Table 9-7. Floating-Point Condition Code Encodings

Data Type                      N  Z  I  NAN
+ Normalized or Denormalized   0  0  0  0
- Normalized or Denormalized   1  0  0  0
+ 0                            0  1  0  0
- 0                            1  1  0  0
+ Infinity                     0  0  1  0
- Infinity                     1  0  1  0
+ NAN                          0  0  0  1
- NAN                          1  0  0  1

The inclusion of the NAN data type in the IEEE floating-point number system requires each conditional test to include the NAN condition code bit in its Boolean equation. Because a comparison of a NAN with any other data type is unordered (i.e., it is impossible to determine whether a NAN is larger or smaller than an in-range number), the compare instruction sets the NAN condition code bit when an unordered compare is attempted. All arithmetic instructions also set the FPCC NAN bit if the result of an operation is a NAN. The conditional instructions interpret the NAN condition code bit equal to one as the unordered condition.

The IEEE 754 standard defines four conditions: equal to (EQ), greater than (GT), less than (LT), and unordered (UN). In addition, the standard only requires the generation of the condition codes as a result of a floating-point compare operation. The FPU tests for these conditions and 28 others at the end of any operation affecting the condition codes. For purposes of the floating-point conditional branch, set byte on condition, decrement and branch on condition, and trap on condition instructions, the MC68040 logically combines the four FPCC bits to form 32 conditional tests. The 32 conditional tests are separated into two groups: 16 that cause an exception if an unordered condition is present when the conditional test is attempted (IEEE nonaware tests), and 16 that do not cause an exception (IEEE aware tests). The set of IEEE nonaware tests is best used when porting a program from a system that does not support the IEEE 754 standard to a conforming system, or when generating high-level language code that does not support IEEE floating-point concepts (i.e., the unordered condition).
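Table 9-7 can be expressed as a small function that derives the N, Z, I, and NAN bits from a result value. This sketch uses a Python float (an IEEE double) as a stand-in for the extended-precision result; `fpcc_bits` is a hypothetical name, and `math.copysign` is used because it reads the sign bit even of zeros and NANs, which the N bit follows.

```python
import math

def fpcc_bits(x: float):
    """Return the (N, Z, I, NAN) condition code bits for result x, per Table 9-7."""
    n = 1 if math.copysign(1.0, x) < 0 else 0   # N mirrors the sign bit, even for 0 and NAN
    if math.isnan(x):
        return (n, 0, 0, 1)
    if math.isinf(x):
        return (n, 0, 1, 0)
    if x == 0.0:
        return (n, 1, 0, 0)
    return (n, 0, 0, 0)                          # normalized or denormalized
```

Feeding it one value of each data type and sign yields exactly eight distinct tuples, matching the eight combinations the MC68040 generates.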
An unordered condition occurs when one or both of the operands in a floating-point compare operation is a NAN. The inclusion of the unordered condition in floating-point branches destroys the familiar trichotomy relationship (greater than, equal, less than) that exists for integers. For example, the opposite of floating-point branch greater than (FBGT) is not floating-point branch less than or equal (FBLE). Rather, the opposite condition is floating-point branch not greater than (FBNGT). If the result of the previous instruction was unordered, FBNGT is true, whereas both FBGT and FBLE are false, since unordered fails both of these tests (and sets BSUN). Compiler writers should be particularly careful of the lack of trichotomy in floating-point branches, since it is common for compilers to invert the sense of conditions.

When using the IEEE nonaware tests, the BSUN bit is set whenever a branch is attempted while the NAN condition code bit is set, unless the branch is an FBEQ or an FBNE. If the BSUN exception is also enabled in the FPCR, a BSUN exception is taken. Therefore, the IEEE nonaware program is interrupted if an unexpected condition occurs. Compilers and programmers who are knowledgeable of the IEEE 754 standard should use the IEEE aware tests in programs that contain ordered and unordered conditions. Since the ordered or unordered attribute is explicitly included in the conditional test, the BSUN bit is not set in the FPSR EXC byte when the unordered condition occurs. Table 9-8 summarizes the conditional mnemonics, definitions, equations, predicates, and whether the BSUN bit is set in the FPSR EXC byte for the 32 floating-point conditional tests. The equation column lists the combination of FPCC bits for each test in the form of an equation.
Table 9-8. Floating-Point Conditional Tests

Mnemonic  Definition                       Equation            Predicate  BSUN Bit Set

IEEE Nonaware Tests
EQ        Equal                            Z                   000001     No
NE        Not Equal                        ¬Z                  001110     No
GT        Greater Than                     ¬(NAN ∨ Z ∨ N)      010010     Yes
NGT       Not Greater Than                 NAN ∨ Z ∨ N         011101     Yes
GE        Greater Than or Equal            Z ∨ ¬(NAN ∨ N)      010011     Yes
NGE       Not Greater Than or Equal        NAN ∨ (N ∧ ¬Z)      011100     Yes
LT        Less Than                        N ∧ ¬(NAN ∨ Z)      010100     Yes
NLT       Not Less Than                    NAN ∨ (Z ∨ ¬N)      011011     Yes
LE        Less Than or Equal               Z ∨ (N ∧ ¬NAN)      010101     Yes
NLE       Not Less Than or Equal           NAN ∨ ¬(N ∨ Z)      011010     Yes
GL        Greater or Less Than             ¬(NAN ∨ Z)          010110     Yes
NGL       Not Greater or Less Than         NAN ∨ Z             011001     Yes
GLE       Greater, Less, or Equal          ¬NAN                010111     Yes
NGLE      Not Greater, Less, or Equal      NAN                 011000     Yes

IEEE Aware Tests
EQ        Equal                            Z                   000001     No
NE        Not Equal                        ¬Z                  001110     No
OGT       Ordered Greater Than             ¬(NAN ∨ Z ∨ N)      000010     No
ULE       Unordered or Less or Equal       NAN ∨ Z ∨ N         001101     No
OGE       Ordered Greater Than or Equal    Z ∨ ¬(NAN ∨ N)      000011     No
ULT       Unordered or Less Than           NAN ∨ (N ∧ ¬Z)      001100     No
OLT       Ordered Less Than                N ∧ ¬(NAN ∨ Z)      000100     No
UGE       Unordered or Greater or Equal    NAN ∨ Z ∨ ¬N        001011     No
OLE       Ordered Less Than or Equal       Z ∨ (N ∧ ¬NAN)      000101     No
UGT       Unordered or Greater Than        NAN ∨ ¬(N ∨ Z)      001010     No
OGL       Ordered Greater or Less Than     ¬(NAN ∨ Z)          000110     No
UEQ       Unordered or Equal               NAN ∨ Z             001001     No
OR        Ordered                          ¬NAN                000111     No
UN        Unordered                        NAN                 001000     No

Miscellaneous Tests
F         False                            False               000000     No
T         True                             True                001111     No
SF        Signaling False                  False               010000     Yes
ST        Signaling True                   True                011111     Yes
SEQ       Signaling Equal                  Z                   010001     Yes
SNE       Signaling Not Equal              ¬Z                  011110     Yes

Note: A condition code preceded by ¬ indicates a cleared bit; all other bits are set.
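The predicate column of Table 9-8 follows a regular pattern: in the low four bits, bit 3 accepts unordered, bit 2 less than, bit 1 greater than, and bit 0 equal, while bit 4 marks the tests that set BSUN on an unordered condition. The sketch below evaluates a predicate using that pattern. The decoding is an observation drawn from the table, not a documented hardware equation, and `evaluate` is a hypothetical helper that is meaningful only for the FPCC encodings listed in Table 9-7.

```python
def evaluate(predicate: int, n: int, z: int, nan: int):
    """Evaluate a 6-bit conditional predicate against the FPCC bits N, Z, NAN.

    Returns (condition_true, sets_bsun). Only the FPCC encodings the MC68040
    generates (Table 9-7) are meaningful inputs.
    """
    unordered = bool(nan)
    equal = bool(z) and not unordered
    less = bool(n) and not z and not unordered
    greater = not n and not z and not unordered
    result = ((predicate & 0b1000 and unordered) or
              (predicate & 0b0100 and less) or
              (predicate & 0b0010 and greater) or
              (predicate & 0b0001 and equal))
    sets_bsun = bool(predicate & 0b10000) and unordered
    return bool(result), sets_bsun

# Unordered FPCC (N=0, Z=0, NAN=1): GT and LE are both false, NGT is true,
# and only the nonaware tests set BSUN.
for name, p in [("GT", 0b010010), ("LE", 0b010101), ("NGT", 0b011101), ("OGT", 0b000010)]:
    print(name, evaluate(p, 0, 0, 1))
```

The loop illustrates the lack of trichotomy discussed above: inverting FBGT into FBLE gives the wrong answer for an unordered result, while FBNGT gives the right one.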
9.6 Floating-Point Exceptions

There are two classes of floating-point-related exceptions: nonarithmetic floating-point exceptions and arithmetic floating-point exceptions. The latter relates to the handling of arithmetic exceptions caused by floating-point activity, and the former includes unimplemented floating-point instructions and unsupported data types not related to the handling of arithmetic exceptions. Format error and FTRAPcc exceptions may seem to be floating-point related but are considered IU exceptions (see Section 8 Exception Processing). The following sections detail floating-point exceptions and how the MC68040 and M68040FPSP handle them. Table 9-9 lists the vector numbers related to floating-point exceptions.

Table 9-9. Floating-Point Exception Vectors

Vector Number  Vector Offset (Hex)  Assignment
11             02C                  Floating-point unimplemented instruction (also used for F-line instruction)
48             0C0                  Floating-point branch or set on unordered condition
49             0C4                  Floating-point inexact result
50             0C8                  Floating-point divide by zero
51             0CC                  Floating-point underflow
52             0D0                  Floating-point operand error
53             0D4                  Floating-point overflow
54             0D8                  Floating-point SNAN
55             0DC                  Floating-point unimplemented data type

The following paragraphs detail nonarithmetic floating-point exceptions.

9.6.1 Unimplemented Floating-Point Instructions

Instruction words whose bits 15 through 12 have an $F encoding are F-line instructions, which can cause F-line exceptions. The MC68040 recognizes some F-line instructions, such as FMUL and CPUSH, and executes them without an F-line exception. Other F-line instructions are recognized as valid MC68881/MC68882 floating-point instruction patterns that the processor cannot complete in hardware; these are termed unimplemented floating-point instructions and cause an unimplemented floating-point exception.
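Each vector offset in Table 9-9 is four times the vector number, because every entry in the exception vector table is a 4-byte handler address. The short sketch below reproduces the table's offset column from the vector numbers; `FP_VECTORS` and `vector_offset` are hypothetical names used only for illustration.

```python
# Floating-point exception vectors from Table 9-9 (vector number -> assignment).
FP_VECTORS = {
    11: "unimplemented instruction / F-line",
    48: "BSUN", 49: "inexact result", 50: "divide by zero", 51: "underflow",
    52: "operand error", 53: "overflow", 54: "SNAN", 55: "unimplemented data type",
}

def vector_offset(vector: int) -> int:
    """Byte offset of a vector: each table entry holds one 4-byte address."""
    return vector * 4

for v, name in FP_VECTORS.items():
    print(f"vector {v:2d}  offset ${vector_offset(v):03X}  {name}")
```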
Table 9-10 lists the floating-point instructions that are unimplemented and therefore cause an unimplemented instruction exception. If the processor encounters an F-line instruction whose pattern matches neither of the above two cases, the processor takes an F-line illegal exception. F-line illegal exceptions are discussed further in Section 8 Exception Processing. For an F-line illegal exception, the processor generates an exception with vector number 11 and pushes a four-word format $0 stack frame on the system stack. An illegal instruction exception is also reported when a breakpoint acknowledge bus cycle is run and terminated with either a transfer acknowledge (TA) or transfer error acknowledge (TEA) signal. Since the unimplemented floating-point exception and the F-line illegal instruction exception share the same vector, the exception handler uses the stack frame format ($0 or $2) to distinguish between the two.

Table 9-10. Unimplemented Instructions

Monadic operations: FACOS, FASIN, FATAN, FATANH, FCOS, FCOSH, FETOX, FETOXM1, FGETEXP, FGETMAN, FINT, FINTRZ, FLOG10, FLOGN, FLOGNP1, FMOVECR, FSIN, FSINCOS, FSINH, FTAN, FTANH, FTENTOX, FTWOTOX
Dyadic operations: FMOD, FREM, FSCALE

When an unimplemented floating-point instruction is encountered, the processor waits for all previous floating-point instructions to complete execution. Pending exceptions are taken and handled before the unimplemented instruction executes. Next, the instruction is partially decoded to allow fetching of the memory source operand, if required. When the operand fetch begins, all other read accesses for previous instructions are complete, and only the execution and write-back of results for previous integer instructions remain to be completed. If an access error (bus error) occurs in fetching the operand or in completing any other access before beginning the operand fetch, the unimplemented instruction is restarted after the processor returns from exception handling for the error. Refer to Section 8 Exception Processing for more information on access errors.

The fetched source operand is passed to the FPU, which converts the operand to extended precision and saves the intermediate result. If the operand is an unsupported data type (denormalized, unnormalized, or packed decimal real), the unimplemented floating-point exception takes precedence, and the floating-point instruction emulation routine must detect the unsupported data type. The processor begins exception processing for the unimplemented floating-point instruction by making an internal copy of the current SR. The processor then enters the supervisor mode and clears the trace bits (T1, T0).
The processor creates a format $2 stack frame and saves the vector offset, PC, internal copy of the SR, and calculated effective address in the stack frame. The saved PC value is the logical address of the instruction that follows the unimplemented floating-point instruction. The processor generates exception vector number 11 for the unimplemented F-line instruction exception vector, fetches the address of the F-line exception handler from the processor's exception vector table, pushes the format $2 stack frame on the system stack, and begins execution of the exception handler after prefetching instructions to fill the pipeline. The exception handler emulates the unimplemented floating-point instruction in software, maintaining user-object-code compatibility. Refer to Section 8 Exception Processing for details about exception vectors and format $2 stack frames. The F-line exception handler checks for the format $2 stack frame to distinguish an unimplemented floating-point instruction from other F-line unimplemented instructions.

When the exception handler for unimplemented floating-point instructions executes an FSAVE, a 26-word unimplemented instruction state frame is created (see Figure 9-10). At this point, an FSAVE instruction yields the information listed in Table 9-16. Note that unless the instruction specifies a packed decimal real source, the state frame contains both operands (if required). For the packed decimal real data format, the second operand is in the designated format of the destination floating-point data register. The exception handler uses the information provided in the state frame to determine the instruction that it needs to emulate and the input operands to that instruction. Once the instruction has been emulated and the result computed, the exception handler moves the result into the appropriate destination floating-point data register, discards the unimplemented instruction state frame, and returns to normal instruction flow using the RTE instruction.
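The format check that separates the two vector 11 exceptions can be modeled on the format/vector-offset word of an M68000-family exception stack frame, whose bits 15 through 12 hold the frame format and bits 11 through 0 hold the vector offset. The sketch below is a hypothetical illustration of that decision in Python rather than handler code; it does not model where the word sits in the frame.

```python
def decode_format_offset(word: int):
    """Split an exception stack frame's format/vector-offset word."""
    frame_format = (word >> 12) & 0xF     # bits 15-12: stack frame format
    vector_offset = word & 0x0FFF         # bits 11-0: vector offset (vector number * 4)
    return frame_format, vector_offset

def classify_fline(word: int) -> str:
    """Tell the two vector 11 exceptions apart by frame format."""
    frame_format, vector_offset = decode_format_offset(word)
    if vector_offset != 11 * 4:
        return "not a vector 11 exception"
    if frame_format == 0x2:
        return "unimplemented floating-point instruction"   # emulate in software
    if frame_format == 0x0:
        return "F-line illegal instruction"
    return "unexpected frame format"
```

For example, a format/vector-offset word of $202C identifies an unimplemented floating-point instruction, while $002C identifies an F-line illegal instruction.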
The limitation of this approach is that no floating-point arithmetic exceptions can be reported at the end of the emulated instruction. The M68040FPSP not only emulates the instruction but also ensures that, if any floating-point arithmetic exceptional condition arises from the emulation of the unimplemented instruction and the corresponding floating-point arithmetic exception is enabled, it manipulates the stack and restores the FPU to the desired exceptional state. This effectively imitates the action of the instructions the MC68040 implements, since the exception is not reported until the next floating-point instruction is encountered. This manipulation of the stack is rather complicated and is beyond the scope of this manual. Motorola recommends using the M68040FPSP if a full exception-reporting model is required. Motorola does not provide any printed documentation other than what is embedded in the source code of the M68040FPSP.

9.6.2 Unsupported Floating-Point Data Types

An unsupported data type exception occurs when either operand to an implemented floating-point instruction is denormalized (for single-, double-, and extended-precision operands) or unnormalized (for extended-precision operands), or when either the source or destination data format is packed decimal real. These data types are unimplemented in the MC68040 and must be emulated in software.
Note: In this manual, all references to the unsupported floating-point data types also refer to the unimplemented data types in the M68040FPSP.

When the processor encounters an unsupported data type, the procedure taken is identical to that used when an unimplemented instruction is encountered. Unsupported data types in the operands of opclass 000 or 010 (register-to-register or memory-to-register) instructions cause a pre-instruction exception. When an unsupported data type is detected for an opclass 011 (register-to-memory) instruction, a post-instruction exception is generated immediately. A format $0 (for the pre-instruction exception) or format $3 (for the post-instruction exception) stack frame is saved, and vector number 55 is fetched. A denormalized value generated as the result of a floating-point operation generates a nonmaskable underflow exception instead of an unsupported data type exception.

Table 9-16 lists the floating-point state frame fields, defined for use by the supervisor exception handler, for unsupported data type exceptions resulting from the execution of opclass 000 or 010 (register-to-register or memory-to-register) instructions and opclass 011 (register-to-memory) instructions. A denormalized or unnormalized extended-precision source or destination operand is copied without modification to the ETEMP or FPTEMP field in the floating-point state frame. If a packed decimal real source operand is specified, the upper 32 bits of the operand are copied to the FPTEMP field, and the lower 64 bits are copied to ETEMP. The destination operand in this case remains in the destination floating-point register and can be either denormalized or unnormalized. Figure 9-9 illustrates denormalized single-precision (a) and double-precision (b) operands stored in the ETEMP field.
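The choice between a pre-instruction and a post-instruction exception depends only on the instruction's opclass. The sketch below maps an opclass to the exception type, stack frame format, and vector described above. It assumes, from the M68000 Family Programmer's Reference Manual, that the opclass occupies bits 15 through 13 of the command word (the second word) of a general floating-point instruction; `unsupported_data_type_exception` is a hypothetical helper.

```python
def unsupported_data_type_exception(command_word: int):
    """Return (exception type, stack frame format, vector) for an unsupported
    data type, based on the opclass field of the FPU command word (assumed
    to be bits 15-13)."""
    opclass = (command_word >> 13) & 0b111
    if opclass in (0b000, 0b010):            # register-to-register, memory-to-register
        return ("pre-instruction", 0x0, 55)
    if opclass == 0b011:                     # register-to-memory
        return ("post-instruction", 0x3, 55)
    raise ValueError(f"opclass {opclass:03b} is not covered by this section")
```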
The exception handler uses the floating-point state frame information to determine which operand (or operands) is the unsupported data type and which instruction attempted to use the offending operand. The exception handler must provide the routines needed to complete the instruction and to store its result to the proper destination, whether that is a floating-point data register, an integer data register, or external memory. Once the destination is written, the floating-point state frame is discarded, and normal execution is resumed by using the RTE instruction. This approach does not report floating-point arithmetic exceptions that may have been generated. Motorola recommends using the M68040FPSP if a full exception-reporting model is required. Motorola does not provide any printed documentation other than what is embedded in the source code of the M68040FPSP.
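The ETEMP layouts in Figure 9-9 can be sketched as bit manipulation on a 96-bit value. This assumes the bit positions as read from the figure: sign in bit 95, exponent field (bits 94 through 80) and bits 79 through 64 zero, integer bit 63 zero, and the fraction left-justified starting at bit 62. `denorm_to_etemp` is a hypothetical helper, not part of the M68040FPSP.

```python
def denorm_to_etemp(bits: int, precision: str) -> int:
    """Place a denormalized single (32-bit) or double (64-bit) operand into a
    96-bit ETEMP image using the bit positions read from Figure 9-9."""
    if precision == "single":
        sign, exp, frac = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
        frac_shift = 40      # 23-bit fraction lands in ETEMP bits 62-40
    elif precision == "double":
        sign, exp, frac = bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)
        frac_shift = 11      # 52-bit fraction lands in ETEMP bits 62-11
    else:
        raise ValueError("precision must be 'single' or 'double'")
    if exp != 0 or frac == 0:
        raise ValueError("operand is not denormalized")
    # Bit 95 = sign; exponent (94-80), bits 79-64, and integer bit 63 stay zero.
    return (sign << 95) | (frac << frac_shift)
```

Because the fraction is copied rather than normalized, the leading fraction bit can land anywhere from bit 62 down to the lowest fraction position, which is what the exception handler must account for.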
Figure 9-9. Format of Denormalized Operand in State Frame
(a) Single precision: the 32-bit operand (sign in bit 31, exponent in bits 30-23, fraction in bits 22-0) is stored in the 96-bit state frame format with the sign in bit 95, exponent field (bits 94-80) = $0, bits 79-64 = $0, integer bit (bit 63) = 0, fraction in bits 62-40, and bits 39-0 = $0.
(b) Double precision: the 64-bit operand (sign in bit 63, exponent in bits 62-52, fraction in bits 51-0) is stored with the sign in bit 95, exponent field (bits 94-80) = $0, bits 79-64 = $0, integer bit (bit 63) = 0, fraction in bits 62-11, and bits 10-0 = $0.

9.7 Floating-Point Arithmetic Exceptions

The following eight user floating-point arithmetic exceptions are listed in order of priority. The MC68040 generates the first seven exceptions in hardware and the eighth only in software.

Branch/set on unordered (BSUN)
Signaling not-a-number (SNAN)
Operand error (OPERR)
Overflow (OVFL)
Underflow (UNFL)
Divide by zero (DZ)
Inexact 2 (INEX2)
Inexact 1 (INEX1)

The INEX1 exception is the condition that exists when a packed decimal operand cannot be converted exactly to the extended-precision format in the current rounding mode. Since the MC68040 does not directly support packed decimal real operands, the processor never sets the INEX1 bit in the FPSR EXC byte, but provides it as a latch so that emulation software can report the exception.

A floating-point arithmetic exception is taken in one of two situations. The first situation occurs when the user program enables an arithmetic exception by setting a bit in the FPCR enable byte and the corresponding bit in the FPSR EXC byte becomes set as a result of program execution; these are referred to as maskable exception conditions. A user write operation to the FPSR that sets a bit in the EXC byte does not cause an exception to be taken, regardless of the value in the enable byte. When a user writes to the enable byte to enable a class of floating-point exceptions, a previously generated floating-point exception does not cause an exception to be taken, regardless of the value in the FPSR EXC byte. The user can clear a bit in the FPCR enable byte to disable the corresponding exception.

The second situation occurs when the processor encounters a nonmaskable SNAN, OPERR, OVFL, or UNFL condition; these are referred to as nonmaskable exception conditions. This allows a supervisor exception handler to correct a default result generated by the MC68040 that differs from the result generated by an MC68881/MC68882 executing the same code. After correcting the result, the supervisor exception handler calls a user-defined exception handler if the exception has been enabled in the FPCR enable byte, or returns to the main program flow if the exception is disabled.

A single instruction can generate dual and triple exceptions. When multiple exceptions occur with exceptions enabled for more than one exception class, the highest priority exception is reported; the lower priority exceptions are never reported or taken.
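For the maskable case, selecting which exception is reported amounts to finding the highest-priority bit that is set in both the FPSR EXC byte and the FPCR enable byte, which share bit positions 15 (BSUN, highest priority) down to 8 (INEX1, lowest). The sketch below models that selection with the vector numbers from Table 9-9; `reported_exception` is a hypothetical helper, and the nonmaskable path through the M68040FPSP is not modeled.

```python
# (name, bit position in FPSR EXC / FPCR enable byte, vector number from Table 9-9),
# listed from highest to lowest priority.
EXCEPTIONS = [
    ("BSUN", 15, 48), ("SNAN", 14, 54), ("OPERR", 13, 52), ("OVFL", 12, 53),
    ("UNFL", 11, 51), ("DZ", 10, 50), ("INEX2", 9, 49), ("INEX1", 8, 49),
]

def reported_exception(exc: int, enable: int):
    """Return (name, vector) of the highest-priority exception that is both
    set and enabled, or None. Lower-priority exceptions are not reported."""
    pending = exc & enable & 0xFF00
    for name, bit, vector in EXCEPTIONS:
        if pending & (1 << bit):
            return name, vector
    return None
```

For example, an overflow that is also inexact sets both OVFL and INEX2; with both enabled, only OVFL (vector 53) is reported, which is why the exception handler itself must check for multiple exceptions.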
The preceding list of arithmetic floating-point exceptions is in order of priority. The bits of the enable byte are organized in decreasing priority, with bit 15 the highest and bit 8 the lowest. The exception handler must check for multiple exceptions. The address of the exception handler is derived from the vector number corresponding to the exception. The following combinations of multiple exceptions can occur:

    SNAN and INEX1
    OPERR and INEX2
    OPERR and INEX1
    OVFL and INEX2 and/or INEX1
    UNFL and INEX2 and/or INEX1

9.7.1 Branch/Set on Unordered (BSUN)

The BSUN exception is the result of performing an IEEE nonaware conditional test associated with the FBcc, FDBcc, FTRAPcc, and FScc instructions when an unordered condition is present. Refer to 9.5.2 Conditional Testing for information on conditional tests.
If a floating-point exception is pending from a previous floating-point instruction, a pre-instruction exception is taken. After the appropriate exception handler is executed, the conditional instruction is restarted. When the FPU pipeline is idle (all previous floating-point instructions have completed) and no exceptions are pending, the processor evaluates the conditional predicate and checks for a BSUN exception before executing the conditional instruction.

9.7.1.1 Maskable Exception Conditions. A BSUN exception occurs if the conditional predicate is one of the IEEE nonaware branches and the FPCC NAN bit is set. When the processor detects this condition, it sets the BSUN bit in the FPSR EXC byte.

a. If the user BSUN exception handler is disabled, the floating-point condition is evaluated as if it were the equivalent IEEE aware conditional predicate. No exceptions are taken.

b. If the user BSUN exception handler is enabled, the processor takes a floating-point pre-instruction exception. A format $0 stack frame is saved, and vector number 48 is generated to access the BSUN exception vector. The BSUN entry in the processor's vector table points to the M68040FPSP BSUN exception handler. For MC68881/MC68882 compatibility, the M68040FPSP updates the FPIAR by copying the PC value in the pre-instruction stack frame to the FPIAR. The M68040FPSP BSUN exception handler restores the FPU to its exceptional state, cleans up the stack to the state prior to its own execution, and continues instruction execution at the user BSUN exception handler. No parameters are passed to the user BSUN exception handler, since the M68040FPSP BSUN exception handler provides the illusion that it never existed. The user BSUN exception handler must execute an FSAVE as its first floating-point instruction.
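A minimal sketch of the BSUN decision, in Python with hypothetical names (this is not M68040FPSP code). It locates the predicate the way this subsection describes for handler recovery (bit 7 of the F-line operation word) and assumes no earlier floating-point exception is pending. It also assumes, as in the M68000 family encoding, that the IEEE nonaware predicates that can raise BSUN are those with bit 4 set ($10-$1F); confirm this against the tables in 9.5.2 Conditional Testing before relying on it.

```python
from typing import Optional

BSUN_VECTOR = 48  # vector number given in 9.7.1.1


def conditional_predicate(opword: int, extword: Optional[int] = None) -> int:
    """Return the 6-bit conditional predicate of an FPU conditional instruction.

    Bit 7 set in the F-line operation word: predicate is its low six bits.
    Otherwise: predicate is the low six bits of the word that follows it.
    """
    if opword & 0x0080:
        return opword & 0x3F
    if extword is None:
        raise ValueError("predicate is in the word following the operation word")
    return extword & 0x3F


def is_ieee_nonaware(predicate: int) -> bool:
    # ASSUMPTION: the IEEE nonaware predicates that can raise BSUN are the
    # ones with bit 4 set ($10-$1F), as in the M68000 family encoding.
    return bool(predicate & 0x10)


def bsun_outcome(predicate: int, fpcc_nan: bool, bsun_enabled: bool) -> str:
    """Describe what happens before the conditional instruction executes."""
    if not (fpcc_nan and is_ieee_nonaware(predicate)):
        return "evaluate predicate"
    # From here on, the BSUN bit in the FPSR EXC byte is set.
    if bsun_enabled:
        return f"pre-instruction exception, vector {BSUN_VECTOR}"
    return "evaluate equivalent IEEE-aware predicate"


# Assuming coprocessor ID 1, FBGT.W has operation word $F292 (predicate $12).
print(bsun_outcome(conditional_predicate(0xF292), True, True))
```

With BSUN enabled and NAN set, the FBGT example takes the vector 48 path; an IEEE aware predicate such as OGT ($02) never does.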
Executing FSAVE allows other floating-point instructions to execute without reporting the BSUN exception again, although none of the state frame values are useful in the execution of the user BSUN exception handler.

The BSUN exception is unique in that the exception is taken before the conditional predicate is evaluated. If the user BSUN exception handler does not set the PC to the instruction following the one that caused the BSUN exception when returning, the exception is taken again. Therefore, it is the responsibility of the user BSUN exception handler to prevent the conditional instruction from taking the BSUN exception again. There are four ways to prevent taking the exception again:

1. Incrementing the stored PC in the stack bypasses the conditional instruction. This technique applies to situations where a fall-through is desired. Note that accurately calculating the PC increment requires detailed knowledge of the size of the conditional instruction being bypassed.

2. Clearing the NAN bit prevents the exception from being taken again. However, this alone cannot deterministically control the result indication (true or false) that would be returned when the conditional instruction reexecutes.

3. Clearing the BSUN bit in the FPCR enable byte also prevents the exception from being taken again. Like the second method, this method cannot control the result indication (true or false) that would be returned when the conditional instruction reexecutes.
4. Examining the conditional predicate and setting the FPCC NAN bit accordingly prevents the exception from being taken again. This technique gives the most control, since it is possible to predetermine the direction of program flow. Bit 7 of the F-line operation word indicates where the conditional predicate is located. If bit 7 is set, the conditional predicate is the lower six bits of the F-line operation word. Otherwise, the conditional predicate is the lower six bits of the instruction word that immediately follows the F-line operation word. Using the conditional predicate and the table for IEEE nonaware tests in 9.5.2 Conditional Testing, the condition codes can be set to return a known result indication when the conditional instruction is reexecuted.

Prior to exiting, the user BSUN exception handler discards the floating-point state frame.

9.7.1.2 Nonmaskable Exception Conditions. There are no nonmaskable BSUN conditions.

9.7.2 Signaling Not-a-Number (SNAN)

An SNAN is used as an escape mechanism for a user-defined, non-IEEE data type. The processor never creates an SNAN as a result of an operation; a NAN created by an operand error exception is always a nonsignaling NAN. When an SNAN is an operand of an arithmetic instruction, the SNAN bit is set in the FPSR EXC byte. Since the FMOVEM, FMOVE FPCR, and FSAVE instructions do not modify the status bits, they cannot generate exceptions; therefore, these instructions are useful for manipulating SNANs.

9.7.2.1 Maskable Exception Conditions. When an SNAN is encountered, if the destination is a floating-point data register or is in memory (or an integer data register) and the format is single, double, or extended precision, the SNAN is maskable and may or may not take an exception.

a. If the user SNAN exception is disabled, the processor sets the SNAN bit in the NAN data format, and the resulting nonsignaling NAN is transferred to the destination.
No bits other than the SNAN bit of the NAN data format are modified, although the input NAN is truncated if necessary. Instruction execution continues without taking any exceptions.

b. If the user SNAN exception handler is enabled, the processor posts an exception, and when another floating-point instruction is eventually encountered, a pre-instruction exception is reported. The SNAN entry in the processor's vector table points to the M68040FPSP SNAN exception handler. Once the M68040FPSP SNAN exception handler recognizes the SNAN as a maskable condition, it does not modify the destination or pass control to the user SNAN exception handler.

9.7.2.2 Nonmaskable Exception Conditions. When an SNAN is encountered, if the destination is either in memory or an integer data register and the format is byte, word, or long word, a nonmaskable post-instruction exception occurs and is taken immediately. The SNAN entry in the processor's vector table points to the M68040FPSP SNAN exception handler.
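The maskable/nonmaskable split above depends only on the destination format, and the default action is a single bit operation. The Python sketch below uses hypothetical helper names and models mantissas as 64-bit extended-precision integers with the explicit integer bit at bit 63. It assumes the SNAN bit is the most significant fraction bit (bit 62), clear for signaling and set for nonsignaling. It also shows the FMOVE-to-integer value that, according to the rest of this subsection, the M68040FPSP SNAN handler writes.

```python
# Hypothetical helpers (not part of the M68040FPSP). Mantissas are 64-bit
# extended-precision integers with the explicit integer bit at bit 63.
# ASSUMPTION: the SNAN bit is the most significant fraction bit (bit 62);
# clear means signaling, set means nonsignaling.
SNAN_BIT = 1 << 62

FLOAT_FORMATS = {"single", "double", "extended"}
INTEGER_WIDTHS = {"byte": 8, "word": 16, "long": 32}


def snan_is_maskable(dest_format: str) -> bool:
    """Maskable for single/double/extended destinations; nonmaskable
    (post-instruction exception) for byte/word/long destinations."""
    if dest_format in FLOAT_FORMATS:
        return True
    if dest_format in INTEGER_WIDTHS:
        return False
    raise ValueError(f"unknown destination format {dest_format!r}")


def quiet(mantissa: int) -> int:
    """Set the SNAN bit, leaving every other mantissa bit unchanged."""
    return mantissa | SNAN_BIT


def snan_integer_result(mantissa: int, dest_format: str) -> int:
    """Most significant 8, 16, or 32 mantissa bits, with the SNAN bit set,
    as written for FMOVE to byte, word, or long word."""
    width = INTEGER_WIDTHS[dest_format]
    return quiet(mantissa) >> (64 - width)
```

For example, an SNAN mantissa of $8000000000000001 becomes $C000000000000001 when quieted, and an FMOVE to byte would receive $C0.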
The M68040FPSP SNAN exception handler checks whether the instruction is an FMOVE to byte, word, or long word. If so, it stores the most significant 8, 16, or 32 bits, respectively, of the SNAN mantissa, with the SNAN bit set, to the destination. Next, it determines whether or not the user SNAN exception is enabled.

a. If the user SNAN exception is disabled, the M68040FPSP SNAN exception handler checks for an INEX1 or INEX2 exception condition and determines whether it needs to go to the user INEX exception handler. If not, the M68040FPSP returns to normal instruction execution. Otherwise, the M68040FPSP SNAN exception handler restores the FPU to its exceptional state, cleans up the stack to the conditions prior to its execution, and continues instruction execution at the user INEX exception handler. No parameters are passed to the user INEX exception handler, since the M68040FPSP SNAN exception handler provides the illusion that it never existed.

b. If the user SNAN exception handler is enabled, the M68040FPSP SNAN exception handler checks whether the destination is a floating-point data register or in memory (or an integer data register) with single-, double-, or extended-precision format. If so, the M68040FPSP SNAN exception handler determines which input operand is the SNAN, sets the SNAN bit in the NAN data format, and transfers the resulting nonsignaling NAN to the destination. Once the destination has been written, the M68040FPSP SNAN exception handler restores the FPU to its exceptional state, cleans up the stack to the conditions prior to its execution, and continues instruction execution at the user SNAN exception handler. No parameters are passed to the user SNAN exception handler, since the M68040FPSP SNAN exception handler provides the illusion that it never existed.
The user SNAN exception handler must execute an FSAVE as its first floating-point instruction. Table 9-16 lists the floating-point state frame fields, defined for use by the supervisor exception handler, for SNAN pre-instruction exceptions resulting from the execution of opclass 000 or 010 (register-to-register or memory-to-register) instructions, and for SNAN post-instruction exceptions resulting from the execution of opclass 011 (register-to-memory) instructions. A source or destination SNAN is stored in ETEMP or FPTEMP, respectively, with its SNAN bit set. The user SNAN exception handler can overwrite the result to the specified destination. The exception handler must be aware that an INEX1 exceptional condition can coexist with an SNAN exception. Since the SNAN exception has higher priority, the INEX1 exception is hidden, and it becomes the responsibility of the SNAN exception handler to detect and correct this if desired. To return to normal execution, the state frame is discarded prior to execution of the RTE of the user-defined exception handler.

9.7.3 Operand Error

The operand error exception encompasses problems arising in a variety of operations, including errors not frequent or important enough to merit a specific exceptional condition. Basically, an operand error occurs when an operation has no mathematical interpretation for the given operands. Table 9-11 lists the possible operand errors, both native and not native to the MC68040, which the M68040FPSP unimplemented instruction
exception handler can report. When an operand error occurs, the OPERR bit is set in the FPSR EXC byte.

Table 9-11. Possible Operand Errors

Exceptions Native to the MC68040
  Instruction           Condition Causing Operand Error
  FADD                  (+inf) + (-inf) or (-inf) + (+inf)
  FDIV                  0 ÷ 0 or inf ÷ inf
  FMOVE to B, W, or L   Integer overflow, or source is a nonsignaling NAN or infinity
  FMUL                  One operand is 0 and the other is infinity
  FSQRT                 Source < 0 or -infinity
  FSUB                  (+inf) - (+inf) or (-inf) - (-inf)

Exceptions Not Native to the MC68040
  Instruction           Condition Causing Operand Error
  FACOS                 Source is infinity, > +1, or < -1
  FASIN                 Source is infinity, > +1, or < -1
  FATANH                Source is > +1 or < -1
  FCOS                  Source is infinity
  FGETEXP               Source is infinity
  FGETMAN               Source is infinity
  FLOG10                Source is < 0
  FLOG2                 Source is < 0
  FLOGN                 Source is < 0
  FLOGNP1               Source is < -1
  FMOD                  Floating-point data register is infinity or source is 0; other operand is not a NAN
  FMOVE to P            Source exponent > 999 (decimal) or k-factor > 17
  FREM                  Floating-point data register is infinity or source is 0; other operand is not a NAN
  FSCALE                Source is infinity
  FSGLDIV               0 ÷ 0 or inf ÷ inf
  FSGLMUL               One operand is 0, other operand is infinity
  FSIN                  Source is infinity
  FSINCOS               Source is infinity
  FTAN                  Source is infinity

9.7.3.1 Maskable Exception Conditions. All conditions listed in Table 9-11 apply, except for the FMOVE to byte, word, or long word case.

a. If the user OPERR exception handler is disabled, an extended-precision nonsignaling NAN with all mantissa bits set is stored in the destination floating-point data register. No exceptions are reported, and instruction execution proceeds normally.
b. If the user OPERR exception handler is enabled, the destination floating-point data register is not modified, and an OPERR exception is posted. The next floating-point instruction that is encountered takes a pre-instruction exception. The OPERR entry in the processor's vector table points to the M68040FPSP OPERR exception handler. Once the M68040FPSP OPERR exception handler recognizes the operand error as a maskable condition, it does not modify the destination or pass control to the user OPERR exception handler.

9.7.3.2 Nonmaskable Exception Conditions. If an FMOVE to byte, word, or long word has a source operand that is too large to be represented in the specified destination integer format (integer overflow, NAN, or infinity), or if the source operand is equal to the largest negative integer representable in the specified destination integer format (an erroneous MC68040 condition), the processor immediately takes a post-instruction exception. Instruction execution continues at the M68040FPSP OPERR exception handler. If the M68040FPSP determines that a nonmaskable erroneous MC68040 condition caused the exception, it stores the largest negative integer representable in the given destination integer format (-2^7 for byte, -2^15 for word, and -2^31 for long word), and the M68040FPSP OPERR exception handler then returns the processor to normal processing. If an integer overflow, or an FMOVE to byte, word, or long word with a source of infinity, causes the exception, the destination is written with the largest positive or negative integer that can be represented in the given format. If an FMOVE to byte, word, or long word with a source of NAN causes the exception, the most significant 8, 16, or 32 bits, respectively, are written to the destination. Next, the M68040FPSP OPERR exception handler checks whether the user OPERR exception handler is enabled.

a. If the user OPERR exception handler is disabled, and an exception-causing INEX1 or INEX2 condition exists with the user INEX exception handler enabled, the M68040FPSP OPERR exception handler restores the FPU to its exceptional state, cleans up the stack to the conditions prior to its execution, and continues instruction execution at the user INEX exception handler. No parameters are passed to the user INEX exception handler, since the M68040FPSP OPERR exception handler provides the illusion that it never existed. Otherwise, the M68040FPSP OPERR exception handler returns the processor to normal processing.

b. If the user OPERR exception handler is enabled and the destination is a floating-point data register, the M68040FPSP exception handler does not modify the register. The M68040FPSP OPERR exception handler restores the FPU to its exceptional state, cleans up the stack to the conditions prior to its execution, and continues instruction execution at the user OPERR exception handler. No parameters are passed to the user OPERR exception handler, since the M68040FPSP OPERR exception handler provides the illusion that it never existed. The user OPERR exception handler must execute an FSAVE as its first floating-point instruction. Table 9-16 lists the floating-point state frame fields, defined for use by the supervisor exception handler, for OPERR exceptions resulting from the execution of opclass 000 or 010 (register-to-register or memory-to-register) instructions and opclass 011 (register-to-memory) instructions.
The CMDREG1B field of the floating-point state frame can be used to determine the instruction that caused the OPERR exception. Note that CMDREG1B could be any of the instructions listed in Table 9-11. If the destination is a floating-point data register, this exception handler needs to supply the contents. If the destination is memory, the effective address is supplied in the format $3 stack frame. If the destination is an integer data register, the FPIAR points to the F-line instruction word that contains the integer data register number. To exit the user OPERR exception handler, the saved floating-point frame need not be restored and can be discarded prior to execution of the RTE instruction.

9.7.4 Overflow

An overflow exception is detected for arithmetic operations in which the destination is a floating-point data register or memory when the intermediate result's exponent is greater than or equal to the maximum exponent value of the selected rounding precision. Overflow can occur only when the destination is in the single-, double-, or extended-precision format; all other data format overflows are handled as operand errors. At the end of any operation that could potentially overflow, the intermediate result is checked for underflow, rounded, and then checked for overflow before it is stored to the destination. If overflow occurs, the OVFL bit is set in the FPSR EXC byte.

An overflow can occur even if the intermediate result is small enough to be represented as an extended-precision number. The intermediate result is rounded to the selected precision, and the rounded result is stored in the extended-precision format. If the magnitude of the intermediate result exceeds the range of the selected rounding precision format, an overflow occurs.

9.7.4.1 Maskable Exception Conditions. There are no maskable OVFL conditions.

9.7.4.2 Nonmaskable Exception Conditions.
When the OVFL bit is set in the FPSR EXC byte as a result of a floating-point instruction, the processor always takes a nonmaskable overflow exception. If the destination is a floating-point data register, the register is not affected, and either a pre-instruction or a post-instruction exception is reported. If the destination is memory or an integer data register, an undefined result is stored, and a post-instruction exception is taken immediately. Execution begins at the M68040FPSP OVFL exception handler.

The values defined in Table 9-12 are stored in the destination based on the rounding mode defined in the FPCR mode byte. The M68040FPSP OVFL exception handler rounds the result according to the rounding precision defined in the FPCR mode byte if the destination is a floating-point data register. If the destination is in memory or an integer data register, the rounding precision in the FPCR mode byte is ignored, and the destination format defines the rounding precision. If the instruction has a forced rounding precision (e.g., FSADD, FDMUL), the instruction defines the rounding precision. The M68040FPSP OVFL exception handler then checks whether the user OVFL exception handler is enabled.
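Table 9-12 maps the rounding mode and the sign of the intermediate result to either an infinity or the largest finite magnitude. As a rough sketch, the Python below uses IEEE double precision (`sys.float_info.max`) as a stand-in destination format; the function name is hypothetical. For single precision a caller could pass that format's largest value; extended precision exceeds the range of a Python float and is not modeled.

```python
import math
import sys

LARGEST_DOUBLE = sys.float_info.max  # stand-in: largest finite IEEE double


def overflow_result(mode: str, negative: bool,
                    largest: float = LARGEST_DOUBLE) -> float:
    """Value stored on overflow for each rounding mode (Table 9-12).

    `largest` is the largest finite magnitude of the destination format.
    """
    if mode == "RN":    # to nearest: infinity, sign of intermediate result
        mag = math.inf
    elif mode == "RZ":  # toward zero: largest magnitude, same sign
        mag = largest
    elif mode == "RM":  # toward minus infinity
        mag = math.inf if negative else largest
    elif mode == "RP":  # toward plus infinity
        mag = largest if negative else math.inf
    else:
        raise ValueError(f"unknown rounding mode {mode!r}")
    return -mag if negative else mag
```

The directed modes (RM, RP) return infinity only in the direction they round toward, which is why each one yields the largest finite number for the opposite sign.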
Table 9-12. Overflow Rounding Mode Values

  Rounding Mode   Result
  RN              Infinity, with the sign of the intermediate result
  RZ              Largest magnitude number, with the sign of the intermediate result
  RM              For positive overflow, largest positive number; for negative overflow, -infinity
  RP              For positive overflow, +infinity; for negative overflow, largest negative number

a. If the user OVFL exception handler is disabled, the M68040FPSP OVFL exception handler checks for an INEX1 or INEX2 exception condition with the user INEX exception handler enabled. If there is none, the processor returns to normal instruction flow. Otherwise, the M68040FPSP OVFL exception handler restores the FPU to its exceptional state, cleans up the stack to the conditions prior to its execution, and continues instruction execution at the user INEX exception handler. No parameters are passed to the user INEX exception handler, since the M68040FPSP OVFL exception handler provides the illusion that it never existed.

b. If the user OVFL exception handler is enabled, the M68040FPSP OVFL exception handler restores the FPU to its exceptional state, cleans up the stack to the conditions prior to its execution, and continues instruction execution at the user OVFL exception handler. No parameters are passed to the user OVFL exception handler, since the M68040FPSP OVFL exception handler provides the illusion that it never existed. The user OVFL exception handler must execute an FSAVE as its first floating-point instruction. The destination contains the rounding mode values listed in Table 9-12, and the user OVFL exception handler can choose to modify these values. The E3 and E1 bits of the floating-point state frame are examined to determine which fields of the floating-point state frame are valid. E3 always takes precedence and must be serviced first.
Table 9-16 lists the floating-point state frame fields for OVFL exceptions with E3 set, or with E3 clear and E1 set. Note that it is possible for an FADD, FSUB, FMUL, or FDIV to report a post-instruction exception, although these instructions normally generate a pre-instruction exception. The following example illustrates why a post-instruction exception is generated:

    FADD    FP2,FP0    ; this instruction generates an overflow exception
    FMOVE   FP0,       ; this instruction is executing when overflow occurs

In this example, assume that the FMOVE instruction starts once the FADD instruction generates an overflow. Given the register dependency on FP0, the destination of the FADD instruction, FP0 needs to be resolved prior to FMOVE instruction execution. For this example, there is no choice but to have the FADD instruction report a post-instruction exception immediately. Note that in this case, even though the T-bit of the floating-point state frame is set (post-instruction exception), it does not imply an FMOVE out instruction. Therefore, the effective address field in the format $3 stack frame is invalid.

The FMOVE out instruction generates a post-instruction exception. In this case, the effective address field in the format $3 stack frame points to the destination memory location. If the destination is an integer data register, the FPIAR points to the F-line word
of the offending instruction, and the F-line word contains the integer data register number.

If the M68040FPSP unimplemented instruction exception handler is used, overflow can also be reported in some other cases. In addition to normal overflow, the exponential instructions can generate results that catastrophically overflow the 16-bit exponent used for intermediate results. For these instructions (FETOX, FTENTOX, FTWOTOX, FSINH, and FCOSH), the intermediate result found in either the FPTEMP or WBTEMP field of the floating-point state frame is invalid.

If an INEX2 or INEX1 exceptional condition exists and the user INEX exception handler is enabled, it is the responsibility of the user OVFL exception handler to look for this situation.

The user OVFL exception handler examines the E3 bit of the floating-point state frame to exit from this exception handler. If the E3 bit is set, it must be cleared prior to restoring the floating-point frame through the FRESTORE instruction. If the E3 bit is clear and the E1 bit is set, the floating-point state frame is discarded. The RTE instruction must be executed to return to normal instruction flow.

9.7.5 Underflow

An underflow exception occurs when the intermediate result of an arithmetic operation is too small to be represented as a normalized number in a floating-point data register or memory using the selected rounding precision. An intermediate result is too small when its exponent is less than or equal to the minimum exponent value of the selected rounding precision. Underflow is not detected for intermediate result exponents that are equal to the extended-precision minimum exponent, since the explicit integer part bit permits representation of normalized numbers with a minimum extended-precision exponent. Underflow can occur only when the destination format is single, double, or extended precision.
When the destination format is byte, word, or long word, the conversion underflows to zero without causing either an underflow or an operand error. At the end of any operation that could potentially underflow, the intermediate result is checked for underflow, rounded, and checked for overflow before it is stored at the destination. If an underflow occurs, the UNFL bit is set in the FPSR EXC byte.

An underflow can occur even if the intermediate result is large enough to be represented as an extended-precision number. The intermediate result is rounded to the selected precision, and the rounded result is stored in extended-precision format. If the magnitude of the intermediate result is too small to be represented in the selected rounding precision, an underflow occurs.

The IEEE 754 standard defines two causes of an underflow: 1) the absolute value of the number is less than the minimum number that can be represented by a normalized number in a specific data format; 2) loss of accuracy occurs while attempting to calculate such a number (a loss of accuracy also causes an inexact exception). The IEEE 754 standard specifies that if the underflow exception is disabled, an underflow should be signaled only when both of these cases are satisfied (i.e., the result is too small to be represented with a given format and there is a loss of accuracy during calculation of the
final result). If the exception is enabled, the underflow should be signaled any time a very small result is produced, regardless of whether accuracy is lost in calculating it. The UNFL bit in the FPSR AEXC byte implements the IEEE exception-disabled definition, since it is set only when a very small number is generated and accuracy has been lost when calculating that number. The UNFL bit in the FPSR EXC byte implements the IEEE exception-enabled definition, since it is set any time a tiny number is generated.

9.7.5.1 Maskable Exception Conditions. There are no maskable UNFL conditions.

9.7.5.2 Nonmaskable Exception Conditions. When the UNFL bit of the FPSR is set, the processor always takes an exception, regardless of whether or not the user UNFL exception handler is enabled. If the destination is a floating-point data register, the register is not affected, and either a pre-instruction or a post-instruction exception is reported. If the destination is memory or an integer data register, an undefined result is stored, and a post-instruction exception is taken immediately. Exception processing begins with the M68040FPSP UNFL exception handler.

The M68040FPSP UNFL exception handler stores the result in the destination as either a denormalized number or zero. Denormalization is accomplished by shifting the mantissa of the intermediate result to the right while incrementing the exponent until the exponent equals the denormalized exponent value for the destination format. The denormalized intermediate result is rounded to the selected rounding precision if the destination is a floating-point data register, or to the destination format in the case of an FMOVE out instruction. For instructions with a forced rounding precision (e.g., FSADD and FDMUL), the destination is rounded using the precision defined by the instruction.
If, in the process of denormalizing the intermediate result, all of the significant bits are shifted off to the right, the selected rounding mode determines the value to be stored at the destination; Table 9-13 lists these values. Once the result is stored in the destination, the M68040FPSP UNFL exception handler checks whether the user UNFL exception handler is enabled.

Table 9-13. Underflow Rounding Mode Values

  Rounding Mode   Result
  RN              Zero, with the sign of the intermediate result
  RZ              Zero, with the sign of the intermediate result
  RM              For positive underflow, +zero; for negative underflow, smallest denormalized negative number
  RP              For positive underflow, smallest denormalized positive number; for negative underflow, -zero
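The denormalization step and the Table 9-13 fallback can be sketched as follows. This Python sketch uses hypothetical names, works on an integer mantissa with unbiased exponents, does not model rounding to the destination precision, and uses IEEE double precision as a stand-in for the destination format's smallest denormalized number.

```python
import math
from typing import Tuple


def denormalize(mantissa: int, exponent: int,
                denorm_exponent: int) -> Tuple[int, bool]:
    """Shift the mantissa right one bit per exponent increment until the
    exponent reaches the destination's denormalized exponent value.

    Returns (shifted mantissa, whether any nonzero bits were shifted out).
    Rounding to the destination precision is not modeled here.
    """
    shift = denorm_exponent - exponent
    if shift <= 0:
        return mantissa, False
    lost = (mantissa & ((1 << shift) - 1)) != 0
    return mantissa >> shift, lost


# Stand-in destination format: smallest positive IEEE double denormal.
SMALLEST_DENORMAL = math.ldexp(1.0, -1074)


def underflow_result(mode: str, negative: bool,
                     smallest: float = SMALLEST_DENORMAL) -> float:
    """Value stored when every significant bit is shifted out (Table 9-13)."""
    if mode in ("RN", "RZ"):
        return -0.0 if negative else 0.0
    if mode == "RM":  # toward minus infinity
        return -smallest if negative else 0.0
    if mode == "RP":  # toward plus infinity
        return -0.0 if negative else smallest
    raise ValueError(f"unknown rounding mode {mode!r}")
```

Returning signed zeros (`-0.0`) keeps the sign of the intermediate result visible, which matters for the RN, RZ, and RP rows of Table 9-13.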
a. If the user UNFL exception handler is disabled, the M68040FPSP UNFL exception handler checks for an INEX1 or INEX2 exception condition with the user INEX exception handler enabled. If there is none, the processor returns to normal instruction flow. Otherwise, the M68040FPSP UNFL exception handler restores the FPU to its exceptional state, cleans up the stack to the conditions prior to its execution, and continues instruction execution at the user INEX exception handler. No parameters are passed to the user INEX exception handler, since the M68040FPSP UNFL exception handler provides the illusion that it never existed.

b. If the user UNFL exception handler is enabled, the M68040FPSP UNFL exception handler restores the FPU to its exceptional state, cleans up the stack to the conditions prior to its execution, and continues instruction execution at the user UNFL exception handler. The user UNFL exception handler must execute an FSAVE as its first floating-point instruction. At this point, the destination contains the rounding mode values listed in Table 9-13, and the user UNFL exception handler can choose to modify these values. The E3 and E1 bits of the floating-point state frame need to be examined to determine which fields of the floating-point state frame are valid. E3 always takes precedence and must always be serviced first. Table 9-16 lists the floating-point state frame fields for UNFL exceptions with E3 set, or with E3 clear and E1 set. It is possible for an FADD, FSUB, FMUL, or FDIV to report a post-instruction exception, although these instructions normally generate a pre-instruction exception.
The following example illustrates why a post-instruction exception is generated:

    FADD    FP2,FP0    ; this instruction generates an underflow exception
    FMOVE   FP0,       ; this instruction is executing when underflow occurs

In this example, assume that the FMOVE instruction starts once the FADD instruction generates an underflow. Given the register dependency on FP0, the destination of the FADD instruction, FP0 needs to be resolved prior to FMOVE instruction execution. For this example, there is no choice but to have the FADD instruction report a post-instruction exception immediately. Note that in this case, even though the T-bit of the floating-point state frame is set (post-instruction exception), it does not imply an FMOVE out instruction. Therefore, the effective address field in the format $3 stack frame is invalid.

The FMOVE out instruction generates a post-instruction exception. In this case, the effective address field in the format $3 stack frame points to the destination memory location. If the destination is an integer data register, the FPIAR points to the F-line word of the offending instruction, and the F-line word contains the integer data register number.

If the M68040FPSP unimplemented instruction exception handler is used, underflow can also be reported in some other cases. If an INEX2 or INEX1 exceptional condition exists and the user INEX exception handler is enabled, it is the responsibility of the user UNFL exception handler to look for this situation.

The user UNFL exception handler examines the E3 bit of the floating-point state frame to exit from this exception handler. If the E3 bit is set, it must be cleared prior to restoring the floating-point frame through the FRESTORE instruction. If the E3 bit is clear and the E1 bit
is set, the floating-point frame is discarded. The RTE instruction must then be executed to return to normal instruction flow.
9.7.6 Divide by Zero
This exception occurs when a divide instruction has a zero divisor or when a transcendental function is asymptotic with infinity as the asymptote. Table 9-14 lists the instructions that can cause the divide by zero exception. Note that only the FDIV and FSGLDIV instructions are native to the MC68040; the other conditions occur only if the M68040FPSP is used. When a divide by zero is detected, the DZ bit is set in the FPSR EXC byte. The divide by zero exception has only maskable exceptional conditions; therefore, no M68040FPSP intervention is needed. An exception is taken only if the DZ bit is set in the FPSR EXC byte and the corresponding bit in the FPCR ENABLE byte is set.
a. If the user divide by zero exception handler is disabled, an infinity whose sign is the exclusive OR of the signs of the input operands is stored in the destination floating-point data register. No exception is taken.
b. If the user divide by zero exception handler is enabled, the destination floating-point data register is not modified, and the exception is reported as a pre-instruction exception when the next floating-point instruction is attempted. The divide by zero entry in the processor's vector table points to the user divide by zero exception handler.
Table 9-14. Possible Divide by Zero Exceptions
Instruction | Operand Value
FDIV | Source operand = 0 and floating-point data register is not a NaN
FLOG10 | Source operand = 0
FLOG2 | Source operand = 0
FLOGN | Source operand = 0
FTAN | Source operand is an odd multiple of π/2
FSGLDIV | Source operand = 0 and floating-point data register is not a NaN
An FSAVE must be the first floating-point instruction of the user divide by zero exception handler. The user divide by zero exception handler must generate a result to store in the destination.
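The divide by zero rules above can be modeled in a few lines of C. This is a minimal sketch, not Motorola code: it uses host `double` values in place of extended-precision registers, covers only the FDIV/FSGLDIV row of Table 9-14, and the function names (`fdiv_signals_dz`, `dz_disabled_result`, `dz_trap_taken`) are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Table 9-14, FDIV/FSGLDIV row: a divide by zero is signaled when the
   source (divisor) is zero and the destination register is not a NaN. */
bool fdiv_signals_dz(double dst, double src)
{
    return src == 0.0 && !isnan(dst);
}

/* Case a: with the user handler disabled, the destination receives an
   infinity whose sign is the exclusive OR of the operand signs. */
double dz_disabled_result(double dst, double src)
{
    assert(src == 0.0); /* only meaningful for a zero divisor */
    bool negative = (signbit(dst) != 0) != (signbit(src) != 0);
    return negative ? -INFINITY : INFINITY;
}

/* The trap (case b) is taken only if DZ is set in the FPSR EXC byte
   and the matching bit is set in the FPCR ENABLE byte. */
bool dz_trap_taken(bool fpsr_exc_dz, bool fpcr_enable_dz)
{
    return fpsr_exc_dz && fpcr_enable_dz;
}
```

For example, `dz_disabled_result(1.0, -0.0)` returns `-INFINITY`: the operand signs differ, so the stored infinity is negative, as case a describes.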
To help the user divide by zero exception handler generate this result, the processor supplies the information listed in Table 9-16, which lists the floating-point state frame fields for divide by zero exceptions that are defined for supervisor exception handler use. To exit the user divide by zero exception handler, the saved floating-point frame is discarded, and an RTE returns the processor to normal processing.
9.7.7 Inexact Result
The processor provides two inexact bits in the FPSR EXC byte to help distinguish between inexact results generated by emulated decimal input (INEX1 exceptions) and other inexact results (INEX2 exceptions). These two bits are useful in instructions where both types of inexact results can occur (e.g., FDIV.P #7E-1,FP3). In this case, the packed decimal to extended-precision conversion of the immediate source operand causes an
inexact error to occur that is signaled as an INEX1 exception. Furthermore, the subsequent divide could also produce an inexact result and cause INEX2 to be set in the FPSR EXC byte. Note that the processor generates only one inexact exception vector number. If either of the two inexact exceptions is enabled, the processor fetches the inexact exception vector, and the user INEX exception handler is initiated. INEX refers to both exceptions in the following paragraphs.
The INEX2 exception is the condition that exists when any operation, except the input of a packed decimal number, creates a floating-point intermediate result whose infinitely precise mantissa has too many significant bits to be represented exactly in the selected rounding precision or in the destination data format. If this condition occurs, the INEX2 bit is set in the FPSR EXC byte, and the infinitely precise result is rounded. Table 9-15 lists these rounding mode values.
Table 9-15. Inexact Rounding Mode Values
Rounding Mode | Result
RN | The representable value nearest to the infinitely precise intermediate value is the result. If the two nearest representable values are equally near (a tie), the one with the least significant bit equal to zero (even) is the result. This is sometimes referred to as "round nearest, even."
RZ | The result is the value closest to and no greater in magnitude than the infinitely precise intermediate result. This is sometimes referred to as the "chop mode," since the effect is to clear the bits to the right of the rounding point.
RM | The result is the value closest to and no greater than the infinitely precise intermediate result (possibly minus infinity).
RP | The result is the value closest to and no less than the infinitely precise intermediate result (possibly plus infinity).
The INEX1 and INEX2 exceptions are always maskable. Therefore, any INEX exception goes directly to the user INEX exception handler.
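To make Table 9-15 concrete, the following C sketch applies the four rounding modes to a truncated mantissa using a guard, round, and sticky bit, the same roles that WBTM1, WBTM0, and SBIT play in the floating-point state frame. The function name `round_grs` and the `round_mode` enumeration are hypothetical; the sketch leaves any carry out of the mantissa to the caller and does not model how the FPCR encodes the modes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rounding modes of Table 9-15 (this enumeration is illustrative only). */
typedef enum { RN, RZ, RM, RP } round_mode;

/* Round a truncated magnitude 'mant' given the guard, round, and sticky
   bits below its least significant bit. A carry out of the top bit is
   left for the caller to renormalize. '*inexact' reports whether any
   discarded bit was set (the INEX2 condition). */
uint64_t round_grs(uint64_t mant, bool negative, bool guard, bool round,
                   bool sticky, round_mode mode, bool *inexact)
{
    bool lost = guard || round || sticky;
    bool increment = false;

    assert(inexact != NULL);
    *inexact = lost;
    switch (mode) {
    case RN: /* nearest; a tie goes to the value whose LSB is 0 (even) */
        increment = guard && (round || sticky || (mant & 1u));
        break;
    case RZ: /* chop: never increase the magnitude */
        increment = false;
        break;
    case RM: /* toward minus infinity: only negative values grow */
        increment = lost && negative;
        break;
    case RP: /* toward plus infinity: only positive values grow */
        increment = lost && !negative;
        break;
    }
    return mant + (increment ? 1u : 0u);
}
```

With mantissa 11 and only the guard bit set (an exact tie), RN returns 12 because 11 is odd, while RZ keeps 11; in both cases `*inexact` is set.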
The M68040FPSP does not provide any special handling for the INEX exception. When an INEX2 or INEX1 bit in the FPSR EXC byte is set, the processor stores the rounded result (see Table 9-15) in the destination. The FPCR MODE byte determines the rounding mode, and the PREC byte determines the rounding precision if the destination is a floating-point data register. Otherwise, if the destination is memory or an integer data register, the destination format determines the rounding precision. If an instruction forces a precision, the instruction determines the rounding precision. If the INEX2 or INEX1 condition exists and the corresponding INEX bit in the FPCR ENABLE byte is set, the user INEX exception handler is taken.
a. If the user INEX exception handler is disabled, the result is rounded and normal processing continues.
b. If the user INEX exception handler is enabled, the exception is taken. The INEX entry in the processor's vector table points to the user INEX exception handler.
The user INEX exception handler must execute an FSAVE as its first floating-point instruction. At this point, the destination contains the rounding mode values as listed in
Table 9-15, and the user INEX exception handler can choose to modify these values. The E3 and E1 bits of the floating-point state frame must be examined to determine which fields of the floating-point state frame are valid. E3 always takes precedence and must be serviced first. Table 9-16 lists the floating-point state frame fields for INEX exceptions with E3 set or with E3 clear and E1 set. An FADD, FSUB, FMUL, or FDIV can report a post-instruction exception, although these instructions normally generate a pre-instruction exception. The following example shows why a post-instruction exception is generated:
    FADD  FP2,FP0     ; this instruction generates an inexact exception
    FMOVE FP0,<ea>    ; this instruction is executing when the inexact exception occurs
For this example, assume that the FMOVE instruction starts once the FADD instruction generates an inexact exception. Because of the register dependency on FP0, the destination of the FADD instruction, FP0 must be resolved before the FMOVE instruction executes. In this case, the FADD instruction has no choice but to report a post-instruction exception immediately. Note that even though the T-bit of the floating-point state frame is set (post-instruction exception), it does not imply an FMOVE out instruction; therefore, the effective address field in the format $3 stack frame is invalid.
The FMOVE out instruction generates a post-instruction exception. In this case, the effective address field in the format $3 stack frame points to the destination memory location. If the destination is an integer data register, the FPIAR points to the F-line word of the offending instruction, and the F-line word contains the integer data register number. If the M68040FPSP unimplemented instruction exception handler is used, there can be other cases in which an inexact exception is reported.
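The precedence rule and the T-bit caveat above can be captured as two predicates. This C sketch is illustrative only: `pick_exception` and `ea_field_meaningful` are hypothetical names, the E3, E1, and T flags are passed as booleans rather than read from a real frame, and when both E3 and E1 are set it considers only the E3 exception that must be serviced first.

```c
#include <assert.h>
#include <stdbool.h>

/* Which exception a user handler services first (hypothetical names). */
typedef enum {
    SERVICE_E3, /* read CMDREG3B and WBTEMP */
    SERVICE_E1  /* read CMDREG1B and the E1 fields listed in Table 9-16 */
} service_kind;

service_kind pick_exception(bool e3, bool e1)
{
    /* Every case in Table 9-16 sets E3 or E1. */
    assert(e3 || e1);
    return e3 ? SERVICE_E3 : SERVICE_E1; /* E3 takes precedence */
}

/* Whether the format $3 effective address field locates the result.
   A set T bit alone is not enough: an E3 post-instruction report from
   FADD, FSUB, FMUL, or FDIV leaves that field invalid. Only the E1
   FMOVE out case (T = 1) uses it, and a data register destination is
   identified through the F-line word at FPIAR instead. */
bool ea_field_meaningful(bool e3, bool e1, bool t, bool dest_is_data_reg)
{
    return pick_exception(e3, e1) == SERVICE_E1 && t && !dest_is_data_reg;
}
```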
The user INEX exception handler examines the E3 bit of the floating-point state frame to exit from this exception handler. If the E3 bit is set, it must be cleared before the floating-point frame is restored with the FRESTORE instruction. If the E3 bit is clear and the E1 bit is set, the floating-point frame is discarded. The RTE instruction must then be executed to return to normal instruction flow.
NOTE
The IEEE 754 standard specifies that inexactness should be signaled on overflow as well as for rounding. The processor implements this via the INEX bit in the FPSR AEXC byte. However, the standard also indicates that the inexact exception should be taken if an overflow occurs with the OVFL bit disabled and the INEX bit enabled in the FPCR ENABLE byte. Therefore, the processor takes the inexact exception if this combination of conditions occurs, even though the INEX1 or INEX2 bit may not be set in the FPSR EXC byte. In this case, the INEX bit is set in the FPSR AEXC byte, and the OVFL bit is set in both the FPSR EXC and AEXC bytes.
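The exit sequence and the trap condition in the NOTE above can be sketched as follows. The names are hypothetical, the inputs are plain booleans standing in for FPSR EXC and FPCR ENABLE bits, and reading "the INEX bit enabled" as either INEX1 or INEX2 being enabled is an assumption of this sketch rather than something the text specifies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the two ways a user handler leaves. */
typedef enum {
    EXIT_CLEAR_E3_THEN_FRESTORE, /* E3 set: clear it, FRESTORE, then RTE */
    EXIT_DISCARD_FRAME           /* E3 clear, E1 set: drop frame, then RTE */
} exit_action;

exit_action choose_exit(bool e3, bool e1)
{
    assert(e3 || e1); /* an arithmetic exception frame has E3 or E1 set */
    return e3 ? EXIT_CLEAR_E3_THEN_FRESTORE : EXIT_DISCARD_FRAME;
}

/* Whether the inexact trap is taken. 'exc_*' model FPSR EXC bits and
   'en_*' model FPCR ENABLE bits. The overflow rule applies when OVFL is
   disabled and INEX is enabled (assumed: either INEX1 or INEX2). */
bool inex_trap_taken(bool exc_inex1, bool exc_inex2, bool exc_ovfl,
                     bool en_inex1, bool en_inex2, bool en_ovfl)
{
    bool direct = (exc_inex1 && en_inex1) || (exc_inex2 && en_inex2);
    bool via_overflow = exc_ovfl && !en_ovfl && (en_inex1 || en_inex2);
    return direct || via_overflow;
}
```

For instance, an overflow with OVFL disabled and INEX2 enabled makes `inex_trap_taken` return true even though neither INEX bit is set in the EXC byte, which is the case the NOTE describes.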
9.8 Floating-Point State Frames
All floating-point arithmetic exception handlers must have FSAVE as the first floating-point instruction; any other floating-point instruction causes another exception to be reported. Once the FSAVE instruction has executed, the exception handler should use only the FMOVEM instruction to read or write the floating-point data registers, since FMOVEM cannot generate further exceptions or change the FPCR.
The FPU executes an FSAVE instruction to save the current floating-point internal state for context switches and floating-point exception handling. When an FSAVE is executed, the processor waits until the FPU either completes execution of all current instructions or is unable to perform further processing due to a pending exception that must be serviced. Any exceptions generated during this time are not reported and are saved in the resulting busy state frame.
Four state frames can be generated as a result of an FSAVE instruction: busy, null, idle, and unimplemented floating-point instruction. When an unimplemented floating-point exception occurs, the FSAVE generates a 26-word unimplemented instruction state frame. When an unsupported data type exception occurs, the FSAVE generates a 50-word busy state frame. All floating-point arithmetic exceptions cause the FSAVE to generate either the 26-word unimplemented instruction state frame or the 50-word busy state frame.
After a hardware reset or an FRESTORE of a null state frame, the FSAVE instruction generates a null state frame. This null state frame is generated until the first nonconditional floating-point instruction is executed (conditionals include FNOP, FBcc, FDBcc, FScc, and FTRAPcc). Floating-point conditional instructions do not set the internal flag that changes the state frame from null to idle.
If these instructions are the only ones executed after a reset or an FRESTORE of a null state frame, FSAVE stacks a null state frame instead of an idle state frame. Note that this behavior differs from that of the MC68881 and MC68882, and software must account for this difference if compatibility with the MC68881 and MC68882 is desired. Once a nonconditional floating-point instruction is executed, an FSAVE generates an idle state frame.
The idle state frame is generated whenever the FPU has no exceptions pending. An idle state frame is saved if no exceptions are pending and at least one instruction has been executed since the last hardware reset or FRESTORE of a null state frame. A 26-word unimplemented floating-point instruction state frame is saved if the last instruction was an unimplemented floating-point instruction. Figure 9-10 illustrates each of these state frames, followed by definitions for each of the fields, listed in alphabetical order.
NOTE
The notation [xx–x] indicates the length of the field but does not indicate the field's actual location. [xx, xx–x] indicates that one bit of the field is located separately or termed differently from the other bits. This notation is for convenience of explanation only. For example, WBTM [65–34] indicates that WBTM is 32 bits long and gives a reference to each bit in WBTM without giving its actual location in the state frame. For the actual locations, refer to Figure 9-10.
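A handler or context-switch routine often needs to tell these frames apart. The C sketch below classifies a frame from its first long word, using the version ($00 or $41) and size values shown in Figure 9-10. Two points are assumptions: that the idle frame's size byte is $00, and that the size byte counts the bytes after the 4-byte first long word, which is inferred from the 26-word ($30) and 50-word ($60) frame sizes given above. The function and type names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* FSAVE frame kinds from Section 9.8 (hypothetical names). */
typedef enum {
    FRAME_NULL,
    FRAME_IDLE,
    FRAME_UNIMP,  /* unimplemented floating-point instruction, 26 words */
    FRAME_BUSY,   /* 50 words */
    FRAME_UNKNOWN /* anything this sketch does not recognize */
} frame_kind;

/* Classify a frame from its first long word: version number in bits
   31-24 and frame size byte in bits 23-16 (Figure 9-10). */
frame_kind classify_frame(uint32_t first_long)
{
    unsigned version = (first_long >> 24) & 0xFFu;
    unsigned size = (first_long >> 16) & 0xFFu;

    if (version == 0x00u)
        return FRAME_NULL;
    if (version != 0x41u)
        return FRAME_UNKNOWN;
    switch (size) {
    case 0x00: return FRAME_IDLE; /* assumption: idle size byte is $00 */
    case 0x30: return FRAME_UNIMP;
    case 0x60: return FRAME_BUSY;
    default:   return FRAME_UNKNOWN;
    }
}

/* Frame length in 16-bit words, assuming the size byte counts the bytes
   that follow the 4-byte first long word ($30 -> 26, $60 -> 50). */
unsigned frame_words(uint32_t first_long)
{
    unsigned size = (first_long >> 16) & 0xFFu;

    assert(classify_frame(first_long) != FRAME_UNKNOWN);
    return (size + 4u) / 2u;
}
```

A handler would pass the first long word stored by FSAVE to `classify_frame` and use `frame_words` to know how much stack the frame occupies.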
[Figure 9-10. MC68040 Floating-Point State Frames (Sheet 1 of 2). (a) Busy FPU state frame: version = $41 in bits 31–24 and $60 in bits 23–16 of the first long word; offsets $00–$60 hold CU_SAVEPC, WBTS, WBTE [14–0], WBTE15, WBTM [65–34], WBTM [33–2], WBTM66, WBTM1, WBTM0, SBIT, CMDREG1B, CMDREG3B, STAG, DTAG, E1, E3, T, FPIARCU, FPTS, FPTE, FPTM [63–32], FPTM [31–0], ETS, ETE, ETM [63–32], ETM [31–0], and reserved fields.]
[Figure 9-10. MC68040 Floating-Point State Frames (Sheet 2 of 2). (b) Null FPU state frame: $00 in bits 31–24 and $00 in bits 23–16 (remainder undefined). (c) Idle FPU state frame: version = $41, $00 in bits 23–16. (d) Unimplemented instruction FPU state frame: version = $41 and $30 in the first long word; offsets $00–$30 hold STAG, CMDREG1B, DTAG, E1, E3, T, SBIT, CMDREG3B, FPTS, FPTE, FPTM [63–32], FPTM [31–0], ETS, ETE, ETM [63–32], ETM [31–0], WBTE15, WBTM66, WBTM1, WBTM0, and reserved fields.]
CMDREG1B: This field contains the command word of the exceptional floating-point instruction for an E1 exception, which is an exception detected by the conversion unit (CU) in the floating-point pipeline (see Figure 9-1). For FSQRT, CMDREG1B [6–0] is mapped from $4 in the instruction to $5 in CMDREG1B. All other instructions map directly.
CMDREG3B: This field contains the encoded instruction command word for an E3 exception, which is an exception detected by the write-back unit (WB) in the floating-point pipeline (see Figure 9-1). Figure 9-11 details the bit mapping between CMDREG1B and CMDREG3B. For FSQRT, bits CMDREG1B [6–0] are changed from $4 in the instruction to $5 in CMDREG1B and therefore map to $21 in CMDREG3B.
[Figure 9-11. Mapping of Command Bits for CMDREG3B Field: CMDREG1B holds OPCLASS (bits 15–13), SRC (Rx) (bits 12–10), DST (Ry) (bits 9–7), and CMD (bits 6–0); CMDREG3B holds DST (Ry) and CMD fields.]
CU_SAVEPC: This field contains the PC for the FPU pipeline's conversion unit.
E1: If set, this bit indicates that an exception has been detected by the conversion unit pipeline stage. All exception types are possible. The exception handler first checks for an E3 exception and processes it before checking and processing an E1 exception; the E1 exception is processed if the E1 bit is set. For the unimplemented instruction state frame, the source operand's unsupported data type is packed if the E1 bit is set.
E3: If set, this bit indicates that an exception has been detected by the WB pipeline stage. Only OVFL, UNFL, and INEX2 exceptions can occur, and only for FADD, FSUB, FMUL, FDIV, and FSQRT with opclass 000 (register to register) or 010 (memory to register). The exception handler must check for and process an E3 exception first.
ETS, ETE, ETM: Collectively, these fields are referred to as the ETEMP register and normally contain the source operand converted to extended precision. If the instruction specifies a packed decimal real source, bits 63–0 of the operand reside in ETM [63–0], and the ETS and ETE fields are undefined.
FPIARCU: This field contains the instruction address register for the FPU pipeline's conversion unit.
FPTS, FPTE, FPTM: Collectively, these fields are referred to as the FPTEMP register and normally contain the destination operand for dyadic operations converted to extended precision. If the instruction specifies a packed decimal real source, bits 95–64 of the operand reside in FPTM [31–0], and the FPTS, FPTE, and FPTM [63–32] fields are undefined.
OPCLASS: This field refers to bits 15–13 of CMDREG1B. Note that CMDREG1B is identical to the second word of a floating-point arithmetic instruction opcode.
STAG, DTAG: These 3-bit fields specify the data type of the source and destination operands, respectively. STAG is undefined for a packed decimal real source operand. The encodings for STAG and DTAG are as follows:
000 = Normalized
001 = Zero
010 = Infinity
011 = NaN
100 = Extended-precision denormalized or unnormalized input
101 = Single- or double-precision denormalized input
T: If set, this bit indicates that a post-instruction exception has occurred. Since only an opclass 011 instruction can indicate a post-instruction exception, this bit indicates that the exception is caused by an FMOVE out instruction.
WBTS, WBTE [15, 14–0], WBTM [66, 65–2, 1, 0], SBIT: These fields contain the exception operand in internal data format for E3 exceptions. Collectively, these fields are called the WBTEMP and are an image of the intermediate result. WBTM66 is the overflow bit; WBTM1, WBTM0, and SBIT are the guard, round, and sticky bits, respectively.
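The field definitions above translate directly into bit extraction. The following C sketch decodes the CMDREG1B fields of Figure 9-11, applies the FSQRT $4-to-$5 command mapping, and decodes the STAG/DTAG encodings. All names are hypothetical, and CMDREG3B is not decoded because its bit-level mapping is given only in Figure 9-11.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CMDREG1B fields as drawn in Figure 9-11. */
typedef struct {
    unsigned opclass; /* bits 15-13 */
    unsigned src;     /* bits 12-10, Rx */
    unsigned dst;     /* bits 9-7, Ry */
    unsigned cmd;     /* bits 6-0 */
} cmdreg1b_fields;

cmdreg1b_fields decode_cmdreg1b(uint16_t w)
{
    cmdreg1b_fields f;
    f.opclass = (w >> 13) & 0x7u;
    f.src = (w >> 10) & 0x7u;
    f.dst = (w >> 7) & 0x7u;
    f.cmd = w & 0x7Fu;
    return f;
}

/* Build CMDREG1B from an arithmetic instruction's second opcode word:
   copied unchanged except that the FSQRT command $4 becomes $5. This
   sketch accepts only opclass 000 and 010. */
uint16_t cmdreg1b_from_opword(uint16_t opword)
{
    unsigned opclass = (opword >> 13) & 0x7u;

    assert(opclass == 0u || opclass == 2u);
    if ((opword & 0x7Fu) == 0x04u)
        return (uint16_t)((opword & 0xFF80u) | 0x05u);
    return opword;
}

/* STAG/DTAG encodings; 110 and 111 are not listed, and STAG is
   undefined for a packed decimal real source. */
typedef enum {
    TAG_UNDEFINED = -1,
    TAG_NORMALIZED = 0, /* 000 */
    TAG_ZERO = 1,       /* 001 */
    TAG_INFINITY = 2,   /* 010 */
    TAG_NAN = 3,        /* 011 */
    TAG_EXT_DENORM = 4, /* 100: extended denormalized or unnormalized */
    TAG_SD_DENORM = 5   /* 101: single or double denormalized */
} op_tag;

op_tag decode_tag(unsigned field)
{
    assert(field <= 7u); /* 3-bit field */
    return field <= 5u ? (op_tag)field : TAG_UNDEFINED;
}

op_tag decode_stag(unsigned field, bool source_is_packed)
{
    return source_is_packed ? TAG_UNDEFINED : decode_tag(field);
}
```

As an illustration, FSQRT FP1,FP2 (register to register) has the second opcode word $0504, so `cmdreg1b_from_opword(0x0504)` yields $0505, and decoding that word gives opclass 0, source 1, destination 2, and command $05.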
Table 9-16. State Frame Field Information
(Each group names the exception; rows give FSAVE state frame field | contents.)
Unimplemented instruction exceptions (opclass 000 and 010):
  CMDREG1B | Exception instruction command word
  ETEMP | Source operand converted to extended precision. If the format is packed, ETM [63–0] contains bits 63–0 of the packed decimal operand.
  STAG | Source operand tag (undefined if the format is packed)
  FPTEMP | Destination operand, if any, converted to extended precision. If the format is packed, FPTM [31–0] contains bits 95–64 of the packed decimal operand.
  DTAG | Destination operand tag, if any
  E1 | Always 1
  T | Always 0
Unsupported data type (opclass 000 and 010):
  CMDREG1B | Exception instruction command word
  ETEMP | Source operand converted to extended precision. If the format is packed, ETM [63–0] contains bits 63–0 of the packed decimal operand.
  STAG | Source operand tag (undefined if the format is packed)
  FPTEMP | Destination operand, if any, converted to extended precision. If the format is packed, FPTM [31–0] contains bits 95–64 of the packed decimal operand.
  DTAG | Destination operand tag, if any
  E1 | Always 1
  T | Always 0
Unsupported data type (opclass 011):
  CMDREG1B | FMOVE command word
  ETEMP | Unrounded source operand from floating-point data register
  STAG | Source operand tag
  E1 | Always 1
  T | Always 1
SNAN (opclass 000 and 010):
  CMDREG1B | Exception instruction command word
  ETEMP | Source operand converted to extended precision
  STAG | Source operand tag
  FPTEMP | Destination operand, if any, converted to extended precision
  DTAG | Destination operand tag, if any
  E1 | Always 1
  T | Always 0
SNAN (opclass 011):
  CMDREG1B | FMOVE instruction command word
  ETEMP | Unrounded source operand from floating-point register, with SNAN bit set
  STAG | Source operand tag, indicating NaN
  E1 | Always 1
  T | Always 1
OPERR (opclass 000 and 010):
  CMDREG1B | Exception instruction command word
  ETEMP | Source operand converted to extended precision
  STAG | Source operand tag
  FPTEMP | Destination operand, if any, converted to extended precision
  DTAG | Destination operand tag, if any
  E1 | Always 1
  T | Always 0
OPERR (opclass 011):
  CMDREG1B | FMOVE instruction command word
  ETEMP | Unrounded source operand from floating-point register
  STAG | Source operand tag
  WBTEMP | Rounded integer used to check for erroneous integer overflow
  E1 | Always 1
  T | Always 1
OVFL (FMOVE to register, FABS, and FNEG):
  CMDREG1B | Exception instruction command word
  FPTEMP | Intermediate result with mantissa rounded to correct precision
  STAG | Source operand tag = normalized
  E1 | Always 1
  T | Always 0
OVFL (FADD, FSUB, FMUL, FDIV, and FSQRT):
  CMDREG3B | Encoded exception instruction command word
  WBTEMP | WBTS, WBTE, and WBTM equal the intermediate result with mantissa rounded to the correct precision
  WBTE15 | Bit 15 of the intermediate result's 16-bit exponent = 0 for overflow
  E3 | Always 1
  T | Either 1 or 0
OVFL (FMOVE to memory):
  CMDREG1B | FMOVE instruction command word
  FPTEMP | Intermediate result with mantissa rounded to correct precision
  STAG | Source operand tag = normalized
  E1 | Always 1
  T | Always 1
UNFL (FMOVE to register, FABS, and FNEG):
  CMDREG1B | Exception instruction command word
  FPTEMP | Unrounded, extended-precision intermediate result
  STAG | Source operand tag = normalized
  E1 | Always 1
  T | Always 0
UNFL (FADD, FSUB, FMUL, FDIV, and FSQRT):
  CMDREG3B | Encoded exception instruction command word
  WBTEMP | WBTS, WBTE, and WBTM = intermediate result sign, biased 15-bit exponent, and 64-bit mantissa prior to rounding
  WBTE15 | Bit 15 of the intermediate result's 16-bit exponent = 1 for underflow
  WBTM1, WBTM0, SBIT | Guard, round, and sticky bits of the intermediate result's 67-bit mantissa
  E3 | Always 1
  T | Either 1 or 0
UNFL (FMOVE to memory):
  CMDREG1B | FMOVE instruction command word
  FPTEMP | Intermediate result with mantissa prior to rounding
  STAG | Source operand tag = normalized
  E1 | Always 1
  T | Always 1
DZ:
  CMDREG1B | M68040FPSP divide by zero can generate.
  ETEMP | Source operand converted to extended precision
  STAG | Source operand tag
  FPTEMP | Destination operand converted to extended precision
  E1 | Always 1
  T | Always 0
INEX (FMOVE to register, FABS, and FNEG):
  CMDREG1B | Exception instruction command word
  FPTEMP | Unrounded, extended-precision intermediate result
  STAG | Source operand tag = normalized
  E1 | Always 1
  T | Always 0
INEX (FADD, FSUB, FMUL, FDIV, and FSQRT):
  CMDREG3B | Encoded exception instruction command word
  WBTEMP | WBTS, WBTE, and WBTM = intermediate result sign, biased 15-bit exponent, and 64-bit mantissa prior to rounding
  WBTE15 | Either 1 or 0; generally useless for INEX exceptions
  WBTM1, WBTM0, SBIT | Guard, round, and sticky bits of the intermediate result's 67-bit mantissa
  E3 | Always 1
  T | Either 1 or 0
INEX (FMOVE to memory):
  CMDREG1B | FMOVE instruction command word
  FPTEMP | Intermediate result with mantissa prior to rounding
  STAG | Source operand tag = normalized
  E1 | Always 1
  T | Always 1
NOTE: If the M68040FPSP unimplemented exception handler is used, the above state frame information applies. The CMDREG1B or CMDREG3B fields of the state frame are modified as appropriate to encode the unimplemented instruction opcode. It is the user exception handler's responsibility to use the E3 and E1 field encodings to recognize which state frame information applies. When E3 = 1 and E1 = 1, E3 takes priority, and the state frame information for E3 = 1 must be used.
Section 10 Instruction Timings
This section summarizes instruction timings for the M68040. The timings are divided into two groups: integer unit and floating-point unit instruction timings. Each group is further subdivided to separate more complex instruction timings. Each of these subdivided groups is in alphabetical order with no reference to mode. Table 10-1 alphabetically lists instruction timings and their location in this section.
Table 10-1. Instruction Timing Index
Instruction | Page
ABCD | 10-11
ADD | 10-13
ADDA | 10-13
ADDI | 10-13
ADDQ | 10-14
ADDX | 10-11
AND | 10-13
ANDI | 10-13
ANDI #,CCR | 10-11
ANDI #,SR | 10-11
ASL | 10-14
ASR | 10-14
Bcc | 10-11
BCHG | 10-15
BCLR | 10-15
BFCHG | 10-15
BFCLR | 10-15
BFEXTS | 10-15
BFEXTU | 10-15
BFFFO | 10-16
BFINS | 10-16
BFSET | 10-15
BFTST | 10-16
BRA | 10-11
BSET | 10-15
BSR | 10-11
BTST | 10-17
CAS | 10-17
CAS2 | 10-11
CHK ,Dn | 10-17
CHK2 ,Rn | 10-18
CLR | 10-18
CINV | 10-8
CMP | 10-18
CMP2 | 10-19
CMPA.L | 10-19
CMPI | 10-19
CMPM | 10-11
CPUSH | 10-8
DBcc | 10-11
DIVS.L | 10-20
DIVS.W | 10-20
DIVSL.L | 10-20
DIVU.L | 10-20
DIVU.W | 10-20
DIVUL.L | 10-20
EOR | 10-13
EORI | 10-13
EORI #,CCR | 10-11
EORI #,SR | 10-11
EXG | 10-11
EXT | 10-11
EXTB | 10-11
FABS | 10-30, 10-36
FADD | 10-30, 10-35
FBcc | 10-29
FCMP | 10-30, 10-37
FDBcc | 10-29
FDIV | 10-30, 10-35
FMOVE | 10-30, 10-36
FMOVE FPn, | 10-31
FMOVE/FMOVEM to/from CR | 10-32
FMOVEM | 10-37
FMOVEM , | 10-32
FMOVEM , | 10-32
FMUL | 10-30, 10-35
FNEG | 10-30, 10-36
FNOP | 10-29
FRESTORE | 10-34
FSAVE | 10-33
FScc | 10-32
FSQRT | 10-30, 10-36
FSUB | 10-30, 10-35
FTRAPcc | 10-29
FTST , FPn | 10-30
ILLEGAL | 10-11
JMP | 10-20
JSR | 10-21
LEA | 10-21
LINK | 10-11
LSL | 10-14
LSR | 10-14
MOVE | 10-9, 10-10
MOVE from CCR | 10-21
MOVE from SR | 10-22
MOVE to CCR | 10-22
MOVE to SR | 10-22
MOVE USP | 10-11
MOVE16 | 10-11
MOVEA.L | 10-23
MOVEC | 10-11
MOVEM , | 10-23
MOVEM.L , | 10-23
MOVEP | 10-11
MOVEQ | 10-11
MOVES ,An | 10-24
MOVES ,Dn | 10-24
MOVES Rn, | 10-24
MULS.W/L | 10-25
MULU.W/L | 10-25
NBCD | 10-25
NEG | 10-26
NEGX | 10-26
NOP | 10-11
NOT | 10-26
OR | 10-13
ORI | 10-13
ORI #,CCR | 10-11
ORI #,SR | 10-11
PACK | 10-11
PEA | 10-26
PFLUSH | 10-11
PFLUSHA | 10-11
PFLUSHAN | 10-11
PFLUSHN (An) | 10-11
PTESTR, PTESTW | 10-11
RESET | 10-11
ROL | 10-26
ROR | 10-26
ROXL | 10-27
ROXR | 10-27
RTD | 10-11
RTE | 10-11
RTR | 10-11
RTS | 10-11
SBCD | 10-11
Scc | 10-27
SUB | 10-13
SUBA | 10-27
SUBI | 10-13
SUBQ | 10-14
SUBX | 10-11
SWAP | 10-11
TAS | 10-28
TRAP # | 10-11
TRAPcc | 10-11
TRAPV | 10-11
TST | 10-13
UNLK | 10-11
UNPK | 10-11
10.1 Overview
Refer to Section 2 Integer Unit for information on the integer unit pipeline. The fetch timing is not listed in the following tables because most instructions require one clock in the fetch stage for each memory access needed to obtain an operand. An instruction requires one clock to pass through the fetch stage even if no operand is fetched. Table 10-2 summarizes the number of memory fetches required to access an operand using each addressing mode for long-word-aligned accesses. The user must perform his own calculations for