G522-0291-00 MPCBUSIF/AD 3/97 Rev. 0
PowerPC Microprocessor Family: The Bus Interface for 32-Bit Microprocessors
© Motorola Inc. 1997. All rights reserved. Portions hereof © International Business Machines Corp. 1991-1997. All rights reserved. This document contains information on a new product under development by Motorola and IBM. Motorola and IBM reserve the right to change or discontinue this product without notice. Information in this document is provided solely to enable system and software implementers to use PowerPC microprocessors. There are no express or implied copyright or patent licenses granted hereunder by Motorola or IBM to design, modify the design of, or fabricate circuits based on the information in this document. The PowerPC microprocessor embodies the intellectual property of Motorola and of IBM. However, neither Motorola nor IBM assumes any responsibility or liability as to any aspects of the performance, operation, or other attributes of the microprocessor as marketed by the other party or by any third party. Neither Motorola nor IBM is to be considered an agent or representative of the other, and neither has assumed, created, or granted hereby any right or authority to the other, or to any third party, to assume or create any express or implied obligations on its behalf. Information such as errata sheets and data sheets, as well as sales terms and conditions such as prices, schedules, and support, for the product may vary as between parties selling the product. Accordingly, customers wishing to learn more information about the products as marketed by a given party should contact that party. Both Motorola and IBM reserve the right to modify this document and/or any of the products as described herein without further notice.
Nothing in this document, nor in any of the errata sheets, data sheets, and other supporting documentation, shall be interpreted as the conveyance by Motorola or IBM of an express warranty of any kind or implied warranty, representation, or guarantee regarding the merchantability or fitness of the products for any particular purpose. Neither Motorola nor IBM assumes any liability or obligation for damages of any kind arising out of the application or use of these materials. Any warranty or other obligations as to the products described herein shall be undertaken solely by the marketing party to the customer, under a separate sale agreement between the marketing party and the customer. In the absence of such an agreement, no liability is assumed by Motorola, IBM, or the marketing party for any damages, actual or otherwise. "Typical" parameters can and do vary in different applications. All operating parameters, including "Typicals," must be validated for each customer application by customer's technical experts. Neither Motorola nor IBM convey any license under their respective intellectual property rights nor the rights of others. Neither Motorola nor IBM makes any claim, warranty, or representation, express or implied, that the products described in this document are designed, intended, or authorized for use as components in systems intended for surgical implant into the body, or other applications intended to support or sustain life, or for any other application in which the failure of the product could create a situation where personal injury or death may occur.
Should customer purchase or use the products for any such unintended or unauthorized application, customer shall indemnify and hold Motorola and IBM and their respective officers, employees, subsidiaries, affiliates, and distributors harmless against all claims, costs, damages, and expenses, and reasonable attorney's fees arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or unauthorized use, even if such claim alleges that Motorola or IBM was negligent regarding the design or manufacture of the part. Motorola and the Motorola logo are registered trademarks of Motorola, Inc. Motorola, Inc. is an Equal Opportunity/Affirmative Action Employer. IBM, the IBM logo, and IBM Microelectronics are trademarks of International Business Machines Corporation. The PowerPC name, the PowerPC logotype, PowerPC 601, PowerPC 602, PowerPC 603, PowerPC 603e, PowerPC 604, PowerPC 604e, and PowerPC 620 are trademarks of International Business Machines Corporation used by Motorola under license from International Business Machines Corporation. International Business Machines Corporation is an Equal Opportunity/Affirmative Action Employer.
Contents

About This Document
Audience
Organization
Suggested Reading
Conventions
Acronyms and Abbreviations

Chapter 1 Overview
1.1 PowerPC 60x Microprocessor Interface
1.2 PowerPC System Block Diagram
1.3 Processor Bus Features
1.4 Bus Interface Signals

Chapter 2 Signal Descriptions
2.1 Address Bus Arbitration Signals
2.1.1 Bus Request (BR) - Output
2.1.2 Bus Grant (BG) - Input
2.1.3 Address Bus Busy (ABB) - Output
2.1.4 Address Bus Busy (ABB) - Input
2.2 Address Transfer Start Signals
2.2.1 Transfer Start (TS) - Output
2.2.2 Transfer Start (TS) - Input
2.2.3 Extended Address Transfer Start (XATS) - Output (Direct-Store)
2.2.4 Extended Address Transfer Start (XATS) - Input (Direct-Store)
2.3 Address Transfer Signals
2.3.1 Address Bus (A[0-31]) - Output (Memory Operations)
2.3.2 Address Bus (A[0-31]) - Input (Memory Operations)
2.3.3 Address Bus (A[0-31]) - Output (Direct-Store Operations)
2.3.4 Address Bus (A[0-31]) - Input (Direct-Store Operations)
2.3.5 Address Bus Parity (AP[0-3]) - Output
2.3.6 Address Bus Parity (AP[0-3]) - Input
2.3.7 Address Parity Error (APE) - Output
2.4 Address Transfer Attribute Signals
2.4.1 Transfer Type (TT[0-4]) - Output
2.4.2 Transfer Type (TT[0-4]) - Input
2.4.3 Transfer Burst (TBST) - Output
2.4.4 Transfer Burst (TBST) - Input
2.4.5 Transfer Size (TSIZ[0-2]) - Output
2.4.6 Transfer Size (TSIZ[0-2]) - Input
2.4.7 Transfer Code (TCn) - Output
2.4.8 Cache Inhibit (CI) - Output
2.4.9 Write-Through (WT) - Output
2.4.10 Global (GBL) - Output
2.4.11 Global (GBL) - Input
2.4.12 Cache Set Element (CSEn) - Output
2.4.13 High-Priority Snoop Request (HP_SNP_REQ) - 601 Only
2.5 Address Transfer Termination Signals
2.5.1 Address Acknowledge (AACK) - Input
2.5.2 Address Retry (ARTRY) - Output
2.5.3 Address Retry (ARTRY) - Input
2.5.4 Shared (SHD) - Output
2.5.5 Shared (SHD) - Input
2.6 Data Bus Arbitration Signals
2.6.1 Data Bus Grant (DBG) - Input
2.6.2 Data Bus Write Only (DBWO) - Input
2.6.3 Data Bus Busy (DBB) - Output
2.6.4 Data Bus Busy (DBB) - Input
2.7 Data Transfer Signals
2.7.1 Data Bus (DH[0-31], DL[0-31]) - Output
2.7.2 Data Bus (DH[0-31], DL[0-31]) - Input
2.7.3 Data Bus Parity (DP[0-7]) - Output
2.7.4 Data Bus Parity (DP[0-7]) - Input
2.7.5 Data Parity Error (DPE) - Output
2.7.6 Data Bus Disable (DBDIS) - Input
2.8 Data Transfer Termination Signals
2.8.1 Transfer Acknowledge (TA) - Input
2.8.2 Data Retry (DRTRY) - Input
2.8.3 Transfer Error Acknowledge (TEA) - Input
2.9 System Status Signals
2.9.1 Interrupt (INT) - Input
2.9.2 System Management Interrupt (SMI) - Input
2.9.3 Machine Check Interrupt (MCP) - Input
2.9.4 Checkstop Input (CKSTP_IN) - Input
2.9.5 Checkstop Output (CKSTP_OUT) - Output
2.9.6 Hard Reset (HRESET) - Input
2.9.7 Soft Reset (SRESET) - Input
2.10 Processor State Signals
2.10.1 Reservation (RSRV) - Output
2.10.2 External Cache Intervention (L2_INT) - Input
2.10.3 Time Base Enable (TBEN) - Input
2.10.4 TLBI Synchronization (TLBISYNC) - Input
2.11 Power Management Signals
2.11.1 Quiescent Request (QUIESC_REQ) - Output
2.11.2 System Quiesced (SYS_QUIESC) - Input
2.11.3 Resume (RESUME) - Input
2.11.4 Quiescent Request (QREQ) - Output
2.11.5 Quiescent Acknowledge (QACK) - Input
2.11.6 Halted (HALTED) - Output
2.11.7 Run (RUN) - Input
2.11.7.1 Going from Normal to Doze State (604e)
2.11.7.2 Going from Doze to Nap State
2.11.7.3 Going from Nap to Doze State
2.12 Summary of Signal Differences

Chapter 3 Memory Access Protocol
3.1 Bus Protocol
3.1.1 Arbitration Signals
3.1.2 Address Pipelining and Split-Bus Transactions
3.2 Address Bus Tenure
3.2.1 Address Bus Arbitration
3.2.2 Address Transfer
3.2.2.1 Address Bus Parity
3.2.2.2 Address Transfer Attribute Signals
3.2.2.2.1 Transfer Type (TT[0-4]) Signals
3.2.2.2.2 Transfer Size (TSIZ[0-2]) Signals
3.2.2.3 Burst Ordering During Data Transfers
3.2.2.4 Effect of Alignment in Data Transfers
3.2.2.4.1 Alignment of External Control Instructions
3.2.3 Address Transfer Termination
3.3 Data Bus Tenure
3.3.1 Data Bus Arbitration
3.3.1.1 Effect of ARTRY Assertion on Data Transfer and Arbitration on the PowerPC 604 Processor
3.3.1.2 Using the DBB Signal
3.3.2 Data Bus Write Only
3.3.3 Data Transfer
3.3.4 Data Transfer Termination
3.3.4.1 Normal Single-Beat Termination
3.3.4.2 Data Transfer Termination Due to a Bus Error
3.4 Timing Examples

Chapter 4 Memory Coherency
4.1 Overview of Cache Implementations
4.1.1 PowerPC 601 Processor Cache Organization
4.1.2 PowerPC 603 Processor Cache Organization
4.1.3 PowerPC 603e Processor Cache Enhancements
4.1.4 PowerPC 604 Processor Cache Organization
4.1.5 PowerPC 604e Processor Cache Enhancements
4.2 Cache Coherency Overview
4.3 Memory Coherency - MESI Protocol
4.4 Coherency Timing
4.5 Coherency Protocol
4.5.1 PowerPC 603 Processor lwarx/stwcx. Implementation
4.5.2 Cache Set Element Signals
4.5.3 Address Retry Sources
4.6 Memory Coherency Actions - PowerPC 60x Processor-Initiated Operations
4.6.1 Cache Control Instructions
4.6.2 TLB Invalidate Entry Instruction Processing
4.6.2.1 tlbie Bus Operation
4.7 Descriptions of Bus Transactions and Snoop Responses
4.7.1 General Comments on 60x Snooping
4.7.2 Clean Block
4.7.3 Flush Block
4.7.4 Write with Flush, Write with Flush Atomic
4.7.5 Kill Block
4.7.6 Write with Kill
4.7.7 Read, Read Atomic
4.7.8 Read with Intent to Modify (RWITM)
4.7.9 TLB Invalidate
4.7.10 sync
4.7.11 tlbsync
4.7.12 eieio
4.7.13 icbi
4.7.14 Read with No Intent to Cache (RWNITC)
4.7.15 XFERDATA
4.8 External WIM Bit Settings
4.9 Direct-Memory Access and Memory Coherency
4.10 Overview of Implementation Differences

Chapter 5 System Status Signals
5.1 Overview
5.2 Resets
5.2.1 Hard Reset and Power-On Reset
5.2.1.1 Hard Reset Settings
5.2.2 Soft Reset
5.2.2.1 System Reset Exception (0x00100)
5.2.2.2 Soft Reset on the PowerPC 601 Microprocessor
5.2.2.3 Soft Reset on the PowerPC 603 Microprocessor
5.2.2.4 Soft Reset on the PowerPC 604 Microprocessor
5.3 Machine Check and Checkstops
5.3.1 Checkstop State (MSR[ME] = 0)
5.3.2 Machine Check Exception (0x00200)
5.3.2.1 Machine Check Exception (0x00200) - PowerPC 601 Processor
5.3.2.2 Checkstop State (MSR[ME] = 0) - PowerPC 601 Processor
5.3.2.2.1 Checkstop Sources and Enables Register - HID0
5.3.2.3 Machine Check Exception - PowerPC 603 Processor
5.3.2.4 Checkstop State (MSR[ME] = 0) - PowerPC 603 Processor
5.3.2.5 Machine Check Exception - PowerPC 604 Processor
5.3.2.5.1 Machine Check Exception Enabled (MSR[ME] = 1)
5.3.2.5.2 Checkstop State (MSR[ME] = 0)
5.4 External Interrupt Exception (0x00500)
5.4.1 External Interrupt - PowerPC 601 Processor
5.4.2 External Interrupt - PowerPC 603 Processor
5.5 System Management Interrupt Exception (0x01400)

Chapter 6 Additional Bus Configurations
6.1 No-DRTRY Mode (603 and 604e)
6.1.1 No-DRTRY Mode in PowerPC 604e Processor
6.2 Data Streaming Mode (604)
6.2.1 Data Valid Window in the Data Streaming Mode
viii powerpc microprocessor family: the bus interface for 32-bit microprocessors contents paragraph number title page number 6.2.2 data valid window in the data streaming mode............................................6-3 6.2.3 design practices for data streaming mode .....................................................6-4 6.3 32-bit data bus mode (603) ................................................................................6-4 6.4 reduced-pinout mode (603) ................................................................................6-6 chapter 7 direct-store interface 7.1 direct-store transaction protocol details............................................................7-2 7.1.1 packet 0 ............................................................................................................7-3 7.1.2 packet 1 ............................................................................................................7-4 7.1.3 i/o reply operations........................................................................................7-4 7.2 direct-store operations........................................................................................7-6 7.3 store operations ...................................................................................................7-7 7.4 load operations ...................................................................................................7-7 7.5 direct-store operation timing.............................................................................7-8 7.6 memory-forced direct-store interface (powerpc 601 processor only)........................................................................7-9 chapter 8 system considerations 8.1 arbitration ............................................................................................................8-1 8.2 using the data bus write-only mechanism........................................................8-1 8.3 aack generation 
................................................................................................8-4 8.4 sync vs. tlbsync and system design...........................................................8-4 8.5 pull-up resistors..................................................................................................8-5 8.6 features for improved bus performance..............................................................8-5 8.7 ieee 1149.1-compliant interface ........................................................................8-5 8.7.1 ieee 1149.1 interface description...................................................................8-5 8.8 lwarx/stwcx. considerations................................................................................8-6 8.8.1 coherency participation ...................................................................................8-6 8.8.1.1 noncacheable reservations..........................................................................8-6 8.8.1.2 cacheable reservations................................................................................8-7 8.8.1.3 read snooping requirements ......................................................................8-7 8.8.1.4 write-back reservation-canceling snoops .................................................8-7 8.8.1.5 write-through reservation-canceling snoops ...........................................8-8 8.8.1.6 noncanceling bus operations ......................................................................8-8 8.8.2 filtering options for reservations ...................................................................8-8 8.8.2.1 minimal reservation support ......................................................................8-8 8.8.2.2 improved reservation snooping ..................................................................8-9 contents ix contents paragraph number title page number 8.8.2.3 lwarx / stwcx. 
address-only operation......................................................8-10 8.8.2.4 software implications ................................................................................8-10 appendix a processor summary appendix b processor clocking overview b.1 powerpc 601 microprocessor clocking ............................................................. b-1 b.2 powerpc 603 and powerpc 604 microprocessor clocking ............................... b-2 appendix c processor upgrade suggestions c.1 powerpc 601 processor upgrade to 60x ............................................................ c-1 c.2 powerpc 603 processor upgrade to 604 or 60x ................................................. c-1 c.3 powerpc 604 processor upgrade to 60x ............................................................ c-3 appendix d l2 considerations for the powerpc 604 processor d.1 unfiltered snooping ............................................................................................ d-2 d.2 keeping a copy of l1 tags ................................................................................ d-2 d.2.1 requirements for saving state information.................................................... d-3 d.2.2 operations required for processor bus operations........................................ d-3 d.2.3 forwarding system bus operations to the processor ..................................... d-4 d.3 maintaining l1 state and tags ........................................................................... d-4 d.3.1 requirements for saving state information.................................................... d-5 d.3.2 operations required for processor bus operations........................................ d-5 d.3.3 forwarding system bus operations to the processor ..................................... d-6 d.4 simple l1 inclusion ............................................................................................ 
d-6 d.4.1 requirements for saving state information.................................................... d-6 d.4.2 operations required for processor bus operations........................................ d-6 d.4.3 forwarding system bus operations to the processor ..................................... d-7 d.5 marked l1 inclusion ........................................................................................... d-7 d.5.1 requirements for saving state information.................................................... d-7 d.5.2 operations required for processor bus operations........................................ d-8 d.5.3 forwarding system bus operations to the processor ..................................... d-8 x powerpc microprocessor family: the bus interface for 32-bit microprocessors contents paragraph number title page number appendix e coherency action tables e.1 load operations .................................................................................................. e-2 e.2 store operations .................................................................................................. e-5 e.3 lwarx operations ............................................................................................ e-8 e.4 stwcx operations........................................................................................... e-11 e.5 dcbt operations .............................................................................................. e-17 e.6 dcbtst operations ......................................................................................... e-20 e.7 dcbz operations .............................................................................................. e-21 e.8 dcbst operations............................................................................................ e-23 e.9 dcbf operations .............................................................................................. 
E.10 dcbi operations
E.11 icbi operations
E.12 sync operations
E.13 eieio operations
E.14 tlbie operations
E.15 tlbsync operations
E.16 Snoop-kill operations
E.17 Snoop-read operations
E.18 Snoop-read-atomic operations
E.19 Snoop-RWITM operations
E.20 Snoop-RWITM-atomic operations
E.21 Snoop-flush operations
E.22 Snoop-clean operations
E.23 Snoop-write-with-flush operations
E.24 Snoop-write-with-kill operations
E.25 Snoop-write-with-flush-atomic operations
E.26 Snoop-TLB-invalidate operations
E.27 Snoop-SYNC operations
E.28 Snoop-EIEIO operations
E.29 Snoop-TLBSYNC operations
E.30 Snoop-ICBI operations
E.31 Snoop-RWNITC operations

Glossary of terms and abbreviations
Index

Illustrations

1-1 Typical system diagram with processor bus
1-2 Processor bus signals
3-1 Timing diagram legend
3-2 Overlapping tenures on the processor bus for a single-beat transfer
3-3 Address bus arbitration showing qualified bus grant
3-4 Address bus arbitration showing bus parking
3-5 Address bus transfer
3-6 Snooped address cycle with ARTRY
3-7 Data bus arbitration
3-8 Qualified DBG generation following ARTRY
3-9 Normal single-beat read termination
3-10 Normal single-beat write termination
3-11 Normal burst transaction
3-12 Termination with DRTRY
3-13 Read burst with TA wait states and DRTRY
3-14 Fastest single-beat reads
3-15 Fastest single-beat writes
3-16 Single-beat reads showing data-delay controls
3-17 Single-beat writes showing data-delay controls
3-18 Burst transfers with data-delay controls
3-19 Use of transfer error acknowledge (TEA)
4-1 PowerPC 601 processor cache organization
4-2 PowerPC 603 processor cache organization
4-3 PowerPC 604 processor cache organization
4-4 PowerPC 604e processor cache organization
4-5 MESI states
4-6 MESI cache coherency protocol (601/604): state diagram (WIM = 001)
4-7 MEI cache coherency protocol (603): state diagram (WIM = 001)
4-8 Effective address bits in bus address
5-1 HID0: checkstop sources and enables register (601)
6-1 Data transfer in data streaming mode
6-2 32-bit data bus transfer (eight-beat burst)
6-3 32-bit data bus transfer (two-beat burst with DRTRY)
7-1 Direct-store interface protocol tenures
7-2 Direct-store operation: packet 0
7-3 Direct-store operation: packet 1
7-4 I/O reply operation
7-5 Direct-store interface load access example
7-6 Direct-store interface store access example
8-1 Data bus write-only transaction
B-1 PowerPC 601 processor clocking
B-2 PowerPC 603 and PowerPC 604 processor clock generation
C-1 PowerPC 603 to PowerPC 604 processor upgrade option
D-1 L2 cache controller organization

Tables

i Acronyms and abbreviated terms
1-1 60x signal groupings
1-2 Use and reference for bus signals
2-1 Transfer encoding for PowerPC 601, 603, 604 processors
2-2 Data transfer size
2-3 Transfer code signal encoding for PowerPC 601 processor
2-4 Transfer code signal encoding for the PowerPC 603 processor
2-5 Transfer code signal encoding for PowerPC 604 processor
2-6 Data bus lane assignments
2-7 DP[0-7] signal assignments
2-8 Processor bus signal differences
3-1 Number of bus arbitration signals
3-2 Processor read burst ordering
3-3 Aligned data transfers for 64-bit data bus
3-4 Aligned data transfers for 32-bit data bus
3-5 Misaligned data transfers for the PowerPC 601 processor
3-6 Misaligned data transfers for PowerPC 603/604 processors
3-7 Misaligned data transfers for 603 in 32-bit mode
4-1 MESI state definitions
4-2 CSE[0-1] signals
4-3 Memory coherency actions on load operations
4-4 Memory coherency actions on store operations
4-5 PowerPC 601 and 604 processor bus operations initiated by cache control instructions
4-6 PowerPC 603 bus operations initiated by cache control instructions
4-7 Differences in implementation of bus operations
5-1 Resets, interrupts, and their sources
5-2 Processor bus signal differences
5-3 Hard reset settings
5-4 PowerPC 604e processor modes configurable during HRESET
5-5 System reset exception: register settings
5-6 Machine check exception: register settings
5-7 HID0: checkstop sources and enables register (601)
5-8 Machine check enable bits
5-9 External interrupt: register settings
5-10 System management interrupt: register settings
7-1 Address bits for packet 0
7-2 Address bits for I/O reply operations
7-3 Direct-store bus operations
7-4 Extended address transfer code definitions
8-1 IEEE interface signal descriptions
8-2 Transfer type settings for lwarx/stwcx. address-only operation
A-1 Bus and memory coherency behavior summary
D-1 Operations required for processor bus operations
E-1 Guide to abbreviations
E-2 Coherency actions: load operations
E-3 Coherency actions: store operations
E-4 Coherency actions: lwarx operations
E-5 Coherency actions: stwcx operations
E-6 Coherency actions: dcbt operations
E-7 Coherency actions: dcbtst operations
E-8 Coherency actions: dcbz operations
E-9 Coherency actions: dcbst operations
E-10 Coherency actions: dcbf operations
E-11 Coherency actions: dcbi operations
E-12 Coherency actions: icbi operations
E-13 Coherency actions: sync operations
E-14 Coherency actions: eieio operations
E-15 Coherency actions: tlbie operations
E-16 Coherency actions: tlbsync operations
E-17 Coherency actions: snoop-kill operations
E-18 Coherency actions: snoop-read operations
E-19 Coherency actions: snoop-read atomic operations
E-20 Coherency actions: snoop-RWITM operations
E-21 Coherency actions: snoop-RWITM atomic operations
E-22 Coherency actions: snoop-flush operations
E-23 Coherency actions: snoop-clean
E-24 Coherency actions: snoop-write-with-flush operations
E-25 Coherency actions: snoop-write-with-kill operations
E-26 Coherency actions: snoop-write-with-flush-atomic operations
E-27 Coherency actions: snoop-TLB-invalidate operations
E-28 Coherency actions: snoop-SYNC operations
E-29 Coherency actions: snoop-EIEIO operations
E-30 Coherency actions: snoop-TLBSYNC operations
E-31 Coherency actions: snoop-ICBI operations
E-32 Coherency actions: snoop-RWNITC operations

About This Document

The primary objective of this document is to provide a detailed functional description of the 60x bus interface, as implemented on the PowerPC 601, PowerPC 603, and PowerPC 604 families of PowerPC microprocessors.
This document is intended to help system and chip set developers by providing a centralized reference that identifies the bus interface presented by the 60x family of PowerPC microprocessors. It should be used in conjunction with the individual microprocessors' user's manuals, hardware specifications, and PowerPC Microprocessor Family: The Programming Environments (referred to as the Programming Environments Manual).

The 60x bus is the communication channel for the first generation of PowerPC microprocessors. This bus description documents the current operations and system implementation information for the following PowerPC processors:

• PowerPC 601 processor. The 601 is the first PowerPC processor. It is designed for desktop, server, and workstation implementations and supports implementation in multiprocessing systems.
• PowerPC 603 processors. References to the 603 include the PowerPC 603e processors unless otherwise specified. The 603 family of processors is optimized for implementation in low-power systems and includes bus support for power management, but provides less support for multiprocessing than either the 601 or 604 families of processors.
• PowerPC 604 processors. References to the 604 include the PowerPC 604e processors unless specified otherwise. The 604 family of processors is designed for implementation in desktop, workstation, and server systems and provides extensive support both for multiprocessing and for power management.

Although this book can be used as a general guide for the PowerPC 602 processor, and for some other 32-bit PowerPC processors, it does not describe operations unique to those processors.

All of these processors support 32-bit addressing and provide separate address and data buses. All provide 64-bit data buses, and some allow the data bus to be configured to work in an optional 32-bit mode.
The 60x bus allows processors to access or otherwise communicate with other resources that may share the bus, including system memory, secondary caches, I/O devices, bus arbiters, and other devices. By and large, the 60x bus implementation is consistent among the 601, 603, and 604; however, because the PowerPC architecture supports a broad range of system implementations, each processor offers unique features.

This book has two primary goals. The first is to give the reader an understanding of the operation of the basic signals that are common to and required by all 60x processors. The second is to provide familiarity with those signals that are not common to all parts, or not required for basic operation, but that can maximize the performance of a system implementation. To aid in this understanding, this document focuses on the following bus relationships among current 60x microprocessors:

• General bus characteristics
• Common bus characteristics
• Differences between current implementations

This document specifically describes the communication signals and protocols used by the 601, 603, and 604, and does not describe the power, test, and clock signals. For that information, refer to the particular 60x microprocessor user's manual.

In this document, the terms "601," "603," "603e," "604," "604e," and "60x bus" are used as abbreviations for "PowerPC 601 microprocessor," "PowerPC 603 microprocessor," "PowerPC 603e microprocessor," "PowerPC 604 microprocessor," "PowerPC 604e microprocessor," and "PowerPC 60x microprocessor bus interface," respectively. The terms "processor bus interface" and "interface" are synonymous with the 60x bus.

To locate any published errata or updates for this document, refer to the World-Wide Web at http://www.mot.com/powerpc/ or at http://www.chips.ibm.com/products/ppc.
Audience

This document is intended for system and processor hardware developers who are developing products that incorporate or interface with the 60x microprocessors. It can also benefit software developers who work with products that use these microprocessors.

Organization

Following is a summary and brief description of the major sections of this manual:

• Chapter 1, "Overview," is useful for readers wanting a general understanding of the features and functions of the PowerPC processor interface. It defines various operational subsets of these features and functions.
• Chapter 2, "Signal Descriptions," describes each processor input and output signal and gives timing considerations.
• Chapter 3, "Memory Access Protocol," describes the operation of the processor interface for memory operations.
• Chapter 4, "Memory Coherency," describes bus features and protocols for maintaining coherency in uniprocessor and multiprocessor systems.
• Chapter 5, "System Status Signals," describes the operation of the interrupt, checkstop, and reset signals. It also includes a brief overview of the asynchronous exceptions, with particular attention given to the differences in how the 60x processors implement those exceptions.
• Chapter 6, "Additional Bus Configurations," describes some alternate modes available for the bus.
• Chapter 7, "Direct-Store Interface," describes the optional direct-store interface for synchronous I/O.
• Chapter 8, "System Considerations," gives useful information for designing systems that use the processor bus.
• Appendix A, "Processor Summary," summarizes the processor objectives and provides a table comparing processor behavior.
• Appendix B, "Processor Clocking Overview," describes the clocking for the 601, 603, and 604.
• Appendix C, "Processor Upgrade Suggestions," describes considerations for systems designed to allow a processor upgrade.
• Appendix D, "L2 Considerations for the PowerPC 604 Processor," gives useful information for those implementing an L2 cache on a system with a 604.
• Appendix E, "Coherency Action Tables," provides a comprehensive table of the coherency actions that are generated in response to various bus operations in different contexts, such as WIM bit settings, cache state, and bus states.

Suggested Reading

This section lists additional reading that provides background for the information in this manual as well as general information about the PowerPC architecture.

General Information

The following documentation provides useful information about the PowerPC architecture and computer architecture in general:

• The following books are available from Morgan-Kaufmann Publishers, 340 Pine Street, Sixth Floor, San Francisco, CA 94104; tel. (800) 745-7323 (U.S.A.), (415) 392-2665 (international); internet address: mkp@mkp.com.
  • The PowerPC Architecture: A Specification for a New Family of RISC Processors, second edition, by International Business Machines, Inc. Updates to the architecture specification are accessible via the World-Wide Web at http://www.austin.ibm.com/tech/ppc-chg.html.
  • PowerPC Microprocessor Common Hardware Reference Platform: A System Architecture, by Apple Computer, Inc., International Business Machines, Inc., and Motorola, Inc.
  • Macintosh Technology in the Common Hardware Reference Platform, by Apple Computer, Inc.
  • Computer Architecture: A Quantitative Approach, second edition, by John L. Hennessy and David A. Patterson
• Inside Macintosh: PowerPC System Software, Addison-Wesley Publishing Company, One Jacob Way, Reading, MA, 01867; tel. (800) 282-2732 (U.S.A.), (800) 637-0029 (Canada), (716) 871-6555 (international)
• PowerPC Programming for Intel Programmers, by Kip McClanahan; IDG Books Worldwide, Inc., 919 East Hillsdale Boulevard, Suite 400, Foster City, CA, 94404; tel. (800) 434-3422 (U.S.A.), (415) 655-3022 (international)

PowerPC Documentation

The PowerPC documentation is organized in the following types of documents:

• User's manuals: These books provide details about individual PowerPC implementations and are intended to be used in conjunction with the Programming Environments Manual. These include the following:
  • PowerPC 601 RISC Microprocessor User's Manual: MPC601UM/AD (Motorola order #) and 52G7484/(MPR601UMU-02) (IBM order #)
  • PowerPC 602 RISC Microprocessor User's Manual: MPC602UM/AD (Motorola order #) and MPR602UM-01 (IBM order #)
  • PowerPC 603e RISC Microprocessor User's Manual with Supplement for PowerPC 603 Microprocessor: MPC603EUM/AD (Motorola order #) and MPR603EUM-01 (IBM order #)
  • PowerPC 604 RISC Microprocessor User's Manual: MPC604UM/AD (Motorola order #) and MPR604UMU-01 (IBM order #)
• Programming environments manuals: These books provide information about resources defined by the PowerPC architecture that are common to PowerPC processors. There are two versions, one that describes the functionality of the combined 32- and 64-bit architecture models and one that describes only the 32-bit model.
  • PowerPC Microprocessor Family: The Programming Environments, Rev. 1: MPCFPE/AD (Motorola order #) and G522-0290-00 (IBM order #)
  • PowerPC Microprocessor Family: The Programming Environments for 32-Bit Microprocessors, Rev. 1: MPCFPE32B/AD (Motorola order #)
• Implementation variances relative to Rev. 1 of the Programming Environments Manual are available via the World-Wide Web at http://www.mot.com/powerpc/ or at http://www.chips.ibm.com/products/ppc.
• Addenda/errata to user's manuals: Because some processors have follow-on parts, an addendum is provided that describes the additional features and changes to functionality of the follow-on part. These addenda are intended for use with the corresponding user's manuals.
These include the following:
  • Addendum to PowerPC 603e RISC Microprocessor User's Manual: PowerPC 603e Microprocessor Supplement and User's Manual Errata: MPC603EUMAD/AD (Motorola order #) and SA14-2034-00 (IBM order #)
  • Addendum to PowerPC 604 RISC Microprocessor User's Manual: PowerPC 604e Microprocessor Supplement and User's Manual Errata: MPC604UMAD/AD (Motorola order #) and SA14-2056-01 (IBM order #)
• Hardware specifications: Hardware specifications provide specific data regarding bus timing, signal behavior, and AC, DC, and thermal characteristics, as well as other design considerations for each PowerPC implementation. These include the following:
  • PowerPC 601 RISC Microprocessor Hardware Specifications: MPC601EC/D (Motorola order #) and MPR601HSU-03 (IBM order #)
  • PowerPC 602 RISC Microprocessor Hardware Specifications: MPC602EC/D (Motorola order #) and SC229897-00 (IBM order #)
  • PowerPC 603 RISC Microprocessor Hardware Specifications: MPC603EC/D (Motorola order #) and G522-0289-00 (IBM order #)
  • PowerPC 603e RISC Microprocessor Family: PID6-603e Hardware Specifications: MPC603EEC/D (Motorola order #) and G522-0268-00 (IBM order #)
  • PowerPC 603e RISC Microprocessor Family: PID7v-603e Hardware Specifications: MPC603E7VEC/D (Motorola order #) and G522-0267-00 (IBM order #)
  • PowerPC 604 RISC Microprocessor Hardware Specifications: MPC604EC/D (Motorola order #) and MPR604HSU-02 (IBM order #)
  • PowerPC 604e RISC Microprocessor Family: PID9v-604e Hardware Specifications: MPC604E9VEC/D (Motorola order #) and SA14-2054-00 (IBM order #)
• Technical summaries: Each PowerPC implementation has a technical summary that provides an overview of its features. This document is roughly equivalent to the overview (Chapter 1) of an implementation's user's manual.
Technical summaries are available for the 601, 602, 603, 603e, 604, and 604e as well as the following:
  • PowerPC 620 RISC Microprocessor Technical Summary: MPC620/D (Motorola order #) and SA14-2069-01 (IBM order #)
• PowerPC Microprocessor Family: The Programmer's Reference Guide is a concise reference that includes the register summary, memory control model, exception vectors, and the PowerPC instruction set: MPCPRG/D (Motorola order #) and MPRPPCPRG-01 (IBM order #)
• PowerPC Microprocessor Family: The Programmer's Pocket Reference Guide: This foldout card provides an overview of the PowerPC registers, instructions, and exceptions for 32-bit implementations: MPCPRGREF/D (Motorola order #) and SA14-2093-00 (IBM order #)
• Application notes: These short documents contain information about specific design issues useful to programmers and engineers working with PowerPC processors.
• Documentation for support chips: These include the following:
  • MPC105 PCI Bridge/Memory Controller User's Manual: MPC105UM/AD (Motorola order #)
  • MPC106 PCI Bridge/Memory Controller User's Manual: MPC106UM/AD (Motorola order #)

Additional literature on PowerPC implementations is being released as new processors become available. For a current list of PowerPC documentation, refer to the World-Wide Web at http://www.mot.com/powerpc/ or at http://www.chips.ibm.com/products/ppc.

Conventions

This document uses the following notational conventions:

ACTIVE_HIGH     Names for signals that are active high are shown in uppercase text without an overbar.
ACTIVE_LOW      A bar over a signal name indicates that the signal is active low, for example, ARTRY (address retry) and TS (transfer start). Active-low signals are referred to as asserted (active) when they are low and negated when they are high. Signals that are not active low, such as AP[0-3] (address bus parity signals) and TT[0-4] (transfer type signals), are referred to as asserted when they are high and negated when they are low.
SYS- (or sys-)  This prefix distinguishes signals coming from the system bus from signals on the 60x processor that otherwise have the same name.
mnemonics       Instruction mnemonics are shown in lowercase bold.
OPERATIONS      Address-only bus operations that are named for the instructions that generate them are identified in uppercase letters, for example, ICBI, SYNC, TLBSYNC, and EIEIO operations.
italics         Italics indicate variable command parameters, for example, bcctrx.
0x0             Prefix to denote a hexadecimal number
0b0             Prefix to denote a binary number
rA, rB          Instruction syntax used to identify a source GPR
rA|0            The contents of a specified GPR or the value 0
rD              Instruction syntax used to identify a destination GPR
frA, frB, frC   Instruction syntax used to identify a source FPR
frD             Instruction syntax used to identify a destination FPR
REG[FIELD]      Abbreviations or acronyms for registers are shown in uppercase text. Specific bits, fields, or ranges appear in brackets. For example, MSR[LE] refers to the little-endian mode enable bit in the machine state register.
x               In certain contexts, such as a signal encoding, this indicates a don't care.
n               Used to express an undefined numerical value.

Acronyms and Abbreviations

Table i contains acronyms and abbreviations that are used in this document.

Table i.
Acronyms and abbreviated terms

Term      Meaning
ALU       Arithmetic logic unit
ASR       Address space register
BAT       Block address translation
BIST      Built-in self test
BIU       Bus interface unit
BUID      Bus unit ID
COP       Common on-chip processor
CR        Condition register
CTR       Count register
DABR      Data address breakpoint register
DAR       Data address register
DBAT      Data BAT
DEC       Decrementer (register)
DSISR     Register used for determining the source of a DSI exception
DTLB      Data translation look-aside buffer
EA        Effective address
EAR       External access register
ECC       Error checking and correction
ETP       Extended transfer protocol
EX        Exclusive state (includes shared, S, and exclusive unmodified, E)
FIFO      First-in, first-out
FPR       Floating-point register
FPSCR     Floating-point status and control register
FPU       Floating-point unit
GPR       General-purpose register
HIDn      Hardware implementation-dependent register
IABR      Instruction address breakpoint register
IBAT      Instruction BAT
IEEE      Institute of Electrical and Electronics Engineers
ITLB      Instruction translation look-aside buffer
JTAG      Joint Test Action Group
L2        Secondary cache
LR        Link register
LRS       lwarx reservation set
LRU       Least recently used
LSB       Least-significant byte
lsb       Least-significant bit
MESI      Modified/exclusive/shared/invalid (cache coherency protocol)
MMCRn     Monitor mode control register n
MMU       Memory management unit
MSB       Most-significant byte
msb       Most-significant bit
MSR       Machine state register
NaN       Not a number
No-op     No operation
OEA       Operating environment architecture
PID       Processor identification tag
PLL       Phase-locked loop
PMCn      Performance monitor control (register) n
PMI       Performance monitor interrupt
PTE       Page table entry
PTEG      Page table entry group
PVR       Processor version register
RDA       Read atomic
RISC      Reduced instruction set computing/computer
RTL       Register transfer language
RWITM     Read with intent to modify
RWITMA    Read with intent to modify atomic
SBR       Single-beat read
SBRA      Single-beat read atomic
SBW       Single-beat write
SDR1      Register that specifies the page table base address for virtual-to-physical address translation
SLB       Segment lookaside buffer
SPR       Special-purpose register
SPRGn     Registers available for general purposes
SR        Segment register
SRR0      (Machine status) save/restore register 0
SRR1      (Machine status) save/restore register 1
TAP       Test access port controller
TB        Time base register
TLB       Translation lookaside buffer
UISA      User instruction set architecture
VEA       Virtual environment architecture
WWF       Write with flush
WWFA      Write with flush atomic
WWK       Write with kill
XATC      Extended address transfer code
XER       Register used for indicating conditions such as carries and overflows for integer operations

Chapter 1
Overview

This chapter gives an overview of the bus interface common to the 60x microprocessors. It describes the operation and features of this interface, lists all the microprocessor signals, shows differences between the three microprocessors in the use and number of signals, and defines various operational subsets of signals, identifying in particular those that are required for any system.

This bus description documents the current operations and system implementation information for the following PowerPC processors:

• PowerPC 601 processor. The 601 is the first PowerPC processor. It is designed for desktop, server, and workstation implementations and supports implementation in multiprocessing systems.
• PowerPC 603 processors. References to the 603 include the PowerPC 603e processors unless otherwise specified. The 603 family of processors is optimized for implementation in low-power systems and includes bus support for power management, but provides less support for multiprocessing than either the 601 or 604 families of processors.
• PowerPC 604 processors. References to the 604 include the PowerPC 604e processors unless otherwise specified. The 604 family of processors is designed for implementation in desktop, workstation, and server systems and provides extensive support both for multiprocessing and for power management.

Although this book can be used as a general guide for the PowerPC 602 processor, it does not include descriptions of the specific operations that are unique to that processor.

1.1 PowerPC 60x Microprocessor Interface

The 601, 603, and 604 support a range of systems, including low-power and notebook machines, low-cost desktop personal computers, high-performance workstations, and multiprocessor server systems. To meet those needs, the interface to these processors was defined with a minimum set of functions and 32-bit or 64-bit data bus modes, as well as optional performance and function enhancement signals and modes.

The 60x bus definition is based on the Motorola 88110 bus definition. This interface runs synchronous to the system clock. Inputs are sampled at and outputs are driven from the rising edge of the system clock. This processor bus provides two transfer protocols:

• The basic transfer protocol is used to access normal memory segments. This protocol supports transfer of any number of 32- or 64-bit continuous bytes within an aligned double word to any address in the 32-bit address range. It also supports the use of burst transfers and multiple-beat transfers that transfer up to 64 bits of data during each beat.
• Direct-store operations (elsewhere referred to as extended transfer protocol, or ETP) use a slightly different protocol for accessing the direct-store segments as defined in the PowerPC architecture. This protocol provides an extended address, support for split transactions, and a positive reply for each transaction. The synchronous nature of this protocol limits its performance compared to the basic protocol, but provides for enhanced error recovery. This functionality is now considered optional to the PowerPC architecture and is not supported in all PowerPC processors, for example in second-generation processors in the 603 family.

The PowerPC architecture includes the following:

• An address space shared by all processing elements in the system
• A weakly-ordered memory model that allows processors to improve performance by reordering loads and stores
• A set of explicit cache management and translation lookaside buffer (TLB) management instructions that can be broadcast by the processor to allow software control of caches in a single- or multiple-processor environment
• Instructions for synchronizing operations between different processors

Each processor has a separate address and data bus. In the basic transfer protocol, these separate buses may be used to implement coupled address and data tenures typical of low-end personal computers, or they may be used to implement advanced features such as address pipelining, which allows a new bus transaction to begin before the current transaction has finished, and split-bus transactions, which allow the address bus and data bus to have separate masters at the same time.

The processor bus supports full write-back cache coherency, bus snooping, transaction retry, and snoop copy-back operations. Note, however, that a given processor may not implement all of these features, and some processors may implement them in a more sophisticated manner.
The bus defines signals that support access from multiple masters, including other processors and devices, with arbitration provided by the system implementation.

1.2 PowerPC System Block Diagram

Figure 1-1 shows the processor bus in a typical system design. The bus provides a communications layer between one or more PowerPC processors, the memory controller, the system-provided arbiter, a bridge to an expansion bus for system I/O, and optionally a high-speed I/O adaptor such as a graphics adaptor. The processor component may have an external cache. This bus supports cache implementations that are in-line or lookaside and that are write-through or write-back.

Figure 1-1. Typical System Diagram with Processor Bus

1.3 Processor Bus Features

The processor bus provides high performance and adaptability to various system environments. Features of this bus include the following:

• Bus operation greater than 66 MHz with 601, 603, and 604
• Maintenance of coherency for external cache
• Support for split transactions
• Support for pipelined bus transactions
• Support for address-only bus transactions used primarily for cache control
• Support for multiprocessor configurations
• Optional performance enhancements

Note that not all processors in the 60x family may support all available features.

1.4 Bus Interface Signals

Figure 1-2 shows the PowerPC processor view of all signals. Signals that are part of the basic set are shown as solid lines. Those that are optional and provide enhanced functions or performance are shown as dashed lines.

Figure 1-2. Processor Bus Signals
The signal groupings in Figure 1-2 are described in Table 1-1. The evolution of the processors and the target market for the processors dictated that some of these signals are not supported on some processors, have different pin counts, or may operate differently on some processors. Those differences are described in Section 2.12, "Summary of Signal Differences." Table 1-2 briefly describes each signal function and provides a reference to the detailed description of the signal state meanings and timing considerations in Chapter 2, "Signal Descriptions."

Table 1-1. 60x Signal Groupings

Signal group | Functionality
Address bus arbitration | Used to arbitrate for the address bus
Address transfer start | Indicate that the bus master has begun a transaction on the address bus
Address transfer | Used to transfer the address and to ensure the integrity of the transfer
Address transfer attribute | Provide information about the type of transfer
Address transfer termination | Indicate the end of the address phase or the need to repeat the address phase
Data bus arbitration | Used to arbitrate for data bus mastership
Data transfer | Used to transfer the data and ensure the integrity of the transfer
Data transfer termination | Indicate the end of a data transfer or that the data phase should be repeated
System status | Indicate interrupts and system resets
Processor state | Used to manage the processor state
Power management | Provide a means for a processor and system to cooperate in power management operations. The specific signals for each processor are identified in Table 1-2.

Table 1-2.
Use and Reference for Bus Signals

[The I, O, and Application (Basic, L2, MP, Opt.) columns contained marks that did not survive text extraction and are omitted here.]

Signal | Function | Section

Address bus arbitration signals
Bus request (BR) | Requests mastership of the bus | 2.1.1
Bus grant (BG) | Indicates bus ownership if properly qualified | 2.1.2
Address bus busy (ABB) | Indicates whether the address bus is busy | 2.1.3, 2.1.4

Address transfer start signals
Transfer start (TS) | Indicates that the master has begun a transaction to memory | 2.2.1, 2.2.2
Extended transfer start (XATS) | Indicates that the master has begun a transaction to a direct-store address | 2.2.3, 2.2.4

Address transfer signals
Address bus (A[0–31]) | Indicates the real address of the bus transaction | 2.3.1, 2.3.4
Address parity (AP[0–3]) | Gives odd parity for each address byte | 2.3.5, 2.3.6
Address parity error (APE) | Indicates detection of address bus parity error | 2.3.7

Address transfer attribute signals
Transfer type (TT[0–4]) | Indicates the type of transfer in progress | 2.4.1, 2.4.2
Transfer burst (TBST) | Indicates that a burst transfer is in progress | 2.4.3, 2.4.4
Transfer size (TSIZ[0–2]) | Indicates the size in bytes of transfer in progress | 2.4.5, 2.4.6
Transfer code (TCn) | Gives information about the transaction for external cache operations | 2.4.7
Cache inhibit (CI) | Indicates whether a transfer can be cached | 2.4.8
Write-through (WT) | Indicates whether a transaction is write-through | 2.4.9
Global (GBL) | Indicates that a transaction is global and that data coherence is required | 2.4.10, 2.4.11
Cache set element (CSEn) | Represents the cache replacement set element of the current transaction | 2.4.12
High-priority snoop request (HP_SNP_REQ) | 601 only: Used to indicate when the reserved position in the write queue is needed for a push operation resulting from a snoop hit | 2.4.13

Address transfer termination signals
Address acknowledgment (AACK) |
Indicates that the address portion of a transaction is complete | 2.5.1
Address retry (ARTRY) | Asserted when the address tenure must be retried | 2.5.2, 2.5.3
Shared (SHD) | As an output, indicates the master hit a shared cache block. As an input, indicates the incoming cache block should be marked shared (S) | 2.5.4, 2.5.5

Data bus arbitration signals
Data bus grant (DBG) | Indicates the master may, with proper qualification, assume ownership of the data bus | 2.6.1
Data bus write only (DBWO) | Indicates an outstanding write may precede a pipelined read | 2.6.2
Data bus busy (DBB) | Indicates the data bus is busy | 2.6.4

Data transfer signals
Data bus (DH[0–31], DL[0–31]) | Represents the data being transferred | 2.7.1, 2.7.2
Data bus parity (DP[0–7]) | Represents odd parity for the data bytes | 2.7.3, 2.7.4
Data parity error (DPE) | Forces processor to put data bus in high-impedance state during a write data tenure; other processor operations are unaffected | 2.7.5
Data bus disable (DBDIS) | Indicates to the processor that a write transaction should be stopped | 2.7.6

Data transfer termination signals
Transfer acknowledge (TA) | Indicates that a single-beat data transfer completed successfully | 2.8.1
Data retry (DRTRY) | Invalidates read data sent to processor with TA in the previous cycle. On hard reset, is used to configure some alternate modes | 2.8.2
Transfer error acknowledgment (TEA) | Indicates that a bus error occurred | 2.8.3

System status signals
Interrupt (INT) | Indicates an external interrupt to the processor | 2.9.1
System management interrupt (SMI) | Indicates a system management interrupt to the processor | 2.9.2
Machine check (MCP) | Indicates a machine check exception | 2.9.3
Checkstop input (CKSTP_IN) | Indicates the processor must stop operation (checkstop) |
2.9.4
Checkstop output (CKSTP_OUT) | Indicates the processor has detected a checkstop condition | 2.9.5
Hard reset (HRESET) | Initiates a hard reset exception | 2.9.6
Soft reset (SRESET) | Initiates a soft reset exception | 2.9.7

Processor state signals
Reservation (RSRV) | Indicates that a reservation generated by a lwarx instruction exists in the processor | 2.10.1
External cache intervention (L2_INT) | Indicates intervention from other bus masters | 2.10.2
Time base enable (TBEN) | Indicates the time base should continue clocking | 2.10.3

The four columns under the heading "Application" in Table 1-2 are described as follows:

• Basic operations: Signals in the column labeled "Basic" in Table 1-2 are required to build a simple, uniprocessor system with one bus, no external cache, and no support for bus pipelining. Within this set of signals, TT4 is optional and, as shown in Table 3-1, is used to identify additional transactions that can be snooped.
• L2 cache support: Signals in the "L2" column in Table 1-2 are required to support an external cache. For example, the transfer code (TCn) signals are necessary to indicate the type of transaction. However, some of these signals are optional for some system designs. For instance, a write-through external cache would not need the WT signal, and a cache that responds only to burst operations would not need the CI signal.
• Multiprocessor support: The signals GBL, SHD, and BG, listed in the "MP" column, support memory coherency for systems with masters other than the processor, including multiprocessor systems. Chapter 4, "Memory Coherency," provides detailed information on memory coherency.
The BG signal would be used to assign the bus in systems in which the bus is shared by multiple devices, in which case the GBL signal would be interconnected between all devices to ensure cache coherency. Optionally, the GBL and SHD signals could be connected to a bridge for snooping. The bridge would set it to a known state.

Table 1-2. Use and Reference for Bus Signals (continued)

Signal | Function | Section
TLBI synchronization (TLBISYNC) | 603: Indicates execution should stop after a tlbsync instruction | 2.10.4

Power management signals
Quiescent request (QUIESC_REQ) | 601: Indicates the 601 is ready to enter a soft stop state | 2.11.1
System quiesced (SYS_QUIESC) | 601: Indicates to the 601 that the system is ready for the soft stop state | 2.11.2
Resume (RESUME) | 601: Indicates to resume normal processing | 2.11.3
Quiescent request (QREQ) | 603: Requests all bus activity requiring snooping to pause | 2.11.4
Quiescent acknowledge (QACK) | 603: Indicates all bus activity that requires snooping has paused | 2.11.5
Halted (HALTED) | 604: Indicates the 604 has entered a low-power state | 2.11.6
Run (RUN) | 604: Indicates to keep snooping in low-power state | 2.11.7

• Enhanced operation: The signals in the "Optional" column (labeled "Opt.") provide additional functions and performance enhancements, including the following:
- Address bus arbitration: The ABB signal is optional because it can be derived from other signals if all masters refrain from taking the address bus from the beginning of TS through the end of AACK.
- Parity signals: Address and data parity signals, AP[0–3], APE, DP[0–7], and DPE, are optional. Low-end and low-cost systems, for example some personal computers, may not check or generate parity on either addresses or data. Higher-cost systems may generate parity for only data.
High-end systems may generate parity for both addresses and data and may use the processor-generated indications of parity errors to control other system components.
- Address pipeline: Separate data bus arbitration and granting signals allow independent operation of address and data bus tenures. The DBWO signal allows the processor to run a data bus tenure for an outstanding write address even if a read address is pipelined before it.
- Data retry: The DRTRY signal is used to support speculative forwarding of data.
- Interrupts: Some processors have a system management interrupt in addition to the hardware interrupt defined by the PowerPC architecture. This interrupt is signaled by asserting the SMI signal.
- Soft reset: The soft reset signal, SRESET, is used to initiate a soft reset, a type of system reset that is not defined by the PowerPC architecture but is implemented on most PowerPC processors.
- Power management: Some processors support the use of the QUIESC_REQ, SYS_QUIESC, RESUME, QREQ, QACK, RUN, and HALTED signals to control power consumption, allowing power to be removed from certain portions of the processor when not in use.
- 603 address translation: Because the 603 is optimized for low-power, uniprocessor systems, hardware support is not provided for table search operations.
- Extended transfer start: The XATS signal supports the direct-store accesses. Chapter 7, "Direct-Store Interface," describes the direct-store interface and the effect this protocol has on TT[0–4], TBST, TSIZ[0–2], and A[0–31].

Chapter 2
Signal Descriptions

This chapter describes the external signals used by the PowerPC 601, PowerPC 603, and PowerPC 604 processors, identifying the set of signals that are common to all 60x processors and indicating characteristics of individual processor implementations.
It contains a concise description of individual signals, showing behavior when the signal is asserted and negated and when the signal is an input and an output. Note that the descriptions in this chapter are intended to provide a quick summary of signal functions. Subsequent chapters describe the operation of many of these signals in greater detail, both with respect to how individual signals function and how groups of signals interact.

Note
A bar over a signal name indicates that the signal is active low, for example, ARTRY (address retry) and TS (transfer start). Active-low signals are referred to as asserted (active) when they are low and negated when they are high. Signals that are not active low, such as AP[0–3] (address bus parity signals) and TT[0–4] (transfer type signals), are referred to as asserted when they are high and negated when they are low.

The clock, power, and test signals are not described in this document. Refer to the user's manual for the particular processor for this information.

The bus signal descriptions in this chapter are grouped by the categories shown in Figure 1-2. The section names in this chapter correspond to those groups as defined in Section 1.4, "Bus Interface Signals." The sections give state and timing descriptions for each signal and indicate whether a signal is an input or an output with respect to a PowerPC processor. If a signal is both, the output characteristics are described first. The description is from the perspective of the processor; no attempt is made to describe these signals as an arbiter, slave, or target would see them. The differences between how signals are implemented on different processors are summarized in Section 2.12, "Summary of Signal Differences."

2.1 Address Bus Arbitration Signals

To access the address bus, a device must request and gain bus mastership.
Bus arbitration signals are a collection of input and output signals that bus devices use to request the address bus, recognize when the request is granted, and indicate to other devices when mastership is granted. For detailed descriptions and timing diagrams that show how these signals interact, see Section 3.2.1, "Address Bus Arbitration."

2.1.1 Bus Request (BR): Output

Following are state and timing descriptions for the bus request (BR) as an output signal.

State meaning
Asserted: A device is requesting address bus mastership. BR can be asserted for one or more cycles and then deasserted due to an internal cancellation of the bus request (for example, due to the loss of a memory reservation).
Negated: No device is requesting the address bus. The device may have no bus operation pending, it may be parked, or the ARTRY input was asserted on the previous bus clock cycle.

Timing comments
Assertion: A bus transaction is needed and the device does not have a qualified bus grant. This may occur even if the maximum (two for the 601 and 603, three for the 604) possible pipeline accesses have occurred. For the 603, BR is asserted for one cycle during execution of a dcbz or of a load instruction that hits in the touch load buffer.
Negation: Occurs for at least one bus clock cycle after an accepted, qualified bus grant (see BG and ABB), even if another transaction is pending. It is also negated for at least one cycle after the assertion of ARTRY, unless that processor caused the assertion of ARTRY to perform a cache block push for that snoop operation.

2.1.2 Bus Grant (BG): Input

Following are state and timing descriptions for the bus grant (BG) as an input signal.

State meaning
Asserted: The device may, with the proper qualification, assume mastership of the address bus. A qualified bus grant occurs in a given cycle when the following conditions are met:
• BG is asserted.
• No address cycle is in progress (as marked by ABB or the TS-through-AACK interval).
• ARTRY is negated and was negated on the previous cycle (not considered on 601).
The assertion of BR is not required for the qualified bus grant (for example, the parked case).
Note that the 601 recognizes a qualified bus grant on the cycle after AACK even if ARTRY is asserted as long as the 601 is asserting ARTRY and has exclusive ownership of the data associated with the snoop that caused the ARTRY. The ABB and ARTRY signals are driven by the bus master. If the processor is parked, BR need not be asserted for the qualified bus grant.
Negated: The device is not the next potential address bus master.

Timing comments
Assertion: May occur at any time to indicate the device is free to use the address bus. After the processor gains bus mastership, it does not check for a qualified bus grant again until the cycle in which the address bus tenure completes (assuming it has another transaction to run). The processor does not accept a BG in the cycles between the assertion of any TS or XATS through to the assertion of AACK.
Negation: May occur at any time to indicate the device cannot use the bus. However, the device still assumes mastership on the bus clock cycle BG is negated because, in the previous cycle, BG indicated to the device that it could take mastership (if qualified).

2.1.3 Address Bus Busy (ABB): Output

Following are state and timing descriptions for address bus busy (ABB) as an output signal.

State meaning
Asserted: The device is the address bus master.
Negated: The device is not using the address bus. If ABB is negated in the bus clock cycle after a qualified bus grant, the device did not accept mastership, even if BR was asserted. This can occur if a potential transaction is aborted internally before it started.

Timing comments
Assertion: Occurs on the bus clock cycle after a qualified bus grant that is accepted by the device (see Negated).
Negation: Occurs for a fraction of the bus clock cycle after AACK is asserted.
If ABB is negated in the bus clock cycle after a qualified BG, the device did not accept mastership, even if BR was asserted.
High impedance: Occurs during a fractional portion of the bus cycle in which ABB is negated. ABB is guaranteed by design to be high impedance by the end of the cycle in which it is negated. For specific information, see the particular processor's user's manual.

2.1.4 Address Bus Busy (ABB): Input

Following are state and timing descriptions for ABB as an input signal.

State meaning
Asserted: The address bus is being used by another master, which effectively keeps the device from assuming address bus ownership, regardless of the BG input. The processor will not take the address bus for the sequence of cycles beginning with TS and ending with AACK, which effectively makes ABB optional if other bus masters respond in the same way as the processor.
Negated: The address bus is not owned by another bus device and is available when accompanied by a qualified bus grant.

Timing comments
Assertion: May occur when the other devices must be prevented from using the address bus (and the processor is not currently asserting ABB).
Negation: May occur whenever the master can use the address bus.

2.2 Address Transfer Start Signals

Address transfer start signals are input and output signals that indicate that an address bus transfer has begun. The transfer start (TS) signal identifies the operation as a memory transaction; extended address transfer start (XATS) identifies the transaction as a direct-store operation. For detailed information about how TS and XATS interact with other signals, refer to Section 3.2.2, "Address Transfer," and Chapter 7, "Direct-Store Interface," respectively.

2.2.1 Transfer Start (TS): Output

Following are state and timing descriptions for transfer start (TS) as an output signal.
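As an illustration of the arbitration rules in Section 2.1, the following Python sketch evaluates the qualified-bus-grant conditions cycle by cycle. It is a simplified model under stated assumptions, not a description of any processor's logic: the `BusCycle` record and the function name are hypothetical, signal values are logical (True means asserted, independent of pin polarity), the `ts` field stands for either TS or XATS, and the cycle in which AACK is asserted is assumed to still belong to the address tenure.

```python
from dataclasses import dataclass


@dataclass
class BusCycle:
    """Logical signal states sampled in one bus clock cycle (True = asserted)."""
    bg: bool = False     # bus grant
    abb: bool = False    # address bus busy
    ts: bool = False     # transfer start (TS or XATS)
    aack: bool = False   # address acknowledge
    artry: bool = False  # address retry


def qualified_bus_grants(cycles, check_artry=True):
    """Return one bool per cycle: True where the grant is qualified.

    A grant is treated as qualified when BG is asserted, no address cycle
    is in progress (ABB asserted, or inside the TS-through-AACK interval),
    and ARTRY is negated in this cycle and in the previous one.
    """
    results = []
    in_tenure = False    # inside a TS-through-AACK interval
    prev_artry = False   # ARTRY state in the previous cycle
    for c in cycles:
        if c.ts:
            in_tenure = True
        address_busy = c.abb or in_tenure
        artry_ok = (not c.artry and not prev_artry) if check_artry else True
        results.append(c.bg and not address_busy and artry_ok)
        if c.aack:
            in_tenure = False  # assumption: tenure ends after the AACK cycle
        prev_artry = c.artry
    return results
```

Passing `check_artry=False` drops the ARTRY condition, which the text says is not considered on the 601; the 601-specific exception described in Section 2.1.2 is not modeled.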
State meaning
Asserted: The master has begun a memory bus transaction and the address bus and transfer attribute signals are valid. When asserted with the appropriate TT[0–4] signals, it is also an implied data bus request for a memory transaction (unless the TS output is an address-only operation).
Negated: Has no special meaning. However, TS is negated throughout an entire direct-store address tenure.

Timing comments
Assertion: Coincides with the assertion of ABB.
Negation: Occurs one bus clock cycle after TS is asserted.
High impedance (601 and 603): Occurs one bus clock cycle after TS is negated, which is coincident with the negation of ABB.
High impedance (604): Occurs one bus clock cycle after the negation of TS. For the 604, the TS negation is only one bus cycle long, regardless of the TS-to-AACK delay.

2.2.2 Transfer Start (TS): Input

Following are state and timing descriptions for TS as an input signal.

State meaning
Asserted: Another master began a bus transaction and the address bus and transfer attribute signals are valid for snooping (see GBL).
Negated: No bus transaction is occurring.

Timing comments
Assertion: May occur any time outside the address tenure window: either the interval that includes the cycle of a previous TS assertion through the cycle after AACK or the cycles in which ABB is asserted for a previous address tenure, whichever is greater.
Negation: Must occur one bus clock cycle after TS is asserted.

2.2.3 Extended Address Transfer Start (XATS): Output (Direct-Store)

Following are state and timing descriptions for extended address transfer start (XATS) as an output signal.

State meaning
Asserted: The master began a direct-store operation and the first address cycle is valid. When asserted with the appropriate extended address transfer code (XATC) signals, it is also an implied data bus request for certain direct-store operations (unless it is an address-only operation).
Negated: Has no special meaning; however, XATS remains negated throughout an entire memory address tenure.

Timing comments
Assertion: Coincides with the assertion of ABB.
Negation: Occurs one bus clock cycle after the assertion of XATS.
High impedance (601 and 603): Occurs one bus clock cycle after the negation of XATS, which coincides with the negation of ABB.
High impedance (604): Occurs one bus clock cycle after the negation of XATS. For the 604, XATS negation is only one bus cycle long, regardless of the XATS-to-AACK delay.

2.2.4 Extended Address Transfer Start (XATS): Input (Direct-Store)

Following are state and timing descriptions for XATS as an input signal.

State meaning
Asserted: The master must check for a direct-store operation reply.
Negated: There is no need to check for a direct-store reply.

Timing comments
Assertion: May occur at any time outside of the cycles that define the window of an address tenure. This window is marked by either the interval that includes the cycle of a previous XATS assertion through the cycle after AACK or by the cycles in which ABB is asserted for a previous address tenure, whichever is greater.
Negation: Must occur one bus clock cycle after XATS is asserted.

2.3 Address Transfer Signals

The address transfer signals are used to transmit the address and to generate and monitor parity for the address transfer. For detailed descriptions of how these signals interact, see Section 3.2.2, "Address Transfer."

2.3.1 Address Bus (A[0–31]): Output (Memory Operations)

Following are state and timing descriptions for the address bus (A[0–31]) as output signals during memory operations.

State meaning
Asserted/negated: Represents the physical address of the data to be transferred.
On burst transfers, the address bus presents the double-word-aligned address (quad-word-aligned for the 601) with the critical data that missed the cache on a read operation, or the first double word of the cache block on a write operation. Note that the address output during burst operations is not incremented.

Timing comments
Assertion/negation: Occurs on the bus clock cycle after a qualified bus grant (coincides with assertion of ABB and TS).
High impedance: Occurs one bus clock cycle after AACK is asserted.

2.3.2 Address Bus (A[0–31]): Input (Memory Operations)

Following are state and timing descriptions for A[0–31] as input signals for memory operations.

State meaning
Asserted/negated: Carries the address of a snoop operation.

Timing comments
Assertion/negation: Must occur on the same bus clock cycle as the assertion of TS; is sampled by the processor only on this cycle.

2.3.3 Address Bus (A[0–31]): Output (Direct-Store Operations)

Following are state and timing descriptions for A[0–31] as output signals for direct-store operations.

State meaning
Asserted/negated: For direct-store operations from this device, the address tenure consists of two packets (each requiring a bus cycle). For packet 0, these signals convey control and tag information. For packet 1, they represent the physical address of the data to be transferred. For reply operations to other devices, the address bus carries control, status, and tag information.

Timing comments
Assertion/negation: An address tenure consists of two beats. The first occurs on the bus clock cycle after a qualified bus grant, coinciding with XATS. The address bus makes a transition to the second beat on the next bus clock cycle.
High impedance: Occurs the bus clock cycle after AACK is asserted.

2.3.4 Address Bus (A[0–31]): Input (Direct-Store Operations)

Following are state and timing descriptions for A[0–31] as input signals for direct-store operations.
State meaning
Asserted/negated: When the processor receiving A[0–31] signals is not the master, it snoops (and checks address parity) on only the first address beat of all direct-store operations for I/O reply operations whose receiver tags match the processor identification (PID) tag. See Section 7.1, "Direct-Store Transaction Protocol Details."

Timing comments
Assertion/negation: The first beat of the I/O transfer address tenure coincides with XATS, with the second address beat on the next cycle.

2.3.5 Address Bus Parity (AP[0–3]): Output

Following are state and timing descriptions for the address bus parity signals (AP[0–3]) as output signals.

State meaning
Asserted/negated: Represents one bit of odd parity for each of four address bus bytes. Odd parity means an odd number of bits, including the parity bit, are driven high. Signal assignments are as follows:
AP0 | A[0–7]
AP1 | A[8–15]
AP2 | A[16–23]
AP3 | A[24–31]
For more information, see Section 3.2.2.1, "Address Bus Parity."

Timing comments
Assertion/negation/high impedance: The same as A[0–31].

2.3.6 Address Bus Parity (AP[0–3]): Input

Following are state and timing descriptions for AP[0–3] as input signals.

State meaning
Asserted/negated: Represents one bit of odd parity for each of four address bus bytes for snooping and direct-store operations. Depending on MSR[ME] and various HID0 bits, detecting even parity either causes the processor to enter the checkstop state or take a machine check exception. If address parity check is enabled in HID0, detection of even parity unconditionally causes a checkstop in the 601. (See the APE signal description.)

Timing comments
Assertion/negation: The same as A[0–31].

2.3.7 Address Parity Error (APE): Output

Following are state and timing descriptions for the address parity error (APE) output signal. Note that APE is an open-drain type output and requires an external pull-up resistor to assure proper deassertion.
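As background for the parity error signal, the odd-parity rule given for AP[0-3] in Section 2.3.5 can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes the 32-bit address is held in an integer whose most-significant bit corresponds to A0, so that AP0 covers the most-significant byte (A[0-7]) and AP3 the least-significant byte (A[24-31]).

```python
def odd_parity_bit(byte):
    """Return the parity bit that makes the count of 1 bits
    (data byte plus parity bit) odd."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1


def address_parity(addr):
    """Return [AP0, AP1, AP2, AP3] for a 32-bit address.

    Assumption: bit A0 is the most-significant bit of `addr`.
    """
    addr &= 0xFFFFFFFF
    return [odd_parity_bit((addr >> shift) & 0xFF) for shift in (24, 16, 8, 0)]


def has_parity_error(addr, ap):
    """Return True if the received parity bits do not give odd parity
    for every address byte (that is, some byte shows even parity)."""
    return address_parity(addr) != list(ap)
```

A snooping device in this model would call `has_parity_error` with the sampled address and parity bits; what the processor does on a mismatch (APE assertion, checkstop, or machine check) depends on the configuration described in the text and is not modeled here.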
State Meaning
  Asserted: The processor detected incorrect address bus parity on a snoop for a transaction type it recognizes and can respond to, such as the first address beat of a direct-store operation. The 603 does not assert APE if address parity checking is disabled.
  Negated: The processor did not detect even address bus parity.
Timing Comments
  Assertion: Occurs the second bus clock cycle after TS or XATS is asserted.
  High Impedance: Occurs the third bus clock cycle after TS or XATS is asserted.

2.4 Address Transfer Attribute Signals

The transfer attribute signals further characterize the transfer, indicating such things as the transfer size, whether it is a read or write, and whether it is a burst or single-beat transfer. For a detailed description of how these signals interact, see Section 3.2.2, "Address Transfer." Some signals that function one way for memory operations may work differently for direct-store accesses; see Chapter 7, "Direct-Store Interface."

2.4.1 Transfer Type (TT[0–4]): Output

Following are state and timing descriptions for the transfer type signals (TT[0–4]) as output signals.

State Meaning
  Asserted/Negated: Table 2-1 defines the transactions identified by the TT[0–4] signals. The table gives the type of transaction, the type of data transferred, the source or cause of the transfer, and the processors that support these transaction types as master or when snooping. Some codes in this table are reserved. Notice that the encoding has been chosen to simplify decoding. For example, TT1 is generally zero for writes and one for reads, and TT3 is generally zero for an address-only operation. For a full description of coherency actions, see Appendix E, "Coherency Action Tables."
    For direct-store operations, these signals are part of the extended address transfer code (XATC) along with TSIZn and TBST: XATC(0–7) = TT(0–3)||TBST||TSIZ(0–2).
    TT4 is driven negated as an output on the 601.
Timing Comments
  Assertion/Negation/High Impedance: The same as A[0–31].

2.4.2 Transfer Type (TT[0–4]): Input

Following are state and timing descriptions for TT[0–4] as input signals.

State Meaning
  Asserted/Negated: Table 2-1 defines the transactions identified by TT[0–4]. For a full description of coherency actions, see Appendix E, "Coherency Action Tables." For direct-store operations, TT[0–4] form part of the XATC and are snooped if XATS is asserted.
Timing Comments
  Assertion/Negation: The same as A[0–31].

Table 2-1. Transfer Encoding for PowerPC 601, 603, 604 Processors

TT[0–4] | Transaction | Transfer | Source | Initiator | Snooper
00000 | Clean block | Address only | dcbst | 601, 604 | 601, 604
00100 | Flush block | Address only | dcbf | 601, 604 | 601, 604
01000 | sync | Address only | sync | 601, 604 | 601, 604
01100 | Kill block | Address only | Store hit on shared block or dcbz, dcbi, or a 601 icbi | 601/603/604 | 601/603/604
10000 | Ordered I/O operation | Address only | eieio | 604 | -
10100 | External control word write | Single-beat write | ecowx | 601/603/604 | -
11000 | TLB invalidate | Address only | tlbie | 601/604 | 601/604
11100 | External control word read | Single-beat read | eciwx | 601/603/604 | -
00001 | lwarx reservation set | Address only | lwarx cache hit at execution | 604 | -
00101 | Reserved | - | - | - | -
01001 | TLB synchronize | Address only | tlbsync | 604 | 604
01101 | Invalidate instruction cache copy | Address only | icbi | 604 | 604
00010 | Write-with-flush | Single-beat write or burst | Caching-inhibited or write-through store | 601/603/604 | 601/603/604
00110 | Write-with-kill | Burst | Snoop writeback, dcbf, dcbst, or castout hit modified data | 601/603/604 | 601/603/604
01010 | Read | Single-beat read or burst | Cacheable load miss (601/604), cacheable instruction miss or cache-inhibited load | 601/603/604 | 601/603/604
01110 | Read-with-intent-to-modify | Burst | Load miss (603) or store miss | 601/603/604 | 601/603/604
10010 | Write-with-flush-atomic | Single-beat write | stwcx. | 601/603/604 | 601/603/604
10110 | Reserved | - | - | - | -
11010 | Read-atomic | Single-beat read or burst | lwarx | 601/603/604 | 601/603/604
11110 | Read-with-intent-to-modify-atomic | Burst | stwcx. miss with valid reservation | 601/603/604 | 601/603/604
00x11 | Reserved | - | - | - | -
01011 | Read-with-no-intent-to-cache | Single-beat read or burst | Snooped only | - | 603/604
01111 | Reserved | - | - | - | -
1xxx1 | Reserved for customer | - | - | - | -

2.4.3 Transfer Burst (TBST): Output

Following are state and timing descriptions for transfer burst (TBST) as an output signal.

State Meaning
  Asserted: A burst transfer is in progress.
  Negated: A burst transfer is not in progress.
    Also part of the extended address transfer code (XATC); see Section 2.4.1, "Transfer Type (TT[0–4]): Output."
    For external control instructions (eciwx/ecowx), TBST outputs EAR[28], which is part of the resource ID (TBST||TSIZ[0–2]).
Timing Comments
  Assertion/Negation/High Impedance: The same as A[0–31].

2.4.4 Transfer Burst (TBST): Input

Following are state and timing descriptions for TBST as an input signal.

State Meaning
  Asserted: For direct-store operations, TBST forms part of the XATC; see Section 2.4.2, "Transfer Type (TT[0–4]): Input."
  Negated: A burst transfer is not in progress.
Timing Comments
  Assertion/Negation/High Impedance: The same as A[0–31].

2.4.5 Transfer Size (TSIZ[0–2]): Output

Following are state and timing descriptions for the transfer size signals TSIZ[0–2] as output signals.

State Meaning
  Asserted/Negated: For memory accesses, these signals with TBST indicate the data transfer size for the current bus operation, as shown in Table 2-2. This table shows transfer sizes indicated by combinations of TBST and TSIZ[0–2]. Note that one combination is defined for system use. This combination could be generated by systems but would not be output from a PowerPC processor.
    For direct-store operations, these signals form part of the extended address transfer code (XATC); see the description in Section 2.4.1, "Transfer Type (TT[0–4]): Output."
    The external control instructions, eciwx/ecowx, use these signals to output EAR[29–31], to form the resource ID (TBST||TSIZ[0–2]).
Timing Comments
  Assertion/Negation/High Impedance: The same as A[0–31].

2.4.6 Transfer Size (TSIZ[0–2]): Input

Following are state and timing descriptions for TSIZ[0–2] as input signals.

State Meaning
  Asserted/Negated: For direct-store operations, TSIZ[0–2] are part of the XATC; see Section 2.4.2, "Transfer Type (TT[0–4]): Input."
Timing Comments
  Assertion/Negation: The same as A[0–31].

2.4.7 Transfer Code (TCn): Output

The transfer code (TCn) consists of three output signals on the 604 (TC[0–2]) and two output signals for the 601 and 603 (TC[0–1]). These signals provide information about the current transaction that may be useful for implementing external caches. Following are state and timing descriptions for TCn.

State Meaning
  Asserted/Negated: Represents a special encoding for the transfer in progress and gives supplemental information for certain transaction types. See Table 2-3, Table 2-4, and Table 2-5.
Timing Comments
  Assertion/Negation/High Impedance: The same as A[0–31].

Table 2-2. Data Transfer Size

TBST      TSIZ[0–2]   Transfer Size
Asserted  0 0 0       Reserved
Asserted  0 0 1       Burst (16 bytes); reserved for system use
Asserted  0 1 0       Burst (32 bytes)
Asserted  0 1 1       Reserved (64-byte bursts)
Asserted  1 x x       Reserved
Negated   0 0 0       8 bytes
Negated   0 0 1       1 byte
Negated   0 1 0       2 bytes
Negated   0 1 1       3 bytes
Negated   1 0 0       4 bytes
Negated   1 0 1       5 bytes
Negated   1 1 0       6 bytes
Negated   1 1 1       7 bytes

Table 2-3 shows the transfer code definitions for the 601.
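The TBST/TSIZ[0–2] encoding of Table 2-2 can be sketched as a small decoder. This is only an illustrative Python model under stated assumptions: `transfer_size` is a hypothetical name, TBST is passed as a logical "asserted" flag rather than as an electrical level, and TSIZ[0–2] is passed as a 3-bit integer with TSIZ0 as the most-significant bit.

```python
def transfer_size(tbst_asserted, tsiz):
    """Decode TBST and TSIZ[0-2] into the transfer size listed in Table 2-2.

    tbst_asserted: True when TBST is asserted (logical view, not pin level).
    tsiz: 3-bit integer, TSIZ0 taken as the most-significant bit (assumption).
    """
    tsiz &= 0b111
    if tbst_asserted:
        burst_codes = {
            0b000: "reserved",
            0b001: "burst (16 bytes), reserved for system use",
            0b010: "burst (32 bytes)",
            0b011: "reserved (64-byte bursts)",
        }
        # Every 1xx combination is reserved when TBST is asserted.
        return burst_codes.get(tsiz, "reserved")
    # Single-beat transfers: 000 means 8 bytes, otherwise the code is the byte count.
    nbytes = 8 if tsiz == 0 else tsiz
    return "1 byte" if nbytes == 1 else "%d bytes" % nbytes
```

For example, `transfer_size(False, 0b100)` gives `"4 bytes"`, while any asserted-TBST code other than 010 describes a combination that, according to the text, a PowerPC processor would not drive itself.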
Table 2-4 shows the transfer code meanings for the 603.

Table 2-3. Transfer Code Signal Encoding for PowerPC 601 Processor

Signal | State | Definition
TC0 | Asserted | Read: Bus operation is an instruction fetch. Write: Operation is invalidating the cache line in the 601. Kill block (address only): Operation is invalidating the cache block in the 601.
TC0 | Deasserted | Read: Bus operation is not an instruction fetch. Write: Operation is not invalidating the cache line in the 601. Kill block (address only): Operation is not invalidating the cache block in the 601.
TC1 | Asserted | The next access is likely to be on the same page; a sector has been loaded and a low-priority load of the adjacent sector is queued.
TC1 | Deasserted | The next access isn't likely to be on the next page; no load to the adjacent sector is queued.

Table 2-4. Transfer Code Signal Encoding for the PowerPC 603 Processor

TC[0–1]   Read                Write
0 0       Data transaction    Any write
0 1       Touch load          -
1 0       Instruction fetch   -
1 1       Reserved            -

Table 2-5 shows transfer code options for the 604 and gives the transaction type, the encoding of the WT and TC[0–2] signals, and the type of cycle.

Table 2-5. Transfer Code Signal Encoding for PowerPC 604 Processor

A ditto mark (") indicates a cell shared with the row above.

Transfer Type | WT (note 1) | TC[0–2] | BR Asserted (notes 2, 3) | From Write-Back Buffer | TS after ARTRY'd Snoop (note 4) | Final Cache State (note 5) | Comments
Write with kill | 1 | 100 | Never | Always | Don't care | I | Cache copy-back
" | 0 | xx0 | No | Yes | Yes | M, E, S, I | To distinguish between cache copy-back, block clean (dcbst), or block flush (dcbf), this transaction must be ARTRY'd. This transaction eventually returns (before anything but another snoop push directly from the data cache) indicating another WT/TC code combination.
" | " | 100 | No | Yes | No | I | Block flush (dcbf)
" | " | 000 | No | Yes | No | M, E, I | Block clean (dcbst). The dcbst instruction changes the cache state to E when the modified block is put in the copy-back buffer. Before the low-priority write-back buffer entry completes its address tenure, the cache state can be changed to M by a store or to I by a dcbi or a cache miss.
" | " | 010 | Yes | No | Don't care | S, I | Snoop push (note 6) directly from data cache (read or read-atomic). The read or read-atomic snoop changes the data cache state to S when the modified block is placed in the snoop-push buffer. Before the buffer completes its address tenure, the cache can be changed to I by a dcbi or cache miss.
" | 0 | 010 | Yes | Yes | Don't care | S or I | Snoop push (note 6) from write-back buffer (read or read-atomic). The data cache has a shared copy if the buffer held a block clean (dcbst) transaction. If it held a block flush (dcbf) or cache write-back transaction, the cache has no valid copy after the transaction. To know if the processor kept a shared copy or invalidated this block, this transaction must be ARTRY'd. If it originated from the write-back buffers and no new snoops occur, the transaction returns as the next TS and indicates a dcbf, dcbst, or write-back WT/TC code. If it returns as a snoop push read, it came from the data cache.
" | " | 100 | Yes | No | Don't care | I | Snoop push (note 6) directly from data cache (RWITM, RWITM-atomic, flush, write w/flush, write w/flush-atomic, or kill)
" | " | 100 | Yes | Yes | Don't care | I | Snoop push (note 6) from write-back buffers (RWITM, RWITM-atomic, flush, write w/flush-atomic, write w/flush, write w/kill, or kill)
" | " | 000 | Yes | No | Don't care | M, E, I | Snoop push (note 6) from data cache (clean or RWNITC). The clean or RWNITC snoop changes the data cache state to E when the modified block is put in the snoop-push buffer. Before the buffer completes its address tenure, the cache state can be changed to M by a store or to I by either a dcbi instruction or cache miss.
" | " | 000 | Yes | Yes | Don't care | M, E, I (dcbst in buffer); I (cache write-back or dcbf in buffer) | Snoop push (note 6) from write-back buffers (clean or RWNITC). If this snoop hit on a block-flush (dcbf) or a cache write-back in the write-back buffers, the cache does not have a valid copy of this address after this transaction. If this snoop hits a block-store (dcbst) in the write-back buffers, the processor can keep an exclusive copy of the cache block.
Kill block | x | 100 | Never | No | Don't care | I | Kill block deallocate (dcbi)
" | 1 | 000 | " | " | " | M | Kill block and allocate, no castout required (dcbz)
" | 1 | 001 | " | " | " | " | Kill block and allocate, castout required (dcbz)
" | 1 | 000 | " | " | " | " | Kill block; write to block marked S
Read (note 7) | W (note 8) | 0x0 | Never | No | Don't care | E, S | Data read, no castout required. The cache state is S if SHD was asserted to the processor for a read or read-atomic transaction. If SHD was not asserted or if the transaction was an RWITM or RWITM-atomic transaction, the cache state is E.
" | W | 0x1 | " | " | " | E, S | Data read, castout required. The cache state is S if SHD was asserted to the processor for a read or read-atomic transaction. If SHD was not asserted, or if the transaction was an RWITM or RWITM-atomic transaction, the cache state is E.
" | W | 1x0 | " | " | " | Valid | Instruction read
icbi | x | 100 | Never | No | Don't care | Invalid | Kill block deallocate (icbi, note 9)

Notes:
1 The value in the WT column reflects the logic value seen on the signal.
2 The window for the assertion of BR is defined as the second cycle after AACK if ARTRY were asserted the cycle after AACK.
3 The full condition for this column is "The BR corresponding to this transaction was asserted in the window for the last snoop to this address."
4 The full condition for this column is "This transaction is the first TS asserted by this processor after one or more ARTRY'd snoop transactions and the address of this transaction matches the address of at least one of those ARTRY'd snoop transactions."
5 This column reflects the final MESI state in the processor of the line referenced by this transaction after the transaction completes successfully without ARTRY.
6 This snoop push is guaranteed to push the most-recently modified data in the processor. No more snoop operations are required to ensure that this snoop has been fully processed by the processor.
7 Read in this case encompasses all of read or RWITM, normal or atomic.
8 W = write-through bit from translation.
9 icbi is distinguished from kill block by assertion of TT4.

2.4.8 Cache Inhibit (CI): Output

Following are state and timing descriptions for the cache inhibit (CI) output signal.

State Meaning
  Asserted: Generally indicates that a single-beat transfer will not be cached, reflecting the setting of the I bit for the block or page that contains the address of the current transaction.
  Negated: Generally indicates that a burst transfer will allocate a data cache block. Set negated for castouts and pushes.
  Section 4.8, "External WIM Bit Settings," describes exceptions to the above.
Timing Comments
  Assertion/Negation/High Impedance: The same as A[0–31].

2.4.9 Write-Through (WT): Output

Following are state and timing descriptions for the write-through (WT) output signal.

State Meaning
  Asserted: Generally indicates that a single-beat transaction is write-through, reflecting the value of the W bit for the block or page that contains the address of the current transaction.
  Negated: Generally indicates a transaction is not write-through.
  Section 4.8, "External WIM Bit Settings," describes exceptions to the above.
Timing Comments
  Assertion/Negation/High Impedance: The same as A[0–31].

2.4.10 Global (GBL): Output

Following are state and timing descriptions for the global signal (GBL) as an output signal. For the 604e, HID0[23] lets software control the behavior of GBL for instruction fetches through the address-translation mechanism; refer to 604e user documentation.

State Meaning
  Asserted: Generally indicates that a transaction is global, reflecting the setting of the M bit for the block or page that contains the address of the current transaction (except in the case of write-back operations, which are nonglobal).
  Negated: Generally indicates that a transaction is not global. For the 603 and 604, this signal is negated on instruction fetches.
  For exceptions, see Section 4.8, "External WIM Bit Settings."
Timing Comments
  Assertion/Negation/High Impedance: The same as A[0–31].

2.4.11 Global (GBL): Input

Following are state and timing descriptions for GBL as an input signal.

State Meaning
  Asserted: A transaction can be snooped; however, the processor will not snoop reserved transaction types, bus operations associated with the eieio, eciwx, or ecowx instructions, or address-only bus transactions associated with an lwarx reservation set. Note that the 603 snoops the reservation address register for global and nonglobal address transfers. This snooping is required for the 603's implementation of the lwarx and stwcx. instructions, which require snoops on castouts and snoop pushes (nonglobal). Snoops with GBL = 1 do not affect the cache state.
  Negated: A transaction is not snooped by the processor.
Timing Comments
  Assertion/Negation: The same as A[0–31].

2.4.12 Cache Set Element (CSEn): Output

The number of cache set element signals on each processor depends upon the cache associativity of that processor. There are three cache set element signals on the 601 (CSE[0–2]), one on the 603 (CSE), and two on the 603e, 604, and 604e (CSE[0–1]).
Following are state and timing descriptions for the CSEn signals. In some documentation these signals are called cache set entry or cache set enable signals.

State Meaning
  Asserted/Negated: Represents the cache replacement set element (also referred to as the way or coherency class) for the current cache transaction. Can be used with the address bus and the transfer attribute signals to externally track the state of each cache block in the processor. The CSEn signals are not meaningful during data cache touch load operations on a 603.
Timing Comments
  Assertion/Negation/High Impedance: The same as A[0–31].

2.4.13 High-Priority Snoop Request (HP_SNP_REQ): 601 Only

Following are state and timing descriptions for the high-priority snoop request input signal (HP_SNP_REQ) on the 601. This signal is enabled by setting HID0[31].

State Meaning
  Asserted: The 601 may add an additional reserved queue position to the list of available queue positions for push transactions that are a result of a snoop hit.
  Negated: The 601 will not make the reserved queue available for a snoop hit push resulting from a transaction. This is the normal mode.
Timing Comments
  Assertion/Negation: Must be valid throughout the address tenure.

2.5 Address Transfer Termination Signals

The address transfer termination signals indicate either that the address tenure has completed successfully or must be repeated, and when it should be terminated. Section 3.2.3, "Address Transfer Termination," describes how these signals interact.

2.5.1 Address Acknowledge (AACK): Input

Following are state and timing descriptions for address acknowledge (AACK) as an input signal.

State Meaning
  Asserted: The address phase of a transaction is complete. The address bus goes to the high-impedance state on the next bus clock cycle. The processor samples ARTRY on the bus clock cycle after assertion of AACK. The 604 can sample ARTRY by the second cycle after TS is asserted.
  Negated: During assertion of ABB, indicates the address bus and transfer attribute signals must remain driven.
Timing Comments
  Assertion: Can occur as soon as the bus clock cycle after TS or XATS is asserted, but can be delayed to extend address access time, for example, to support slow snooping devices.
  Negation: Must occur one bus clock cycle after assertion of AACK.

2.5.2 Address Retry (ARTRY): Output

Following are state and timing descriptions for address retry (ARTRY) as an output signal.

State Meaning
  Asserted: The master detects a condition in which a snooped address tenure must be retried. If the processor must update memory as a result of the snoop that caused the retry, the processor asserts BR during that snoop window, which is defined as the second cycle after AACK if ARTRY was asserted the cycle after AACK. Also invalidates data in some cases; see Section 3.3.1.1, "Effect of ARTRY Assertion on Data Transfer and Arbitration on the PowerPC 604 Processor."
  High Impedance: The master does not need the snooped address tenure to be retried.
Timing Comments
  Assertion: Asserted the second bus cycle after the assertion of TS if a retry is required. Thus, when a retry is required, there is only one empty cycle between the assertions of TS and ARTRY.
  Negation: Occurs the second bus cycle after the assertion of AACK. Because ARTRY can be simultaneously driven by multiple devices, it is driven negated in the following ways:
    601: Occurs the second bus cycle after the assertion of AACK. Since ARTRY may be simultaneously driven by multiple devices, it negates in a unique fashion. First the buffer goes to high impedance for one bus cycle, then it is driven high for one 2XPCLK cycle before returning to high impedance. This method of negation may be disabled by setting HID0[29].
    603: Occurs the second bus cycle after the assertion of AACK. Since ARTRY may be simultaneously driven by multiple devices, it negates in a unique fashion. First the buffer goes to high impedance for a minimum of one-half processor cycle (dependent on the clock mode), then it is driven negated for one bus cycle before returning to high impedance. This method of negation can be disabled by setting HID0[7].
    604: ARTRY becomes high impedance for at least one-half bus cycle, then is driven high for approximately one bus cycle. ARTRY is then guaranteed by design to become high impedance at the latest by the start of the third cycle after AACK. This method of negation can be disabled by setting HID0[7].

2.5.3 Address Retry (ARTRY): Input

Following are state and timing descriptions for ARTRY as an input signal.

State Meaning
  Asserted: For the address bus master, ARTRY indicates the device must retry the preceding address tenure and immediately negate BR (if asserted). If the associated data tenure has begun, the 603 and 604 also abort the data tenure immediately even if burst data has been received. For devices that are not the address bus master, this input indicates they should immediately negate BR for one bus clock cycle after the assertion of ARTRY by the snooping bus master to allow a write-back operation.
  Negated/High Impedance: The master need not retry the last address tenure.
Timing Comments
  Assertion: May occur as soon as the second cycle after TS or XATS is asserted; must occur by the bus clock cycle immediately after the assertion of AACK if an address retry is required.
  Negation: Must occur in the second cycle after AACK is asserted.

2.5.4 Shared (SHD): Output

Following are state and timing descriptions for shared (SHD) as an output signal.

State Meaning
  Asserted: If ARTRY is negated, indicates that after this transaction completes successfully, the master will keep a valid shared copy of the address or that a reservation exists on this address.
    If SHD and ARTRY are asserted for a snooping master, the snoop hit modified data that will be pushed as the master's next address transaction.
  Negated/High Impedance: After this address is transferred, the processor will not have a valid copy of the snooped address.
Timing Comments
  Assertion/Negation: Same as ARTRY.
  High Impedance: Same as ARTRY.
Because it does not support the shared MESI state (S), the 603 does not implement SHD.

2.5.5 Shared (SHD): Input

Following are state and timing descriptions for SHD as an input signal.

State Meaning
  Asserted: If ARTRY is not asserted, the master must allocate the incoming cache block as shared (S) for a self-generated transaction. Applies only to read and read-atomic transactions.
  Negated: If ARTRY is negated, the master can allocate the incoming cache block as exclusive (E) for a self-generated read or read-atomic transaction.
Timing Comments
  Assertion/Negation: The same as ARTRY.
Because it does not support the shared (S) MESI state, the 603 does not implement SHD.

2.6 Data Bus Arbitration Signals

Like address bus arbitration signals, data bus arbitration signals maintain an orderly process for determining data bus mastership. Note that there is no equivalent to the address bus arbitration signal BR (bus request), because, except for address-only transactions, TS and XATS imply data bus requests. For a detailed description of how these signals interact, see Section 3.3.1, "Data Bus Arbitration."

The DBWO signal lets the processor be configured dynamically to write data out of order with respect to read data.

2.6.1 Data Bus Grant (DBG): Input

Following are state and timing descriptions for data bus grant (DBG) as an input signal.

State Meaning
  Asserted: With proper qualification a device can become data bus master.
    Note that in some cases, assertion of ARTRY invalidates the data bus grant (see Section 3.3.1.1, "Effect of ARTRY Assertion on Data Transfer and Arbitration on the PowerPC 604 Processor"). The device achieves a qualified data bus grant when the following conditions are met:
    • The data bus is not busy (DBB is negated). (This condition does not apply to the 604 or 604e in data streaming mode.)
    • DRTRY is negated. (This condition does not apply for a processor using data streaming or no-DRTRY mode.)
    • ARTRY is negated if ARTRY applies to the associated address tenure.
  Negated: The master must hold off its data tenures.
Timing Comments
  Assertion: May occur any time to indicate that the device is free to assume data bus mastership. The processor can sample it as early as the cycle that TS or XATS is asserted.
    For the 604 in data streaming mode, DBG must be asserted for exactly one cycle per data bus tenure, the cycle before the data tenure is to begin. The system cannot assert DBG earlier, park DBG, or assert it for consecutive cycles. The DBB signal does not participate in determining a qualified data bus grant. Therefore, the system must assert DBG in a way that prevents data tenure collisions from different masters. Also, the system must assert DBG so that data tenures complete before providing another DBG. If a DBG is given early to the 604 in data streaming mode, the processor drops the current data tenure prematurely in the next cycle and begins any pending data tenure.
    The 604e has less restrictive timing requirements in data streaming mode. DBG must be asserted no earlier than the cycle before the 604e's data tenure is to begin only when another master currently owns the data bus (that is, when DBB would normally be asserted for a data tenure). If no other masters own the data bus (asserting DBB), the 604e allows the system to park DBG.
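The qualified data bus grant conditions listed above can be captured in a small predicate. The following Python sketch is a simplified model, not a description of processor internals: the class and field names are hypothetical, every signal is represented by its logical state (True means asserted) rather than its pin level, and the mode flags stand in for the data streaming and no-DRTRY configurations mentioned in the text.

```python
from dataclasses import dataclass


@dataclass
class DataBusState:
    # Logical signal states: True means asserted (assumption of this sketch).
    dbg: bool
    dbb: bool
    drtry: bool
    artry: bool
    # True if ARTRY applies to the address tenure associated with this data tenure.
    artry_applies: bool = True
    # Hypothetical configuration flags standing in for the modes named in the text.
    data_streaming: bool = False
    no_drtry_mode: bool = False


def qualified_data_bus_grant(s):
    """Return True if the listed conditions for a qualified grant are met."""
    if not s.dbg:
        return False
    # DBB must be negated, except in data streaming mode.
    if s.dbb and not s.data_streaming:
        return False
    # DRTRY must be negated, except in data streaming or no-DRTRY mode.
    if s.drtry and not (s.data_streaming or s.no_drtry_mode):
        return False
    # ARTRY must be negated if it applies to the associated address tenure.
    if s.artry and s.artry_applies:
        return False
    return True
```

Because DBB drops out of the qualification in data streaming mode, a model like this makes it easy to see why the arbiter alone must then prevent overlapping data tenures.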
    DBB is still an output-only signal in data streaming mode (that is, DBB does not participate in a qualified data bus grant), requiring the system to use DBG to ensure that different masters don't collide on data tenures. If the system tries to stream back-to-back data tenures by asserting DBG with the final TA of the first data tenure, the processor accepts the DBG as a qualified data bus grant only if the current and next data tenures are both burst reads. Other combinations cannot be streamed.
  Negation: May occur at any time to indicate that the master cannot assume control of the data bus.

2.6.2 Data Bus Write Only (DBWO): Input

Following are state and timing descriptions for DBWO as an input signal.

State Meaning
  Asserted: The processor can run the data bus tenure for an outstanding write address even if a read address is pipelined before the write address. If write data is not available, the processor performs the first pending read transfer. See Section 3.3.2, "Data Bus Write Only," for detailed instructions for using DBWO. Note that the 601 takes the bus only for a pending data bus write operation and not for a read operation.
  Negated: The processor runs address and data tenures in the same order. Tying DBWO negated preserves address/data ordering.
Timing Comments
  Assertion: Must occur no later than a qualified DBG for a pending write tenure. The DBWO signal is recognized by the processor only on the clock cycles of a qualified data bus grant.
  Negation: May occur any time after a qualified data bus grant and before the next qualified data bus grant.

2.6.3 Data Bus Busy (DBB): Output

Following are state and timing descriptions for data bus busy (DBB) as an output signal.

State Meaning
  Asserted: The device is the data bus master. The processor always assumes data bus mastership if it needs the data bus and is given a qualified data bus grant (see DBG).
  Negated: The device is not using the data bus, unless the data tenure is being extended by the assertion of DRTRY.
    Note that for the 604e in no-DRTRY mode, DRTRY is tied asserted and is ignored.
Timing Comments
  Assertion: Occurs in the bus clock cycle after a qualified DBG.
  Negation: Occurs for a fractional bus clock cycle after the assertion of the final TA or within two cycles of the assertion of TEA.
  High Impedance: Occurs during a fractional portion of the bus cycle in which DBB is negated. The DBB signal is designed to be high impedance by the end of the cycle in which it is negated. For specific information, see the appropriate user's manual.

2.6.4 Data Bus Busy (DBB): Input

Following are state and timing descriptions for DBB as an input signal. In data streaming mode, DBB is only an output and is not part of a qualified data bus grant; see Chapter 6, "Additional Bus Configurations."

State Meaning
  Asserted: Another device is data bus master. Note that DBB cannot be used in systems that use read data streaming.
  Negated: The device is not using the data bus. If the arbiter is designed to assert DBG exactly one cycle before the next data tenure starts, DBB is unnecessary and may be pulled high.
Timing Comments
  Assertion: Must occur when the processor must be kept from using the data bus.
  Negation: May occur whenever the data bus is available.

2.7 Data Transfer Signals

Like the address transfer signals, the data transfer signals are used to transmit data and to generate and monitor parity for the data transfer. For a detailed description of how data transfer signals interact, see Section 3.3.3, "Data Transfer."

2.7.1 Data Bus (DH[0–31], DL[0–31]): Output

Following are state and timing descriptions for DH and DL as output signals.

State Meaning
  Asserted/Negated: Represents the state of data during a data write. The data bus has two halves: data bus high (DH) and data bus low (DL). Table 2-6 shows data bus lane assignments. Direct-store operations use DH exclusively (there are no 64-bit direct-store operations).
  Unselected byte lanes do not supply valid data.

Timing comments
  Assertion/Negation: The initial beat coincides with DBB and, for bursts, transitions on the bus clock cycle after each assertion of TA. The data bus is driven once for noncached transactions and four times for processor cache transactions (bursts).
  High impedance: Occurs on the bus clock cycle after the final assertion of TA.

2.7.2 Data Bus (DH[0-31], DL[0-31]) - Input

Following are state and timing descriptions for DH and DL as input signals.

State meaning
  Asserted/Negated: Represents the state of data during a data read transaction. The data bus has two halves, data bus high (DH) and data bus low (DL). Table 2-6 shows byte lanes. Direct-store operations use DH exclusively (there are no 64-bit direct-store operations).

Timing comments
  Assertion/Negation: Data must be valid on the same bus clock cycle that TA is asserted. The data bus is driven once for noncached transactions and four times for processor cache transactions (bursts).

Table 2-6. Data Bus Lane Assignments

  Data bus signals   Byte lane
  DH[0-7]            0
  DH[8-15]           1
  DH[16-23]          2
  DH[24-31]          3
  DL[0-7]            4
  DL[8-15]           5
  DL[16-23]          6
  DL[24-31]          7

2.7.3 Data Bus Parity (DP[0-7]) - Output

Following are state and timing descriptions for the data bus parity (DP[0-7]) output signals.

State meaning
  Asserted/Negated: Represents odd parity for each of the eight bytes of a data write transaction. Odd parity means that an odd number of bits, including the parity bit, are driven high. Table 2-7 shows signal assignments. All eight bits are driven with valid parity on all bus write operations except direct-store operations, for which only DP[0-3] are driven with valid parity.

Timing comments
  Assertion/Negation: The same as DL[0-31].
  High impedance: The same as DL[0-31].

Table 2-7. DP[0-7] Signal Assignments

  Signal name   Signal assignments
  DP0           DH[0-7]
  DP1           DH[8-15]
  DP2           DH[16-23]
  DP3           DH[24-31]
  DP4           DL[0-7]
  DP5           DL[8-15]
  DP6           DL[16-23]
  DP7           DL[24-31]

2.7.4 Data Bus Parity (DP[0-7]) - Input

Following are state and timing descriptions for DP[0-7] as input signals.

State meaning
  Asserted/Negated: Represents one bit of odd parity for each byte of read data. Parity is checked on all data byte lanes during data read operations, regardless of the size of the transfer. During direct-store read operations, only the DP[0-3] signals (corresponding to byte lanes DH[0-31]) are checked for odd parity. If data parity errors are enabled, detected even parity causes a checkstop or a machine check exception (and assertion of DPE), depending on the state of MSR[ME]. For the 601, if data parity check is enabled in HID0, detection of even parity unconditionally causes a checkstop.

Timing comments
  Assertion/Negation: The same as DL[0-31].

2.7.5 Data Parity Error (DPE) - Output

Following are state and timing descriptions for the data parity error (DPE) output signal. DPE is an open-drain type output and requires a pull-up resistor for proper deassertion.

State meaning
  Asserted: The processor detected incorrect data bus parity on incoming read data.
  Negated: Indicates correct data bus parity.

Timing comments
  Assertion: Occurs on the second bus clock cycle after TA is asserted to the processor and is driven for one cycle.

2.7.6 Data Bus Disable (DBDIS) - Input

Following are the state meanings and timing comments for the data bus disable (DBDIS) input signal. This signal is not on the 601.

State meaning
  Asserted: For a write transaction, the processor must release the data bus and DP[0-7] to high impedance in the next cycle. The data tenure remains active, DBB remains driven, and the transfer termination signals are still monitored by the processor. The DBDIS signal is ignored for read transactions.
  Negated: The data bus should remain normally driven.

Timing comments
  Assertion/Negation: Should be driven one cycle before the data bus can be driven by the processor. May be asserted on any clock cycle when the processor is driving, or will be driving, the data bus and may remain asserted for multiple cycles.

2.8 Data Transfer Termination Signals

Data termination signals are required after each data beat in a data transfer. Note that in a single-beat transaction, the data termination signals also indicate the end of the tenure, while in burst accesses, the data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat. For a detailed description of how these signals interact, see Section 3.3.4, "Data Transfer Termination."

2.8.1 Transfer Acknowledge (TA) - Input

Following are state and timing descriptions for the transfer acknowledge (TA) input signal.

State meaning
  Asserted: A single-beat data transfer or a data beat in a burst transfer completed successfully (unless DRTRY is asserted on the next bus clock cycle for reads). The TA signal must be asserted for each data beat in a burst transaction.
  Negated: Until TA is asserted, the master must continue driving the data for the current write or must wait to sample the data for reads.

Timing comments
  Assertion: During a data tenure, which generally begins after a qualified data bus grant and continues through the period defined by DBB or DRTRY. This period is affected by the ARTRY window. See Section 3.3.1.1, "Effect of ARTRY Assertion on Data Transfer and Arbitration on the PowerPC 604 Processor." The system can withhold asserting TA to indicate that the master should insert wait states to extend a data tenure.
  Negation: Must occur after the bus clock cycle of the final (or only) data beat of the transfer.
  For a burst transfer, the system can assert TA for one bus clock cycle and then negate it to advance the burst transfer to the next beat and insert wait states during the next beat. When the 603 is configured for 1:1 clock mode and is performing a burst read into data cache, the 603 requires one wait state between the assertion of TS and the first assertion of TA for that transaction. If no-DRTRY mode is also selected, the 603 requires two wait states.

2.8.2 Data Retry (DRTRY) - Input

Following are state and timing descriptions for the data retry (DRTRY) input signal.

State meaning
  Asserted: The master must invalidate the data from the previous read operation. DRTRY is ignored for write transactions and is not defined for direct-store transfers.
  Negated: Data presented with TA on the previous read operation is valid. This is essentially a late TA to allow speculative forwarding of data (with TA) during reads.

Timing comments
  Assertion: Must occur during the bus clock cycle immediately after TA is asserted if a retry is required. The DRTRY signal can be held asserted for multiple bus clock cycles. When it is negated, data must have been valid on the previous clock with TA asserted.
  Negation: Must occur during the bus clock cycle after a valid data beat. This may occur several cycles after DBB is negated, effectively extending the data bus tenure.
  Start-up: For the 603 and 604e, DRTRY is sampled at the negation of HRESET; if DRTRY is asserted, no-DRTRY mode is selected. If DRTRY is negated at start-up, DRTRY is enabled. If no-DRTRY or data streaming mode is selected, DRTRY must be negated during normal operation (after HRESET). No-DRTRY mode provides a one-cycle-faster read, and data streaming eliminates wasted cycles between data bursts.
  See Section 6.1, "No-DRTRY Mode (603 and 604e)," for a description of no-DRTRY mode or Chapter 6, "Additional Bus Configurations," for a description of data streaming.

2.8.3 Transfer Error Acknowledge (TEA) - Input

Following are state and timing descriptions for the transfer error acknowledge (TEA) input signal.

State meaning
  Asserted: A bus error occurred that causes a machine check exception (or causes the processor to enter the checkstop state if the machine check enable bit is cleared (MSR[ME] = 0)). For more information, see Section 5.3, "Machine Check and Checkstops." Assertion terminates the current transaction; that is, assertions of TA and DRTRY are ignored. Asserting TEA causes the negation/high impedance of DBB in the next clock cycle. However, data entering the GPR or the cache are not invalidated. If TEA is asserted during a direct-store transaction, the machine check or checkstop action of the TEA is delayed and subsequent direct-store transactions continue until all transfers from the direct-store segment complete. The TEA signal must be asserted for every direct-store data tenure, including the last one. The processor takes a machine check or a checkstop no sooner than the last direct-store data tenure has been terminated by the assertion of TEA. A load or store reply is not necessary after the last data tenure receives a TEA assertion.
  Negated: No bus error was detected.

Timing comments
  Assertion: May be asserted while DBB is asserted or during the valid DRTRY window. In data streaming mode, the 604/604e does not recognize TEA the cycle after TA during a read operation due to the absence of a DRTRY assertion opportunity. TEA should be asserted for one cycle only.
  Negation: TEA must be negated no later than the negation of DBB or the last DRTRY. The processor deasserts DBB within one bus clock cycle after the assertion of TEA.
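The interaction of TA, DRTRY, and TEA on reads can be easier to follow as a small cycle-by-cycle model. The following Python sketch is illustrative only and is not taken from any processor manual: the names `Cycle` and `run_read_tenure` are hypothetical, signals are modeled as active-high booleans (True means asserted), and the model covers only normal DRTRY mode on reads. It assumes that a beat sampled with TA is confirmed when DRTRY is negated on the following cycle and discarded when DRTRY is asserted, and that TEA ends the tenure at once. It does not model DBB timing, no-DRTRY mode, data streaming, or direct-store transfers.

```python
from typing import List, NamedTuple, Optional, Tuple


class Cycle(NamedTuple):
    """Sampled inputs for one bus clock cycle (True = asserted)."""
    ta: bool = False
    drtry: bool = False
    tea: bool = False
    data: Optional[int] = None  # value on the data bus when TA is asserted


def run_read_tenure(cycles: List[Cycle], beats: int) -> Tuple[str, List[int]]:
    """Return (status, confirmed_beats) for a read of `beats` data beats.

    Hypothetical model: a beat taken with TA stays provisional for one
    cycle; DRTRY in that next cycle invalidates it, otherwise it is kept.
    """
    confirmed: List[int] = []
    pending: Optional[int] = None  # beat sampled with TA, awaiting the DRTRY window
    for c in cycles:
        if c.tea:
            # Simplification: TEA terminates the transaction immediately.
            # The fate of a beat still in its DRTRY window is not modeled.
            return "error", confirmed
        if pending is not None:
            if not c.drtry:
                confirmed.append(pending)  # DRTRY negated: previous beat is valid
            pending = None                 # DRTRY asserted: previous beat is discarded
            if len(confirmed) == beats:
                return "done", confirmed
        if c.ta:
            pending = c.data               # provisional until the next cycle
    return "incomplete", confirmed
```

For example, `run_read_tenure([Cycle(ta=True, data=0xA), Cycle(drtry=True, ta=True, data=0xB), Cycle()], beats=1)` discards the first value because DRTRY follows it and keeps the value that was re-presented with TA while DRTRY was asserted.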
2.9 System Status Signals

Most system interrupt, checkstop, and reset signals are input signals that indicate when exceptions are received, when checkstop conditions have occurred, and when the processor must be reset. The processor generates CKSTP_OUT when it detects a checkstop condition. For detailed descriptions, see Chapter 5, "System Status Signals."

2.9.1 Interrupt (INT) - Input

Following are state and timing descriptions for the interrupt (INT) input signal.

State meaning
  Asserted: The processor initiates an external interrupt if MSR[EE] is set and INT remains asserted long enough; otherwise, the processor ignores the interrupt.
  Negated: Normal operation should proceed. See Section 5.4, "External Interrupt Exception (0x00500)."

Timing comments
  Assertion: May occur at any time and may be asserted asynchronously to the input clocks. The INT input is level-sensitive.
  Negation: Should not occur until the exception is taken. For the 601, this signal can be negated after at least three processor clock cycles.

2.9.2 System Management Interrupt (SMI) - Input

Following are state and timing descriptions for the system management interrupt (SMI) input signal. This interrupt supports power management and is not on the 601.

State meaning
  Asserted: The processor initiates a system management interrupt exception if MSR[EE] is set.
  Negated: Normal operation should proceed. See Section 5.4, "External Interrupt Exception (0x00500)."

Timing comments
  Assertion: May occur at any time and may be asserted asynchronously to the input clocks. The SMI input is level-sensitive.
  Negation: Should not occur until the exception is taken.

2.9.3 Machine Check Interrupt (MCP) - Input

Following are state and timing descriptions for the machine check interrupt (MCP) input signal. This signal is not on the 601.
State meaning
  Asserted: The processor initiates a machine check interrupt operation if MSR[ME] and HID0[EMCP] are set; if MSR[ME] is cleared and HID0[EMCP] is set, the processor must terminate operation by internally gating off all clocks and releasing all outputs (except CKSTP_OUT) to the high-impedance state. If HID0[EMCP] is cleared, the processor ignores the interrupt condition. The MCP signal must remain asserted for two bus clock cycles.
  Negated: Normal operation should proceed.

Timing comments
  Assertion: May occur at any time and may be asserted asynchronously to the input clocks. MCP is negative edge-sensitive.
  Negation: May be negated two bus cycles after assertion.

2.9.4 Checkstop Input (CKSTP_IN) - Input

Following are state and timing descriptions for the checkstop input signal (CKSTP_IN).

State meaning
  Asserted: The processor must terminate operation by internally gating off all clocks and releasing all outputs except CKSTP_OUT to the high-impedance state. Once asserted, CKSTP_IN must remain asserted until the system has been reset.
  Negated: Normal operation should proceed. See Section 5.3, "Machine Check and Checkstops."

Timing comments
  Assertion: May occur at any time and may be asserted asynchronously to the input clocks. For the 601, CKSTP_IN must be asserted for at least three PCLK_EN clock cycles. Alternatively, it may be asserted synchronously, meeting setup and hold times (specified in the hardware specifications), in which case it must be asserted for at least two PCLK_EN clock cycles.
  Negation: May occur any time after CKSTP_OUT is asserted.

2.9.5 Checkstop Output (CKSTP_OUT) - Output

Following are state and timing descriptions for checkstop output (CKSTP_OUT) as an output signal. Note that CKSTP_OUT is an open-drain type output and requires an external pull-up resistor to assure proper deassertion.

State meaning
  Asserted: The processor detected a checkstop condition and ceased operation.
  Negated: The processor is operating normally. See Section 5.3, "Machine Check and Checkstops."

Timing comments
  Assertion: Can occur at any time, asynchronously to input clocks.
  Negation: Is negated upon assertion of HRESET.

2.9.6 Hard Reset (HRESET) - Input

The hard reset (HRESET) input signal must be used at power-on to properly reset the processor. This input has additional functionality in certain test modes. Following are state and timing descriptions for HRESET.

State meaning
  Asserted: Initiates a hard reset operation when HRESET transitions from asserted to negated. Causes a reset exception as described in Section 5.2.1.1, "Hard Reset Settings." Output drivers are released to high impedance within five clocks (three clocks for the 601) after the assertion of HRESET.
  Negated: Normal operation should proceed.

Timing comments
  Assertion: Can occur at any time and can be asynchronous with the processor input clock; must be held asserted for at least 255 (300 for the 601) clock cycles.
  Negation: Can occur after the minimum reset pulse width is met.

2.9.7 Soft Reset (SRESET) - Input

The soft reset (SRESET) input signal has additional functionality in certain test modes. Following are state and timing descriptions for SRESET.

State meaning
  Asserted: Initiates processing for a soft reset exception as described in Section 5.2.2, "Soft Reset."
  Negated: Normal operation should proceed.

Timing comments
  Assertion: Can occur at any time and can be asynchronous with the processor input clock. SRESET is negative edge-sensitive.
  Negation: May occur any time after the minimum soft reset pulse width of two (10 for the 601) bus cycles is met.

2.10 Processor State Signals

The signals described in this section provide inputs for controlling the time base in the processor, external cache access by the processor, and an output signal from the processor to indicate that a memory reservation has been set.

2.10.1 Reservation (RSRV) - Output

Following are state and timing descriptions for the reservation (RSRV) output signal.
State meaning
  Asserted/Negated: Reflects the state of the reservation coherency bit used by the lwarx/stwcx. instructions. See Section 4.5.1, "PowerPC 603 Processor lwarx/stwcx. Implementation."

Timing comments
  Assertion: Occurs synchronously one bus clock cycle after execution of an lwarx instruction that sets the internal reservation condition. On the 604 and 604e, RSRV is asserted as late as the fourth cycle after AACK for a read-atomic operation if the lwarx instruction requires a read-atomic operation.
  Negation: Occurs synchronously one bus clock cycle after execution of an stwcx. instruction that clears the reservation or as late as the second bus cycle after TS is asserted for a snoop that clears the reservation.

2.10.2 External Cache Intervention (L2_INT) - Input

Following are state and timing descriptions for the external cache intervention (L2_INT) input signal. This signal is not on the 601 or 603.

State meaning
  Asserted: The current data transaction required intervention from other bus devices.
  Negated: The current data transaction did not require intervention.

Timing comments
  Assertion/Negation: This signal is sampled by the processor coincident with the first assertion of TA for a given data tenure.

2.10.3 Time Base Enable (TBEN) - Input

The time base enable (TBEN) input signal is essentially a count enable for the time base. Following are state and timing descriptions for TBEN. This signal is not on the 601.

State meaning
  Asserted: The time base should continue clocking.
  Negated: The time base should stop counting.

Timing comments
  Assertion/Negation: May occur on any cycle and is synchronous with the system clock.

2.10.4 TLBI Synchronization (TLBISYNC) - Input

Following are state and timing descriptions for the TLBI synchronization (TLBISYNC) input signal. This signal is not on the 601 or 604.

State meaning
  Asserted: Instruction execution should stop after tlbsync executes.
  Negated: Instruction execution can resume after tlbsync completes. TLBISYNC is sampled when HRESET negates to select 32-bit data bus mode; if TLBISYNC is negated, 32-bit mode is disabled. See Section 6.3, "32-Bit Data Bus Mode (603)."

Timing comments
  Assertion/Negation: May occur on any cycle.

2.11 Power Management Signals

Each processor has input and output signals defined to support low-power modes for the processor and system. These signals are not the same between processors. The signals for each processor are described in this section.

2.11.1 Quiescent Request (QUIESC_REQ) - Output

Following are state and timing descriptions for the quiescent request (QUIESC_REQ) output signal, which the 601 uses to request the system to enter a soft-stop state.

State meaning
  Asserted: The 601 is requesting a soft-stop state for the system.
  Negated: The 601 is not requesting a soft-stop state.

Timing comments
  Assertion/Negation: May occur at any time.

2.11.2 System Quiesced (SYS_QUIESC) - Input

Following are state and timing descriptions for the system quiesced (SYS_QUIESC) input signal, which the system uses to indicate to the 601 that it is ready to enter the soft-stop state.

State meaning
  Asserted: Enables soft stop in the 601.
  Negated: The soft-stop state is not enabled in the 601. Systems that do not use SYS_QUIESC should tie it low.

Timing comments
  Assertion/Negation: Must meet setup and hold times described in the PowerPC 601 RISC Microprocessor Hardware Specifications.

2.11.3 Resume (RESUME) - Input

Following are state and timing descriptions for the RESUME input signal, which the system uses to indicate to the 601 that it can resume normal operations.

State meaning
  Asserted: The 601 can resume normal operations after a soft stop.
  Negated: The 601 cannot resume normal operations if a soft stop has occurred. Systems that do not use this signal should tie it low.

Timing comments
  Assertion: Can occur any time.
  If asserted asynchronously to the 601 input clock, it must be asserted for at least three clock cycles. If asserted synchronously, it must be asserted for at least two clock cycles.
  Negation: Can occur after the minimum pulse width has been met.

2.11.4 Quiescent Request (QREQ) - Output

Following are state and timing descriptions for the quiescent request (QREQ) output signal, which the 603 uses to request that the system enter the quiescent state.

State meaning
  Asserted: The 603 is requesting all bus activity normally required to be snooped to terminate or to pause so the 603 may enter a low-power (nap or sleep) state. Once the 603 has entered this state, it no longer snoops bus activity.
  Negated: The 603 is not requesting to enter the quiescent state.

Timing comments
  Assertion: Can occur at any time to indicate the request to enter the quiescent state, during which the 603 keeps asserting QREQ.
  Negation: Can occur whenever the quiescent state is not requested.

2.11.5 Quiescent Acknowledge (QACK) - Input

Following are state and timing descriptions for the quiescent acknowledge (QACK) input, which the system uses to indicate to the 603 that it is ready to enter a low-power state.

State meaning
  Asserted: All bus activity that requires snooping has terminated or paused so the 603 can enter a low-power state.
  Negated: The 603 cannot enter a low-power state.

Timing comments
  Assertion/Negation: May occur on any cycle after the assertion of QREQ and must be held for a minimum of one bus clock cycle.
  Start-up: QACK is sampled at the negation of HRESET to select the reduced-pinout mode; if QACK is asserted at start-up, reduced-pinout mode is disabled. See Section 6.4, "Reduced-Pinout Mode (603)," for a description of the reduced-pinout mode.
2.11.6 Halted (HALTED) - Output

Following are state and timing descriptions for the HALTED output signal, which the 604 uses to indicate to other system components that the processor has been halted.

State meaning
  Asserted: The 604 enters the idle state as a result of nap mode. Dispatch and execution stop and the processor bus is idle.
  Negated: The processor is not in the idle state.

Timing comments
  Assertion/Negation: Synchronous with the processor clock.

2.11.7 Run (RUN) - Input

Following are state and timing descriptions for the RUN input signal, which is used to notify the 604 that snooping is required.

State meaning
  Asserted: Forces the internal processor clocks to continue running, even if nap mode is active, allowing bus snooping to occur. HALTED is deasserted to indicate any bus activity and is reasserted to indicate when the processor is idle and when RUN can be deasserted.
  Negated: Internal processor clocks can stop running in nap mode.

Timing comments
  Assertion: May occur at any time, asynchronously to the input clocks. The maximum latency between RUN being asserted and the starting of the internal processor clocks is three bus clock cycles.
  Negation: Can occur after the HALTED signal is asserted.

2.11.7.1 Going from Normal to Doze State (604e)

The only state transition allowed from the normal state is to the doze state. This transition requires system support. The system must assert RUN for at least 10 bus cycles before the software power management sequence can begin. RUN does not affect 604e operation in the normal state, but does affect operation during the transition from normal to doze state. The software power management sequence is the following code:

    sync
    mtmsr
    isync
    branch to the sync instruction

The mtmsr instruction should modify the power management bit MSR[POW] only. All other MSR values, such as the external interrupt enable, should be set up before the software power management sequence is begun.
When mtmsr is executed, the processor waits for its internal state to be idle and then asserts HALTED, at which point the processor is in the doze state. When entering the doze state, the system must assert RUN for at least 10 bus cycles after HALTED is asserted. When the processor is in the doze state, HALTED is deasserted when a snoop-triggered write-back is in progress. The system must keep RUN asserted whenever HALTED is deasserted in doze mode due to a snoop write-back operation.

If the software power management sequence is initiated from the normal state with RUN not asserted, the processor would attempt to go directly to the nap state. This transition is not supported and may cause the system to hang later when the processor leaves the nap state.

2.11.7.2 Going from Doze to Nap State

For the processor to go from doze to nap state, the system must first ensure that the bus is idle and that HALTED is asserted for at least 10 bus cycles. The system should then deassert RUN and continue to prevent bus grants for at least 10 additional bus cycles, at which point the processor is in the nap state and bus transactions can be resumed. The processor does not snoop any subsequent bus transactions. In going from doze to nap state, the 604e must see the bus idle, which here means that the 604e cannot receive any TS or XATS assertions. The system can ensure this by negating address bus grants to other bus devices.

2.11.7.3 Going from Nap to Doze State

For the processor to go from nap to doze state, the system should ensure the bus is idle for at least 10 bus cycles, assert RUN, and withhold bus grants for at least 10 additional bus cycles. At this point the processor is in the doze state and all bus transactions are snooped.

2.12 Summary of Signal Differences

Table 2-8 lists each signal and describes any substantive differences between different implementations.
The clock, power, and test signals are not described in this document. Refer to the user's manual for the particular processor for this information.

Table 2-8. Processor Bus Signal Differences
(Each entry gives the signal followed by the difference; "none listed" marks a signal with no entry in the difference column.)

Address bus arbitration signals
  Bus request (BR): As an output, assertion occurs when a bus transaction is needed and the device does not have a qualified bus grant. This may occur even if the maximum (two for the 601 and 603, three for the 604) possible pipeline accesses have occurred. For the 603, BR is asserted for one cycle during the execution of dcbz or of a load instruction that hit in the touch load buffer.
  Bus grant (BG): The 601 recognizes a qualified bus grant on the cycle after AACK even if ARTRY is asserted, as long as the 601 is asserting ARTRY and has exclusive ownership of the data associated with the snoop that caused the ARTRY.
  Address bus busy (ABB): none listed.

Address transfer start signals
  Transfer start (TS): Output. 601 and 603: High impedance occurs one bus clock cycle after TS is negated, which is coincident with the negation of ABB. 604: High impedance occurs one bus clock cycle after the negation of TS. For the 604, negation is only one bus cycle long, regardless of the TS-to-AACK delay.
  Extended address transfer start (XATS): Later generations of the 603 do not support direct-store operations. Output. 601/603: High impedance occurs one bus clock cycle after XATS is negated, which is coincident with the negation of ABB. 604: High impedance occurs one bus clock cycle after negation of XATS. Negation lasts only one bus cycle regardless of the XATS-to-AACK delay.

Address transfer signals
  Address bus (A[0-31]): 603/604: For bursts, the address presented is double-word-aligned. 601: The address presented is quad-word-aligned.
  Address parity (AP[0-3]): 601: If address parity check is enabled in the HID0 register, detection of even parity unconditionally causes a checkstop.
  Address parity error (APE): none listed.

Address transfer attribute signals
  Transfer type (TT[0-4]): Exact meanings of TT[0-4] vary among processors. TT4 is output-only on the 601.
  Transfer burst (TBST): none listed.
  Transfer size (TSIZ[0-2]): none listed.
  Transfer code (TCn): The 601 and 603 support only TC[0-1]. The 604 supports TC[0-2]. The exact meanings of these signals vary from processor to processor.
  Cache inhibited (CI): none listed.
  Write through (WT): none listed.
  Global (GBL): Output. 603/604: Negated on instruction fetches. 604e: HID0[23] controls GBL for instruction fetches through the address translation mechanism. Input. The 603 must snoop the reservation address register for global and nonglobal address transfers because lwarx/stwcx. require snoops on castouts and snoop pushes (nonglobal). Snoops with GBL = 1 do not affect cache state.
  Cache set element (CSEn): The number of CSE signals corresponds to the cache structure. 601: CSE[0-2]; 603 (not the 603e): CSE; 603e/604: CSE[0-1]. CSE signals are not meaningful during data cache touch load operations on a 603.
  High-priority snoop request (HP_SNP_REQ): 601 only.

Address transfer termination signals
  Address acknowledge (AACK): Input. The 604 supports sampling ARTRY as early as the second cycle after TS.
  Address retry (ARTRY): Negation timing is processor-specific.
  Shared (SHD): The 603 does not support shared data.

Data bus arbitration signals
  Data bus grant (DBG): Some conditions do not apply to the 604/604e for data streaming mode.
  Data bus write only (DBWO): none listed.
  Data bus busy (DBB): none listed.

Data transfer signals
  Data bus (DH[0-31]; DL[0-31]): none listed.
  Data bus parity (DP[0-7]): For the 601, if data parity check is enabled in the HID0 register, detection of even parity unconditionally causes a checkstop.
  Data parity error (DPE): none listed.
  Data bus disable (DBDIS): Signal defined after the 601.

Data transfer termination signals
  Transfer acknowledge (TA): Input/Negation. When the 603 is configured for 1:1 clock mode and is performing a burst read into data cache, the 603 requires one wait state between the assertion of TS and the first assertion of TA for that transaction. If no-DRTRY mode is also selected, the 603 requires two wait states.
  Data retry (DRTRY): Input/Start-up. Used at power-on to select no-DRTRY mode for the 603, data streaming mode for the 604, and data streaming mode or no-DRTRY mode for the 604e. For the 603 and 604, DRTRY is sampled at the negation of HRESET; if DRTRY is asserted, no-DRTRY mode is selected (603/604e). If DRTRY is negated at start-up, DRTRY is enabled. If no-DRTRY or data streaming mode is selected, DRTRY must be negated during normal operation (after HRESET). No-DRTRY mode provides one-cycle-faster reads; data streaming allows consecutive bursts.
  Transfer error acknowledge (TEA): 604: In data streaming mode, the 604 does not recognize TEA the cycle after TA during a read operation due to the absence of a DRTRY assertion opportunity. TEA should be asserted for one cycle only.

System status signals
  Interrupt (INT): 601: INT may be negated after a minimum of three processor clock cycles.
  System management interrupt (SMI): Supports the system management interrupt not defined by the PowerPC architecture; not implemented on the 601.
  Machine check interrupt (MCP): This signal is not defined for the 601.
  Checkstop input (CKSTP_IN): Early versions of the 603 identified this signal as CKSTP.
  Checkstop output (CKSTP_OUT): Early versions of the 603 identified this signal as CHECKSTOP.
  Hard reset (HRESET): Output drivers are released to high impedance within five clocks (three clocks for the 601) after the assertion of HRESET.
  Soft reset (SRESET): Negation may occur any time after the minimum soft reset pulse width of 2 (10 for the 601) bus cycles has been met.

Processor state signals
  Reservation (RSRV): 604/604e: RSRV is asserted as late as the fourth cycle after AACK for a read-atomic operation if the lwarx instruction requires a read-atomic operation.
  External cache intervention (L2_INT): New feature on the 604.
  Time base enable (TBEN): Time base did not exist on the 601.
  TLBI synchronization (TLBISYNC): Supports a 603-specific instruction; used at power-on to select 32-bit bus mode.

Power management signals
  Quiescent request (QUIESC_REQ): 601 only.
  Quiescent request (QREQ): 603 only. This signal is used at power-on to select a reduced pin mode.
  Halted (HALTED): Power management for the 604.
  System quiesced (SYS_QUIESC): 601 only.
  Resume (RESUME): 601 only.
  Quiescent acknowledge (QACK): 603 only.
  Run (RUN): 604 only.

Chapter 3
Memory Access Protocol

Memory accesses can occur in single (1-8 bytes) and four-beat (32 bytes) burst data transfers. System components can direct these accesses to the system memory hierarchy or to I/O devices as memory-mapped I/O. The address and data buses are decoupled for memory accesses to support pipelining and split transactions. The PowerPC 601 and 603 processors can pipeline as many as two transactions; the PowerPC 604 processor can pipeline as many as three. These processors have limited support for out-of-order split transactions.

Access to the system interface is granted through an external arbiter that lets devices compete for bus mastership. This mechanism is flexible, allowing the processor to be integrated into systems that implement various fairness and bus-parking procedures to reduce arbitration overhead.
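The transfer sizes and pipeline depths quoted above can be collected into a few lines of Python for quick sanity checks when planning bus traffic. This is a minimal sketch, not part of the bus specification: the names `MAX_PIPELINED`, `beats_for_transfer`, and `can_pipeline_another` are hypothetical, a 64-bit data bus is assumed, and "outstanding" is taken to mean transactions whose address tenure has started but whose data tenure has not yet completed.

```python
# Maximum pipelined transactions per processor, as stated in the text.
MAX_PIPELINED = {"601": 2, "603": 2, "604": 3}

SINGLE_BEAT_MAX_BYTES = 8   # single-beat transfers move 1-8 bytes
BURST_BYTES = 32            # a burst moves 32 bytes
BURST_BEATS = 4             # in four beats on a 64-bit data bus


def beats_for_transfer(nbytes: int) -> int:
    """Return the number of data beats for a transfer of `nbytes` bytes."""
    if 1 <= nbytes <= SINGLE_BEAT_MAX_BYTES:
        return 1
    if nbytes == BURST_BYTES:
        return BURST_BEATS
    raise ValueError(f"unsupported transfer size: {nbytes} bytes")


def can_pipeline_another(processor: str, outstanding: int) -> bool:
    """True if one more transaction fits within the pipeline depth (assumed model)."""
    return outstanding < MAX_PIPELINED[processor]
```

Under these assumptions, `beats_for_transfer(32)` gives four beats of eight bytes each, and a 604 with two transactions outstanding still has room for a third while a 601 or 603 does not.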
The 601 and 604 provide multiprocessor support through coherency mechanisms that provide snooping, external control of the on-chip cache and translation lookaside buffers (TLBs), and support for a secondary cache. Multiprocessor software support is provided through the use of atomic memory operations.

Typically, memory accesses are weakly ordered: sequences of operations, including load/store string and load/store multiple instructions, do not necessarily complete in the order they begin, which maximizes the efficiency of the bus without sacrificing coherency of the data. The processors allow read operations to precede store operations (except where a dependency exists). A processor can be signaled to perform a pending write ahead of pending reads. The 604 performs snoop push operations ahead of all other bus operations. Because the processor can dynamically optimize run-time ordering of load/store traffic, overall performance is improved. The synchronize (sync) or enforce in-order execution of I/O (eieio) instructions can be used to enforce strong ordering.

The following sections describe how the processor interface operates, provide detailed timing diagrams that illustrate how the signals interact, and include a collection of more general timing diagrams of typical bus operations.

Figure 3-1 is a legend of conventions used in the timing diagrams.

Figure 3-1. Timing Diagram Legend

Signals on this interface are synchronous: all processor input signals are sampled and output signals are driven on the rising edge of the bus clock cycle (see the processor hardware specifications for exact timing information).

3.1 Bus Protocol

Figure 3-2 shows the memory access bus protocol for the 601, 603, and 604. Memory accesses are divided into address and data tenures, each of which is composed of three phases: bus arbitration, transfer, and termination.
Address and data tenures are independent and, as indicated in Figure 3-2, can overlap because a data tenure can start before the address tenure ends. The independence of these operations permits address pipelining and split-bus transactions to be implemented at the system level.

[Figure 3-1 legend entries:
• Bar over signal name: indicates active low
• AP0: processor input (while processor is a bus master)
• BR: processor output (while processor is a bus master)
• ADDR+: processor output (grouped: here, address plus attributes)
• qual BG: internal signal (inaccessible to the user but used to clarify operations)
• Compelling dependency: event will occur on the next clock cycle
• Prerequisite dependency: event will occur on an undetermined, subsequent clock cycle
• Processor three-state output or input
• Processor nonsampled input
• Signal with sample point
• A sampled condition (dot on high or low state) with multiple dependencies
• Timing for a signal had it been asserted (it is not actually asserted)]

Processors support one- and four-beat transfers. Figure 3-2 shows a data transfer that consists of a single-beat transfer of as many as 64 bits. Four-beat burst transfers of 32-byte cache blocks are also supported, and on the 603, eight-beat bursts can be used to transfer an eight-word cache block when the processor is operating in 32-bit data bus mode. Burst operations require data transfer termination signals for each beat. Address-only transactions are used to broadcast synchronizing and cache control operations, especially in multiprocessor systems.

Figure 3-2. Overlapping Tenures on the Processor Bus for a Single-Beat Transfer
[Figure: the address tenure (arbitration, transfer, termination) overlaps the data tenure (arbitration, single-beat transfer, termination); address and data are independent.]

Basic functions of the address and data tenures are as follows:

• Address tenure
  - Arbitration: During arbitration, address bus arbitration signals are used to gain mastership of the address bus.
  - Transfer: After mastership is obtained, the address bus master transfers the address, transfer attributes, and parity information on the address bus. Address signals and transfer attribute signals control the address transfer. Address parity and address parity error signals ensure the integrity of the address transfer.
  - Termination: After the address transfer, the system signals that the address tenure is complete or that it must be repeated.
• Data tenure
  - Arbitration: To begin a data tenure, the master arbitrates for data bus mastership.
  - Transfer: After mastership is obtained, the data bus master samples the data bus for read operations or drives the data bus for write operations. The data parity and data parity error signals ensure the integrity of the data transfer.
  - Termination: Data termination signals are required after each data beat in a data transfer. In single-beat transactions, data termination signals also indicate the end of the tenure, while in burst accesses, data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat.

Processors can generate address-only bus operations during execution of certain instructions (for example dcbz, sync, eieio, tlbie, and lwarx). Address-only operations are given more support on processors intended for multiprocessor systems.

The ability to retry address tenures provides an efficient snooping protocol for maintaining coherency in systems with multiple memory systems (including caches).

Although address and data transfers are separate, there is no explicit tagging mechanism to associate a data transfer with its address transfer. Addresses and data are generally transferred in the same order. However, the data bus write only (DBWO) signal allows writes to transfer ahead of reads.
The designer of a multiple-processor system can provide any ordering, as long as each processor transfers its addresses and data in the same order and memory is kept coherent.

3.1.1 Arbitration Signals

Arbitration for both address and data bus mastership is performed by a central, external arbiter and, minimally, by the arbitration signals shown in Section 2.1, "Address Bus Arbitration Signals," and Section 2.6, "Data Bus Arbitration Signals." Most arbiter implementations require additional signals to coordinate bus master/slave/snooping activities. Note that address bus busy (ABB) and data bus busy (DBB) are bidirectional signals. They are inputs to the processor unless the processor is master of one or both buses; they must be connected high through pull-up resistors so that they remain negated when no device has control of the buses.

Table 3-1 shows the bus arbitration signals.

Address bus arbitration signals are described as follows:

• BR (bus request): Assertion indicates that a device wants address bus mastership.
• BG (bus grant): Assertion indicates the device can, with the proper qualification, take mastership of the address bus. See Section 2.1.2, "Bus Grant (BG) - Input."
• ABB (address bus busy): Assertion identifies the address bus master.

Table 3-1. Number of Bus Arbitration Signals

Signal   I/O      Signal connection requirements
BR       Output   One per master
BG       Input    One per master
ABB      Both     Common among masters
DBG      Input    One per master
DBWO     Input    One per processor
DBB      Both     Common among masters (one per master if data streaming is used across multiple masters)

Data bus arbitration signals are described as follows:

• DBG (data bus grant): Indicates that the device can, with the proper qualification, take data bus mastership. See Section 2.6.1, "Data Bus Grant (DBG) - Input."
• DBWO (data bus write only): Assertion indicates that the processor may perform the data bus tenure for an outstanding write address even if a read address is pipelined before the write address.
• DBB (data bus busy): Assertion indicates that the device is data bus master. Processors assume data bus mastership if they need the data bus and are given a qualified data bus grant. Note that when the 604 uses data streaming, DBB works only as an output and is driven in the same manner as before. If 604 systems use data streaming across multiple devices, DBB must not be common among processors, to avoid contention when one processor negates DBB while another asserts it.

3.1.2 Address Pipelining and Split-Bus Transactions

This protocol provides independent address and data bus capability to support pipelined and split-bus transaction system organizations. Pipelining allows the address tenure of a bus transaction to begin before the data tenure of the previous transaction finishes. Split-bus transactions allow other bus activity to occur (either from the same or from different devices) between the address and data tenures of a transaction.

Although they do not inherently reduce memory latency, address pipelining and split-bus transactions can greatly improve bus/memory throughput, and are especially effective in multiprocessor implementations where bus bandwidth is an important measurement of system performance.

The design of the external arbiter affects pipelining by regulating the address bus grant (BG), data bus grant (DBG), and address acknowledge (AACK) signals. For example, a one-level pipeline is enabled by asserting AACK to the current address bus master and granting address bus mastership to the next requesting device before the current data bus tenure completes. Similarly, a two-level pipeline lets two additional address tenures occur before the current data bus tenure completes. The 604 can pipeline its transactions to a depth of two levels (intraprocessor pipelining) and the 601 and 603 can pipeline transactions to a depth of one level.
The bus protocol does not limit the levels of pipelining between multiple devices (interprocessor pipelining); the external arbiter controls pipeline depth and synchronization between masters and slaves.

In a pipelined implementation, data bus tenures stay in strict order with respect to address tenures except when DBWO is used to move write data tenures ahead of read data tenures. However, external hardware can further decouple the address and data buses, allowing data tenures to occur out of order with respect to address tenures. This requires some form of system tag to associate an out-of-order data transaction with its address transaction (not defined for this processor interface). Each processor's bus requests and data bus grants can be used to implement tags to support interprocessor, out-of-order transactions.

3.2 Address Bus Tenure

This section describes the three phases of the address tenure: address bus arbitration, address transfer, and address termination.

3.2.1 Address Bus Arbitration

When a device needs bus access but does not have a qualified bus grant, it asserts BR until the bus is available and the device is granted mastership. The external arbiter must grant master-elect status to the potential master by asserting BG. The requesting device determines that the bus is available when it detects a qualified bus grant; refer to Section 2.1.2, "Bus Grant (BG) - Input." The processor assumes address bus mastership and asserts ABB when it receives a qualified bus grant, as shown in Figure 3-3.

Figure 3-3. Address Bus Arbitration Showing Qualified Bus Grant
[Figure signals: need_bus, logical bus clock (cycles -1, 0, 1), BR, BG, ABB, ARTRY, qualified BG, ABB.]

External arbiters must allow only one device at a time to be address bus master. In systems in which no other device can be a master, BG can be grounded (always asserted) to continually grant address bus mastership to the processor.
Figure 3-4 shows bus parking; a qualified bus grant exists on the clock edge following a need_bus condition, eliminating the two bus clock cycles required for arbitration. The processor negates ABB for at least one bus clock cycle after AACK is asserted, even if it is parked and another transaction is pending. Typically, the most recent bus master remains parked; however, system designers can choose other schemes, such as providing unrequested bus grants in situations where it is easy to correctly predict the next device requesting bus mastership.

Figure 3-4. Address Bus Arbitration Showing Bus Parking
[Figure signals: need_bus, bus clock (cycles -1, 0, 1), BR, BG, ABB, ARTRY, qualified BG, ABB.]

When the processor receives a qualified bus grant, it assumes address bus mastership by asserting ABB and negating the BR output signal. At the same time, the processor drives the requested address onto the address bus and asserts TS to indicate the start of a new transaction.

To avoid hogging the bus, these processors always assert ABB and TS simultaneously and negate ABB the clock cycle following assertion of AACK; however, the processors accommodate systems in which ABB is asserted early or removed late.

When designing external bus arbitration logic, note that the processor may assert BR and then not use the bus after it receives the qualified bus grant. For example, suppose the 604 has asserted BR for a queued read-with-intent-to-modify-atomic (RWITMA) operation and then snoops an access that cancels the associated reservation. When the 604 is granted the bus, it no longer needs to perform the RWITMA operation; therefore, it does not assert ABB and does not use the bus for the read operation. The 604 asserts BR for at least one clock cycle in these instances.

3.2.2 Address Transfer

During an address transfer, the physical address and transfer attributes pass from the bus master to the slave device(s).
Snooping logic may monitor the transfer to enforce cache coherency. The signal groups used in address transfers include the following:

• Address transfer start signal: transfer start (TS). See Section 2.2, "Address Transfer Start Signals."
• Address transfer signals: address bus (A[0-31]), address parity (AP[0-3]), and address parity error (APE); see Section 2.3, "Address Transfer Signals."
• Address transfer attribute signals: transfer type (TT[0-4]), transfer burst (TBST), transfer size (TSIZ[0-2]), transfer code (TCn), cache inhibit (CI), write-through (WT), global (GBL), and cache set element (CSEn); see Section 2.4, "Address Transfer Attribute Signals."

Figure 3-5 shows the timing for all of these signals. Except for TS and APE, address transfer and address transfer attribute signal timing is identical. These signals are represented by the line labeled ADDR+.

Asserting TS indicates the master has begun an address transfer and that the address and transfer attributes are valid (within the context of a synchronous bus). These processors always assert TS coincident with ABB. As an input to the processors from other system masters, TS need not coincide with assertion of ABB, but can be asserted after ABB is asserted; these processors track this scenario correctly.

Figure 3-5. Address Bus Transfer
[Figure signals: qualified BG, TS, ABB, ADDR+, AACK, ARTRY; bus clock cycles 0 to 4.]

The address is transferred in bus clock cycles 1 and 2 (arbitration occurs in clock cycle 0). TS is asserted in clock cycle 1 and then negated. Address and attribute signals are driven valid coincident with the assertion of TS and held until the address transfer ends. The processor asserts ABB during the transfer. AACK is asserted to the processor the cycle after assertion of TS (shown by the dependency line). This is the shortest duration of an address transfer; it can be extended by a slave delaying assertion of AACK.
3.2.2.1 Address Bus Parity

The 60x processors always generate one bit of correct odd parity for each of the four bytes of address when a valid address is on the bus. The calculated values are placed on the AP[0-3] outputs when the processor is address bus master. If the processor is not the master and TS and GBL are asserted together for a transaction type that the processor snoops, the calculated values are compared with the AP[0-3] inputs. If address bus parity checking is enabled (refer to the HID register description for each processor), a parity error causes a machine check (checkstop on the 601) if MSR[ME] is set, or a checkstop if it is cleared. If address bus parity checking is disabled, no action is taken. In either case, APE is asserted if even parity is detected. The 603 does not assert APE if address parity checking is disabled.

3.2.2.2 Address Transfer Attribute Signals

The address transfer attribute signals, TT[0-4], TBST, TSIZ[0-2], and TCn, are fully described in Section 2.4, "Address Transfer Attribute Signals," and are summarized below.

3.2.2.2.1 Transfer Type (TT[0-4]) Signals

Snooping logic should fully decode the transfer type signals if GBL is asserted. Slave devices can sometimes use individual transfer type signals without fully decoding the group. Table 2-1 describes encodings for the transfer type signals.

3.2.2.2.2 Transfer Size (TSIZ[0-2]) Signals

The TSIZ[0-2] signals indicate the size of the requested data transfer as shown in Table 2-2. These signals can be used with TBST and A[29-31] to determine which portion of the data bus has valid data for a write transaction or which portion of the bus should contain valid data for a read transaction.

In general, processors do not produce 5-, 6-, or 7-byte transfers. The 601 allows unaligned floating-point operations to produce 5-, 6-, or 7-byte transfers, but use of this feature is discouraged.
The PowerPC architecture allows storage combining, but it is not supported in the 601 and 603. The 604 combines only stores to adjacent aligned words resulting from a cache-inhibited store multiple word (stmw) instruction. These combined words are presented to the bus as a normal double-word store in memory order. Storage combining of other sizes (for example, three adjacent half words to make a 6-byte transfer) is not implemented.

Coherency size is defined as 32 bytes (one cache block). Data transfers that cross a 32-byte-aligned boundary must present a new address to the bus at that boundary (for coherency consideration) or must operate as noncoherent data with respect to the processor.

3.2.2.3 Burst Ordering During Data Transfers

During burst data transfer operations for these processors, 32 bytes of data (one cache block) are transferred to or from the processor cache in a specific order. Burst transfers are always presented by the processor with a double-word-aligned address (in other words, A[29-31] is 0b000). For burst reads, these processors request the critical double word. A memory controller must transfer this double word first, followed by the double words at increasing memory addresses, wrapping around to the beginning of the cache block as required. Table 3-2 describes burst read orderings. For burst writes, processors present the first address of the block (A[27-31] is 0b00000).

3.2.2.4 Effect of Alignment in Data Transfers

This section describes the various combinations of transfer size, address, and byte lanes used by these processors. It also shows how PowerPC processors with an 8-byte data bus differ in behavior from the 603 in 4-byte data bus mode.

Aligned transfers are those whose address is an integer multiple of the data's size. For example, a 4-byte transfer has an address of 0bx...xx00.
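The wraparound ordering for burst reads described in Section 3.2.2.3 can be sketched as follows. This is a minimal illustration, not part of the bus specification: the function name is hypothetical, and it assumes the usual PowerPC bit numbering in which A[31] is the least-significant address bit, so A[27-28] selects the critical double word within the 32-byte cache block.

```python
def burst_read_order(address: int) -> list[str]:
    """Return the double words of a cache block in burst-read delivery order.

    Hypothetical helper. A[29-31] (the low three address bits) are 0b000 for
    bursts, so A[27-28] is obtained by shifting the byte address right by 3.
    """
    first = (address >> 3) & 0b11  # critical double word index, A[27-28]
    # Deliver the critical double word first, then wrap within the block.
    return [f"DW{(first + beat) % 4}" for beat in range(4)]


if __name__ == "__main__":
    for addr in (0x00, 0x08, 0x10, 0x18):
        print(f"A[27-28]={(addr >> 3) & 3:02b}:", burst_read_order(addr))
```

A memory controller model could use the returned list to decide which double word to drive on each of the four data beats.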
The PowerPC architecture allows flexibility to handle alignment errors either in hardware or software (a program exception). See the user's manual for each processor.

Table 3-3 lists aligned transfers (shown by an A) generated by a PowerPC processor with a 64-bit data bus. For example, 1-byte data is always aligned. The table also shows the byte lanes used for a 4-byte word transfer, and that only two addresses are aligned.

Table 3-2. Processor Read Burst Ordering

                   Processor starting address
Data transfer      A[27-28] = 00   A[27-28] = 01   A[27-28] = 10   A[27-28] = 11
First data beat    DW0             DW1             DW2             DW3
Second data beat   DW1             DW2             DW3             DW0
Third data beat    DW2             DW3             DW0             DW1
Fourth data beat   DW3             DW0             DW1             DW2

Note: A[29-31] are always 0b000 for burst transfers by the processor.

Table 3-3. Aligned Data Transfers for 64-Bit Data Bus

                                       Data bus byte lane(s)
Transfer size   TSIZ[0-2]   A[29-31]   0 1 2 3 4 5 6 7
Byte            0 0 1       000        A - - - - - - -
                0 0 1       001        - A - - - - - -
                0 0 1       010        - - A - - - - -
                0 0 1       011        - - - A - - - -
                0 0 1       100        - - - - A - - -
                0 0 1       101        - - - - - A - -
                0 0 1       110        - - - - - - A -
                0 0 1       111        - - - - - - - A
Half word       0 1 0       000        A A - - - - - -
                0 1 0       010        - - A A - - - -
                0 1 0       100        - - - - A A - -
                0 1 0       110        - - - - - - A A
Word            1 0 0       000        A A A A - - - -
                1 0 0       100        - - - - A A A A
Double word     0 0 0       000        A A A A A A A A

Notes: A: byte lane used; -: byte lane not used

Table 3-4 lists aligned transfers that can occur on the bus and are generated by a 603 in 32-bit data bus mode. Note that the two aligned word transfers are always transferred on byte lanes 0-3 and that a double-word transfer takes two beats.

The processors support misaligned memory operations to varying degrees; however, it is strongly recommended that software align code and data where possible. In particular, load/store multiple and load/store string instructions that generate misaligned accesses can greatly affect performance.

Misaligned memory transfers address memory that is not aligned to the size of the data being transferred (for example, a word read of an odd byte address, 0bx...x1).
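The relationship between transfer size, A[29-31], and byte lanes shown in Table 3-3 can be sketched in a few lines. This is an illustrative model only, under stated assumptions: the function and table names are hypothetical, the TSIZ encodings are the ones that appear in the tables of this section (0b000 is taken to mean a double word), and the 32-bit mode is modeled by using only the low two address bits to select among lanes 0-3.

```python
# TSIZ[0-2] encodings as they appear in the tables of this section.
TSIZ_TO_BYTES = {0b001: 1, 0b010: 2, 0b011: 3, 0b100: 4, 0b000: 8}


def is_aligned(size_bytes: int, address: int) -> bool:
    """An aligned transfer has an address that is a multiple of its size."""
    return address % size_bytes == 0


def byte_lanes(tsiz: int, a29_31: int, bus_bytes: int = 8) -> list[int]:
    """Return the byte lanes used by one single-beat access (hypothetical helper).

    bus_bytes is 8 for the 64-bit bus, or 4 to model the 603 in 32-bit mode,
    where each beat of a double-word transfer uses lanes 0-3.
    """
    size = min(TSIZ_TO_BYTES[tsiz], bus_bytes)
    first = a29_31 % bus_bytes
    last = first + size - 1
    if last >= bus_bytes:
        # The processor would split such a transfer into two accesses.
        raise ValueError("transfer does not fit in a single beat")
    return list(range(first, last + 1))


if __name__ == "__main__":
    print(byte_lanes(0b010, 0b110))               # half word at 0b110
    print(byte_lanes(0b100, 0b100))               # word at 0b100
    print(byte_lanes(0b001, 0b111, bus_bytes=4))  # byte at 0b111, 32-bit mode
```

The same helper also reproduces single-access misaligned cases, such as a 4-byte transfer at 0b001 on a 64-bit bus, which occupies lanes 1 through 4.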
Although most of these operations hit in the primary cache (or generate burst memory operations if they miss), the processor interface supports misaligned transfers. There are three approaches for handling these transfers, depending upon the processor and the data bus width.

Table 3-4. Aligned Data Transfers for 32-Bit Data Bus

                Required                               Data bus byte lane(s)
Transfer size   bus transfers   TSIZ[0-2]   A[29-31]   0 1 2 3 4 5 6 7
Byte            One access      0 0 1       000        A - - - x x x x
                One access      0 0 1       001        - A - - x x x x
                One access      0 0 1       010        - - A - x x x x
                One access      0 0 1       011        - - - A x x x x
                One access      0 0 1       100        A - - - x x x x
                One access      0 0 1       101        - A - - x x x x
                One access      0 0 1       110        - - A - x x x x
                One access      0 0 1       111        - - - A x x x x
Half word       One access      0 1 0       000        A A - - x x x x
                One access      0 1 0       010        - - A A x x x x
                One access      0 1 0       100        A A - - x x x x
                One access      0 1 0       110        - - A A x x x x
Word            One access      1 0 0       000        A A A A x x x x
                One access      1 0 0       100        A A A A x x x x
Double word     First access    0 0 0       000        A A A A x x x x
                Second access   0 0 0       000        A A A A x x x x

Notes: A: byte lane used; x: byte lane not used in 32-bit mode; -: byte lane not used

The 601 transfers misaligned data in one or two bus accesses, as shown in Table 3-5. Misaligned data that does not cross a double-word boundary is transferred in a single access. Data that crosses a double-word boundary takes two accesses. Misaligned double-word floating-point loads and stores are outside the architecture and are not shown in this table even though they are supported by the 601.

Table 3-5. Misaligned Data Transfers for the PowerPC 601 Processor

                Required                               Data bus byte lanes
Transfer size   bus transfers   TSIZ[0-2]   A[29-31]   0 1 2 3 4 5 6 7
Two bytes       One access      0 1 0       0 0 1      - A A - - - - -
                One access      0 1 0       0 1 1      - - - A A - - -
                One access      0 1 0       1 0 1      - - - - - A A -
                First access    0 0 1       1 1 1      - - - - - - - A
                Second access   0 0 1       0 0 0      A - - - - - - -
Three bytes     One access      0 1 1       0 0 0      A A A - - - - -
                One access      0 1 1       0 0 1      - A A A - - - -
                One access      0 1 1       0 1 0      - - A A A - - -
                One access      0 1 1       0 1 1      - - - A A A - -
                One access      0 1 1       1 0 0      - - - - A A A -
                One access      0 1 1       1 0 1      - - - - - A A A
                First access    0 1 0       1 1 0      - - - - - - A A
                Second access   0 0 1       0 0 0      A - - - - - - -
                First access    0 0 1       1 1 1      - - - - - - - A
                Second access   0 1 0       0 0 0      A A - - - - - -
Four bytes      One access      1 0 0       0 0 1      - A A A A - - -
                One access      1 0 0       0 1 0      - - A A A A - -
                One access      1 0 0       0 1 1      - - - A A A A -
                First access    0 1 1       1 0 1      - - - - - A A A
                Second access   0 0 1       0 0 0      A - - - - - - -
                First access    0 1 0       1 1 0      - - - - - - A A
                Second access   0 1 0       0 0 0      A A - - - - - -
                First access    0 0 1       1 1 1      - - - - - - - A
                Second access   0 1 1       0 0 0      A A A - - - - -

Notes: A: byte lane used; -: byte lane not used

The 603 and 604 transfer misaligned data in one or two bus accesses, as shown in Table 3-6. As long as the misaligned transfer does not cross a word boundary, these processors can transfer the data for the misaligned address in one access. The two-byte transfer at address 0bx...x001 is such a case. An attempt to address misaligned data that crosses a word boundary requires two bus transfers to access the data.

Table 3-6. Misaligned Data Transfers for PowerPC 603/604 Processors

                Required                               Data bus byte lanes
Transfer size   bus transfers   TSIZ[0-2]   A[29-31]   0 1 2 3 4 5 6 7
Two bytes       One access      0 1 0       0 0 1      - A A - - - - -
                First access    0 0 1       0 1 1      - - - A - - - -
                Second access   0 0 1       1 0 0      - - - - A - - -
                One access      0 1 0       1 0 1      - - - - - A A -
                First access    0 0 1       1 1 1      - - - - - - - A
                Second access   0 0 1       0 0 0      A - - - - - - -
Three bytes     One access      0 1 1       0 0 0      A A A - - - - -
                One access      0 1 1       0 0 1      - A A A - - - -
                First access    0 1 0       0 1 0      - - A A - - - -
                Second access   0 0 1       1 0 0      - - - - A - - -
                First access    0 0 1       0 1 1      - - - A - - - -
                Second access   0 1 0       1 0 0      - - - - A A - -
                One access      0 1 1       1 0 0      - - - - A A A -
                One access      0 1 1       1 0 1      - - - - - A A A
                First access    0 1 0       1 1 0      - - - - - - A A
                Second access   0 0 1       0 0 0      A - - - - - - -
                First access    0 0 1       1 1 1      - - - - - - - A
                Second access   0 1 0       0 0 0      A A - - - - - -
Four bytes      First access    0 1 1       0 0 1      - A A A - - - -
                Second access   0 0 1       1 0 0      - - - - A - - -
                First access    0 1 0       0 1 0      - - A A - - - -
                Second access   0 1 0       1 0 0      - - - - A A - -
                First access    0 0 1       0 1 1      - - - A - - - -
                Second access   0 1 1       1 0 0      - - - - A A A -
                First access    0 1 1       1 0 1      - - - - - A A A
                Second access   0 0 1       0 0 0      A - - - - - - -
                First access    0 1 0       1 1 0      - - - - - - A A
                Second access   0 1 0       0 0 0      A A - - - - - -
                First access    0 0 1       1 1 1      - - - - - - - A
                Second access   0 1 1       0 0 0      A A A - - - - -

Notes: A: byte lane used; -: byte lane not used

In 32-bit data bus mode, the 603 transfers misaligned data in one or two bus accesses using only byte lanes 0-3, as shown in Table 3-7. If the attempted transfer does not cross a word boundary, the processor can transfer the data for the misaligned address in one access. The two-byte transfer at address 0bx...x001 is such a case. Accessing data that crosses a word boundary, such as a two-byte transfer at address 0bx...x011, takes two bus transfers.

Table 3-7. Misaligned Data Transfers for 603 in 32-Bit Mode

                Required                               Data bus byte lanes
Transfer size   bus transfers   TSIZ[0-2]   A[29-31]   0 1 2 3 4 5 6 7
Two bytes       One access      0 1 0       0 0 1      - A A - x x x x
                First access    0 0 1       0 1 1      - - - A x x x x
                Second access   0 0 1       1 0 0      A - - - x x x x
                One access      0 1 0       1 0 1      - A A - x x x x
                First access    0 0 1       1 1 1      - - - A x x x x
                Second access   0 0 1       0 0 0      A - - - x x x x
Three bytes     One access      0 1 1       0 0 0      A A A - x x x x
                One access      0 1 1       0 0 1      - A A A x x x x
                First access    0 1 0       0 1 0      - - A A x x x x
                Second access   0 0 1       1 0 0      A - - - x x x x
                First access    0 0 1       0 1 1      - - - A x x x x
                Second access   0 1 0       1 0 0      A A - - x x x x
                One access      0 1 1       1 0 0      A A A - x x x x
                One access      0 1 1       1 0 1      - A A A x x x x
                First access    0 1 0       1 1 0      - - A A x x x x
                Second access   0 0 1       0 0 0      A - - - x x x x
                First access    0 0 1       1 1 1      - - - A x x x x
                Second access   0 1 0       0 0 0      A A - - x x x x
Four bytes      First access    0 1 1       0 0 1      - A A A x x x x
                Second access   0 0 1       1 0 0      A - - - x x x x
                First access    0 1 0       0 1 0      - - A A x x x x
                Second access   0 1 0       1 0 0      A A - - x x x x
                First access    0 0 1       0 1 1      - - - A x x x x
                Second access   0 1 1       1 0 0      A A A - x x x x
                First access    0 1 1       1 0 1      - A A A x x x x
                Second access   0 0 1       0 0 0      A - - - x x x x
                First access    0 1 0       1 1 0      - - A A x x x x
                Second access   0 1 0       0 0 0      A A - - x x x x
                First access    0 0 1       1 1 1      - - - A x x x x
                Second access   0 1 1       0 0 0      A A A - x x x x

Notes: A: byte lane used; x: byte lane not used in 32-bit mode; -: byte lane not used in transfer

3.2.2.4.1 Alignment of External Control Instructions

The eciwx and ecowx instructions always transfer four bytes of data. However, if the eciwx or ecowx addresses data that crosses a double-word boundary on the 601 or any word boundary on the 603 or 604, the processor generates two bus operations, each transferring fewer than four bytes. For the first bus operation, bits A[29-31] equal EA[29-31] of the instruction (0b101, 0b110, or 0b111 for the 601, or 0bx01, 0bx10, or 0bx11 for the 603 or 604). The size associated with the first bus operation is 3, 2, or 1 bytes, respectively. For the second bus operation, the system must determine how many bytes were transferred on the first bus operation to determine the size of the second operation.
Address bits A[29-31] equal 0b000 and the operation transfers 1, 2, or 3 bytes, respectively. For both operations, TBST and TSIZ[0-2] are redefined to specify the resource ID (RID), copied from EAR[28-31]. For eciwx/ecowx operations, the state of EAR[28] is presented by the TBST signal without inversion (if EAR[28] = 1, TBST is asserted).

Furthermore, the two bus operations associated with such a misaligned external control instruction are not atomic. That is, the processor can initiate other types of memory operations between the two transfers. Also, the two bus operations associated with a misaligned ecowx can be interrupted by an eciwx bus operation, and vice versa. The processor guarantees that the two operations associated with a misaligned ecowx cannot be interrupted by another ecowx operation, and likewise for eciwx.

Because a misaligned external control address is considered a programming error, the system may choose to assert TEA or otherwise cause an exception when a misaligned external control bus operation occurs.

3.2.3 Address Transfer Termination

An address tenure is completed, and thereby terminated, by the assertion of AACK. The processor does not terminate the address transfer until the AACK input is asserted; therefore, the system can extend the address transfer phase by delaying assertion of AACK. The AACK signal can be asserted as early as the bus clock cycle following TS (see Figure 3-5), for a minimum address tenure of two bus cycles. Note that AACK must be asserted for only one bus clock cycle.

The address transfer can be terminated with the requirement to retry if ARTRY is asserted any time during the address tenure and through the cycle following AACK. If an address retry is required, the ARTRY response is asserted by a bus snooping device as early as the second cycle after TS is asserted. Once asserted, ARTRY must remain asserted through the cycle after the assertion of AACK.
The assertion of ARTRY during the cycle after the assertion of AACK is called a qualified ARTRY. Assertion of ARTRY during the address tenure is referred to as an early ARTRY.

If the bus master recognizes an ARTRY and the data tenure has begun, it terminates the data tenure immediately even if data has been received. If the assertion of ARTRY is received up to or on the bus cycle following the first (or only) assertion of TA for the data tenure, the processor ignores the first data beat; if it is a load operation, it does not forward data internally to the cache and execution units. If the 604 is in fast-L2 mode, TA should not be asserted prior to the valid ARTRY cycle. If ARTRY is asserted after the first (or only) assertion of TA, improper operation of the bus interface may result.

As a bus master, the processor responds to an assertion of ARTRY by aborting the bus transaction and re-requesting the bus. The assertion causes both the address and data tenures to be rerun. After recognizing an assertion of ARTRY and aborting a transaction, the processor does not necessarily run the same transaction the next time it is granted the bus.

As a snooping device, the processor asserts ARTRY for a snooped transaction that hits modified data in the data cache that must be written back to memory, or if the snooped transaction could not be serviced. As shown in Figure 3-6, ARTRY is asserted for one bus clock cycle, three-stated for half of the next bus clock cycle, driven high until the following bus cycle, and finally three-stated. Section 2.5.2, "Address Retry (ARTRY) - Output," describes ARTRY timing for different processors.

Figure 3-6. Snooped Address Cycle with ARTRY

The snoop push window occurs two cycles after the assertion of AACK. The coherency protocol provides that only one device can get a snoop hit due to modified data for any given address tenure.
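The cycle relationships stated in Section 3.2.3 (AACK no earlier than the cycle after TS, ARTRY no earlier than the second cycle after TS, a qualified ARTRY in the cycle after AACK, and the snoop push window two cycles after AACK) reduce to simple arithmetic. The following sketch encodes that arithmetic under one reading of the text; the class and method names are hypothetical, and "early" is interpreted here as any ARTRY assertion from the earliest permitted cycle up to and including the AACK cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressTenureTiming:
    """Bus clock cycle numbers for one address tenure (illustrative model)."""

    ts_cycle: int
    aack_cycle: int

    def __post_init__(self) -> None:
        # AACK can be asserted as early as the cycle following TS.
        if self.aack_cycle < self.ts_cycle + 1:
            raise ValueError("AACK cannot precede the cycle after TS")

    @property
    def earliest_artry_cycle(self) -> int:
        return self.ts_cycle + 2  # second cycle after TS

    @property
    def qualified_artry_cycle(self) -> int:
        return self.aack_cycle + 1  # cycle after AACK

    @property
    def snoop_push_window_cycle(self) -> int:
        return self.aack_cycle + 2  # two cycles after AACK

    def classify_artry(self, cycle: int) -> str:
        """Classify an ARTRY assertion sampled in the given cycle."""
        if cycle == self.qualified_artry_cycle:
            return "qualified"
        if self.earliest_artry_cycle <= cycle <= self.aack_cycle:
            return "early"
        return "outside window"


if __name__ == "__main__":
    fastest = AddressTenureTiming(ts_cycle=1, aack_cycle=2)
    print(fastest.qualified_artry_cycle, fastest.snoop_push_window_cycle)
```

With the minimum two-cycle address tenure there is no room for an early ARTRY in this model: the earliest permitted ARTRY cycle coincides with the qualified ARTRY cycle.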
If ARTRY is asserted during the cycle after the assertion of AACK, then in the following cycle no processor asserts BR unless a snoop hit requires it to do a push. To guarantee that a snoop push gets an immediate opportunity to obtain the address bus, the external arbiter must grant the bus to the snooping device next.

[Figure 3-6 signals: TS, ABB, ADDR, AACK, ARTRY, qualified BG, ABB; bus clock cycles 1 to 8.]

A processor with a snoop hit that requires a push uses the window to request the address bus. After it gains the address bus, it uses the address and data tenures only to perform a push. In some cases, a processor may have a queued snoop push and receive a snoop hit that requires another push. The processor can use the window to perform the queued push and not queue the second push. If the processor is parked in the cycle after AACK, a processor with a snoop hit does not generate a fast push; instead, it acts as if it were not parked.

The SHD signal can also be asserted, either coincident with ARTRY or alone, to indicate that another bus device has a copy of the requested data and that the requesting device should mark its corresponding cache block as shared (S).

3.3 Data Bus Tenure

This section describes the data bus arbitration, transfer, and termination phases, which are nearly identical to address tenure phases.

3.3.1 Data Bus Arbitration

Data bus arbitration uses the data arbitration signal group: DBG, DBWO, and DBB. Additionally, the combination of TS and TT[0-4] provides information about the data bus request to external logic.

Asserting TS is an implied data bus request; the arbiter must qualify TS with the transfer type (TT[0-4]) encodings to determine if the current address transfer is an address-only operation (see Table 2-1). If the data bus is needed, the arbiter grants data bus mastership by asserting DBG to the processor.
as with the address bus arbitration phase, the processor must qualify d bg before assuming bus mastership, as described in section 2.6.1, ?ata bus grant (dbg)?nput.?as shown in figure 3-7, the processor asserts dbb on the bus clock cycle after recognition of a quali?d data bus grant. figure 3-7. data bus arbitration 0123 ts dbg db b dr tr y q u a l dbg dbb 3-20 powerpc microprocessor family: the bus interface for 32-bit microprocessors when a data tenure overlaps its associated address tenure, a quali?d ar tr y assertion coincident with a dbg signal does not result in data bus mastership (dbb is not asserted). because the processor can pipeline outstanding data tenures when a new address tenure is retried, the processor becomes data bus master to complete the previous transaction. 3.3.1.1 effect of ar tr y assertion on data transfer and arbitration on the powerpc 604 processor the system designer must de?e the beginning of the window in which the snoop response is valid and ensure that data is not transferred until one cycle before that window, or until the same cycle as the beginning of that window in fast-l2 mode. the processors support a snoop response window as early as two cycles after assertion of ts . in fast-l2 mode, data cannot be transferred earlier than the ?st cycle of the assertion of a r tr y . asserting a r tr y can invalidate a previous or current data transfer and terminate the data cycle, invalidate a quali?d data bus grant, or cancel a future data transfer. the possible scenarios are described as follows: if data is transferred (via assertion of t a ) two or more cycles before the beginning of the snoop window in the normal mode, or one or more cycles before the beginning of the snoop window in data streaming mode, then data is transferred too early to be cancelled by ar tr y . therefore, systems in which ar tr y can be asserted must not attempt data transfers (assert t a ) before this cycle. 
- If data is transferred in the cycle before the beginning of the snoop response window, asserting ARTRY invalidates the data transfer in a similar fashion to assertion of DRTRY, except that the data tenure is aborted rather than extended. If data streaming mode is active, data cannot be transferred in this cycle.
- If data is transferred in the first cycle of the snoop response window, asserting ARTRY invalidates the data transfer. This is like deasserting TA except that the data tenure is aborted instead of continued.
- If DBG has not been asserted, asserting ARTRY effectively negates the implied data bus request associated with the address transfer, and the processor does not expect a transfer. The system must not assert DBG for this transfer if any other processor data transfers are pending.
- If ARTRY is asserted during a data transfer, the transfer is terminated after the first cycle of ARTRY assertion. Therefore, a burst transfer can be cut short.
- Asserting ARTRY in the same cycle as its corresponding DBG disqualifies the data bus grant in that cycle, so the 604 cannot start a data transaction on the following cycle regardless of whether other data transactions are queued. However, on the cycle after the ARTRY assertion, the 604 responds to a qualified data bus grant if it has queued data transactions.

Figure 3-8 shows a write address tenure that receives an ARTRY snoop response in the same cycle the system asserts DBWO and DBG (cycle 6) to grant the write data tenure before a previously requested read data tenure. Following the ARTRY assertion, the qualified DBG assertion to the processor in cycle 7 is accepted for the read data tenure.

Figure 3-8. Qualified DBG Generation Following ARTRY

3.3.1.2 Using the DBB Signal

The DBB signal should be connected between potential masters if data tenure scheduling is left to them. Optionally, the memory system can schedule data tenures directly with DBG.
However, the system can ignore DBB if it is not used as the final data bus allocation control between data bus masters and if the memory system can track the start and end of the data tenure. If DBB is not used to signal the end of a data tenure, DBG is asserted only to the next bus master on the cycle before the next bus master may actually begin its data tenure, rather than asserting it earlier (usually during another master's data tenure) and allowing DBB negation to be the final gating signal for a qualified data bus grant. If the 604 is in data streaming mode, DBB is an output-only signal and is not sampled by the processor. Even if DBB is ignored in the system, the processor always recognizes its own assertion of DBB (except in data streaming mode) and requires one cycle after data tenure completion to negate its own DBB before recognizing a qualified data bus grant for the next data tenure. If DBB is not required, it must be connected to a pull-up resistor on the processor to ensure proper operation. If multiple 604s perform data streaming, each processor's DBB should be connected to the memory arbiter.

[Figure 3-8 timing diagram: system clock, TS, AACK, ARTRY, master 1 DBG, DBWO, qualified DBG, internal data bus request, and DBB over cycles 1-10, for a master 1 read followed by a master 1 write; ARTRY kills the qualified DBG for the write.]

3.3.2 Data Bus Write Only

Because of address pipelining, a processor can queue up to three (two for the 601 and 603) data tenures to perform when it receives a qualified DBG. Generally, data tenures should be performed in the order their address tenures were performed. However, the processor supports a limited out-of-order capability with the data bus write only (DBWO) input. Using DBWO can avoid deadlocks that can occur in certain system designs.
When recognized on the clock of a qualified DBG, DBWO can direct the processor to perform the next pending data write tenure even if a pending read tenure normally would have been performed first. See Section 2.6.2, "Data Bus Write Only (DBWO) - Input." The processor always accepts data bus mastership to perform a pending data tenure when it recognizes a qualified DBG. If DBWO is asserted with a qualified DBG and no write tenure is queued, the 603 and 604 still take mastership of the data bus to perform the next pending read data tenure. If the processor has multiple queued writes, asserting DBWO reorders the write operation whose address was sent first.

Generally, DBWO should be used only to allow a copy-back operation (burst write) to occur before a pending read operation. If DBWO is used for single-beat write operations, it may negate the effect of the eieio instruction by allowing a write operation to precede a program-scheduled read operation.

3.3.3 Data Transfer

Data transfer signals include DH[0-31], DL[0-31], DP[0-7], and DPE. The DH and DL signals form a 64-bit data path for read and write operations. The processor transfers data in either single- or four-beat burst transfers (eight-beat when the 603 is in 32-bit bus mode). Single-beat operations transfer from one to eight bytes within a double word at a time and can be misaligned; see Section 3.2.2.4, "Effect of Alignment in Data Transfers." Burst operations always transfer eight words and are aligned on eight-word address boundaries. Burst transfers give significantly higher bus throughput than single-beat transfers.

The type of transaction initiated by the processor depends on whether the code or data is cacheable and, for store operations, whether the memory accessed is marked write-back or write-through mode. Software controls this mode on a page or block basis.
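The DBWO selection rule from Section 3.3.2 can be sketched as a small queue lookup. This is a simplified model under stated assumptions: `select_data_tenure` is a hypothetical name, pending tenures are represented only as the strings "read" and "write" in address-tenure order, and the fallback to the oldest tenure when no write is queued follows what the text states for the 603 and 604 only.

```python
def select_data_tenure(pending, dbwo):
    """Pick which queued data tenure runs on a qualified DBG.

    pending: list of "read"/"write" strings, oldest address tenure first.
    dbwo:    True if DBWO is recognized on the clock of the qualified DBG.
    Returns the index into `pending`, or None if nothing is queued.
    """
    if not pending:
        return None
    if dbwo:
        # DBWO selects the queued write whose address was sent first.
        for index, kind in enumerate(pending):
            if kind == "write":
                return index
        # No write queued: the 603 and 604 still run the next pending read.
    return 0  # default: data tenures follow address-tenure order


queue = ["read", "write", "write"]
print(select_data_tenure(queue, dbwo=False))
print(select_data_tenure(queue, dbwo=True))
```

With the example queue, the read at index 0 runs when DBWO is negated, and the first write at index 1 runs when DBWO is asserted.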
Burst transfers support cacheable operations only; that is, memory structures must be marked as cacheable (and write-back for data store operations) in the respective page or block descriptor to take advantage of burst transfers. The processor output TBST indicates to the system whether the current transaction is a single-beat or a burst transfer (except during eciwx/ecowx transactions, when it signals the state of EAR[28]).

A burst transfer has an assumed address order. For load or store operations that miss the cache and are marked cacheable (stores are also marked as write-back) in the MMU, the processor uses the double-word-aligned (quad-word-aligned for the 601) address associated with the critical code or data that initiated the transaction. This minimizes latency by allowing critical data to be forwarded to the processor before the rest of the cache block is filled. For all other burst operations, however, cache block transfers start with the oct-word-aligned data.

Bus masters including these processors may generate byte-wise odd parity for their outgoing data and drive this information onto the DP[0-7] lines coincidentally with their data. The processors check this parity whenever they read data; if an error is detected, they assert the DPE signal and take a machine check or checkstop exception. Parity checking can be disabled within the processors by setting a bit in the HID register.

The processors do not directly support dynamic memory access interfacing to subsystems with less than a 64-bit data path. Other system components must provide any required translation to devices with less than 64-bit data paths. The 601 provides limited data mirroring for noncacheable transfers of less than a word.

3.3.4 Data Transfer Termination

Data bus transactions can be terminated by one of four signals, TA, DRTRY, TEA, or ARTRY, which are described as follows:

- Asserting TA indicates normal termination of data transactions.
  It must be asserted on the bus cycle coincident with the data it qualifies. The slave can withhold TA for any number of clock cycles until valid data is ready to be supplied or accepted.
- DRTRY indicates invalid read data in the previous bus clock cycle. DRTRY extends the current data beat and does not terminate it. If it is asserted after the last (or only) data beat, the processor negates DBB but still considers the data beat active and waits for another assertion of TA. DRTRY is ignored on write operations. Upon receiving a final (or only) termination condition, the processor negates DBB for one cycle, except when data streaming is used. If DRTRY is asserted to extend the last (or only) data beat past the negation of DBB, the memory system should three-state the data bus on the clock after the final assertion of TA, even though it negates DRTRY on that clock. This prevents a momentary data bus conflict if a write access begins on the following cycle.
- Asserting TEA signals a nonrecoverable error during a data transfer. It is recognized at any time during assertion of DBB or when a valid DRTRY could be sampled. Asserting TEA ends the data tenure immediately even if it is in the middle of a burst; however, it does not prevent incorrect data that has been acknowledged with TA from being written into the processor's cache or GPRs. Asserting TEA causes either a machine check exception or a checkstop condition depending on the MSR setting.
- Asserting ARTRY for the address tenure associated with the current data tenure ends the data tenure immediately. The data tenure may not be terminated if, because of address pipelining, the ARTRY applies to a different address tenure. If ARTRY is connected for the processor, the earliest allowable assertion of TA to the processor depends directly on the earliest possible assertion of ARTRY to the processor; see Section 3.3.1.1, "Effect of ARTRY Assertion on Data Transfer and Arbitration on the PowerPC 604 Processor."
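The four termination signals listed above can be summarized as a per-cycle classification. The sketch below is illustrative only: `BeatOutcome` and `classify_cycle` are hypothetical names, the `artry` argument is assumed to mean an ARTRY for the address tenure that belongs to the current data tenure, and the order of the checks is a simplifying assumption. The only ordering the manual states is that a TEA asserted in the same cycle as (or before) ARTRY takes effect and the ARTRY is ignored (Section 3.3.4.2).

```python
from enum import Enum


class BeatOutcome(Enum):
    ERROR = "data tenure ended by TEA"
    ABORTED = "data tenure ended by ARTRY"
    RETRY = "previous read data invalid, beat extended by DRTRY"
    TRANSFER = "data beat qualified by TA"
    WAIT = "wait state, TA withheld"


def classify_cycle(ta, drtry, tea, artry, is_read):
    """Classify one bus clock cycle of a data tenure from the sampled signals."""
    if tea:
        return BeatOutcome.ERROR      # nonrecoverable error, tenure ends at once
    if artry:
        return BeatOutcome.ABORTED    # associated address tenure was retried
    if drtry and is_read:
        return BeatOutcome.RETRY      # DRTRY is ignored on write operations
    if ta:
        return BeatOutcome.TRANSFER
    return BeatOutcome.WAIT


print(classify_cycle(ta=True, drtry=True, tea=False, artry=False, is_read=False))
```

The example prints the TRANSFER outcome because DRTRY has no effect on a write.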
3.3.4.1 Normal Single-Beat Termination

Single-beat data read operations normally end when TA is asserted by a responding slave. Figure 3-9 shows that TEA and DRTRY must remain negated during the transfer.

Figure 3-9. Normal Single-Beat Read Termination

Normal termination of a single-beat data write transaction occurs when TA is asserted by a responding slave. TEA must remain negated during the transfer. As shown in Figure 3-10, DRTRY is not sampled during data writes.

Figure 3-10. Normal Single-Beat Write Termination
[Figures 3-9 and 3-10 timing diagrams: TS, qualified DBG, DBB, data, TA, DRTRY, and AACK.]

Normal burst transfer termination occurs when TA is asserted for four bus clock cycles, as shown in Figure 3-11. To pace data transfer beats, clock cycles in which TA is asserted need not be consecutive. To terminate read bursts, TEA and DRTRY must remain negated during the transfer. For successful write bursts, DRTRY is ignored and TEA must remain negated.

Figure 3-11. Normal Burst Transaction

For read bursts, DRTRY may be asserted one bus clock cycle after TA is asserted to signal that the associated data is invalid. It stays asserted until the cycle after valid data is sent by the slave (see Figure 3-12).

Figure 3-12. Termination with DRTRY
[Figures 3-11 and 3-12 timing diagrams: TS, qualified DBG, DBB, data, TA, and DRTRY.]

Thus, a data beat can be terminated speculatively with TA and confirmed one bus clock cycle later by negating DRTRY (valid only for read transactions). TA must be asserted on the clock cycle before the first bus clock cycle of the assertion of DRTRY; otherwise results are undefined.
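The speculative-TA rule just described (a read beat qualified by TA is confirmed only if DRTRY is negated on the next bus clock) can be checked against per-cycle signal traces. This is a deliberately narrow sketch: `confirmed_read_beats` is a hypothetical helper, traces are plain lists of booleans with index 0 standing for cycle 1, and it does not model the replacement data that the slave supplies while DRTRY stays asserted.

```python
def confirmed_read_beats(ta, drtry):
    """Split TA-qualified read beats into confirmed and invalidated cycles.

    ta, drtry: per-cycle booleans (True = asserted), index 0 is cycle 1.
    Returns (confirmed, invalidated) as lists of 1-based cycle numbers.
    """
    confirmed, invalidated = [], []
    for i, ta_asserted in enumerate(ta):
        if not ta_asserted:
            continue  # wait state, nothing transferred this cycle
        # DRTRY in the following cycle cancels the data from this cycle.
        drtry_next = drtry[i + 1] if i + 1 < len(drtry) else False
        if drtry_next:
            invalidated.append(i + 1)
        else:
            confirmed.append(i + 1)
    return confirmed, invalidated


ta_trace = [False, False, True, False, True]
drtry_trace = [False, False, False, True, False]
print(confirmed_read_beats(ta_trace, drtry_trace))
```

In the example trace, the beat in cycle 3 is invalidated by DRTRY in cycle 4, and the beat in cycle 5 stands.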
Asserting DRTRY extends data bus mastership such that no other processors can use the data bus until DRTRY is negated. Therefore, in Figure 3-12, DBB cannot be asserted until clock cycle 5. This is true for both read and write operations, although DRTRY is ignored by the processors for write operations.

Figure 3-13 shows the effect of using DRTRY during a burst read. It also shows the effect of using TA to pace the data transfer rate; in clock cycle 3, TA is negated for the second data beat. The processor data pipeline proceeds in clock cycle 4 when TA is reasserted. Note that DRTRY is useful for systems that implement speculative data forwarding (for example, those with direct-mapped, second-level caches where hit/miss is determined on the following bus clock cycle) or for parity- or ECC-checked memory systems. Figure 3-13 shows the data transferred in cycle 5 invalidated by the assertion of DRTRY in cycle 6. Its negation in cycles 7 and 8 and the assertion of TA indicate valid data beats.

Figure 3-13. Read Burst with TA Wait States and DRTRY
[Timing diagram: TS, qualified DBG, DBB, data, TA, and DRTRY over cycles 1-9.]

3.3.4.2 Data Transfer Termination Due to a Bus Error

To indicate that a bus error occurred, TEA can be asserted while DBB is asserted or when a valid DRTRY could be recognized by the processor. Asserting TEA to the processor terminates the transaction; that is, further assertions of TA and DRTRY are ignored and DBB is negated. If the system asserts TEA for a data transaction on the same cycle as, or before, ARTRY is asserted for the corresponding address transaction, the processor ignores the effects of ARTRY on the address transaction and considers it successfully completed.

From a bus standpoint, asserting TEA causes nothing worse than the early termination of the data tenure in progress.
All the system logic involved in processing the data transfer prior to the TEA must return to the normal nonbusy state following the TEA so that the bus operations associated with a machine check exception can proceed. Due to bus pipelining in the 604, all outstanding bus operations, including queued requests, complete in normal fashion following the assertion of TEA. The machine check exception can be taken while these transactions are in progress.

Asserting TEA causes a machine check exception (and possibly a checkstop condition within the processor). See Section 5.3.1, "Checkstop State (MSR[ME] = 0)." Because these processors do not implement a synchronous error capability for memory accesses, the exception instruction pointer points not to the memory access that caused the assertion of TEA but to the instruction about to be executed (perhaps several instructions later). However, assertion of TEA does not invalidate data entering the GPR or the cache. Additionally, the corresponding address of the access that caused TEA to be asserted is not latched by the processor. To recover, the exception handler must either identify and correct the error that caused TEA to be asserted or the processor must be reset; therefore, this function should be used only to flag fatal system conditions to the processor (such as parity or uncorrectable ECC errors).

After the processor has committed to run a transaction, that transaction must eventually complete. Address retry causes the transaction to be restarted. Although TA wait states and DRTRY assertion for reads delay termination of individual data beats, eventually the system must either terminate the transaction or assert TEA (to generate a machine check exception). Therefore, software must check for the end of physical memory and the location of certain system facilities to avoid memory accesses that might cause TEA to be asserted.
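The dependence of the TEA response on MSR[ME] can be written as a one-line check. This is a minimal sketch, not processor microcode: `tea_response` is a hypothetical name, and the mask value assumes the 32-bit PowerPC MSR layout in which ME is bit 19 in big-endian bit numbering (bit 0 is the most-significant bit); that layout comes from the PowerPC architecture, not from this section.

```python
MSR_ME = 1 << (31 - 19)  # MSR[ME], bit 19 of a 32-bit MSR in big-endian bit numbering


def tea_response(msr):
    """Return the processor response to a TEA assertion for a given MSR value."""
    if msr & MSR_ME:
        return "machine check exception"
    # MSR[ME] = 0: instruction execution halts and the processor clock stops.
    return "checkstop"


print(hex(MSR_ME))
print(tea_response(MSR_ME))
print(tea_response(0))
```

A system that cannot tolerate a checkstop on a stray access would therefore keep MSR[ME] set whenever TEA can be asserted.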
If MSR[ME] is clear when TEA is asserted, a true checkstop condition occurs (instruction execution halted and processor clock stopped); a machine check exception occurs if MSR[ME] is set.

3.4 Timing Examples

This section shows timing diagrams for various scenarios. Figure 3-14 illustrates the fastest single-beat reads possible for these processors, showing both minimal latency and maximum single-beat throughput. By delaying the data bus tenure, latency increases, but, because of split-transaction pipelining, the overall throughput is not affected unless the data bus latency causes the fourth (third for 601 and 603) address tenure to be delayed. Note that all bidirectional signals are three-stated between bus tenures.

Figure 3-14. Fastest Single-Beat Reads
[Timing diagram: BR, BG, ABB, TS, A[0-31], TT[0-4], TBST, GBL, AACK, ARTRY, DBG, DBB, D[0-63], TA, DRTRY, and TEA over cycles 1-12; three CPU A reads.]

Figure 3-15 shows the fastest single-beat writes supported by these processors. The TT[1-4] signals are binary encoded 0bx0010 (TT0 can be either 0 or 1).

Figure 3-15. Fastest Single-Beat Writes
[Timing diagram: same signals as Figure 3-14 over cycles 1-12; three CPU A single-beat writes (SBW).]

Figure 3-16 shows three ways to delay single-beat reads using data-delay controls:

- The TA signal can remain negated to insert wait states in clock cycles 3 and 4.
- For the second access, DBG could have been asserted in clock cycle 6.
- In the third access, DRTRY is asserted in clock cycle 11 to flush the previous data.

Note that all bidirectional signals are three-stated between bus tenures.

Figure 3-16.
Single-Beat Reads Showing Data-Delay Controls
[Timing diagram: BR, BG, ABB, TS, A[0-31], TBST, GBL, AACK, ARTRY, DBG, DBB, D[0-63], TT[0-4], TA, DRTRY, and TEA over cycles 1-14; three CPU A reads, the third including a bad data beat.]

Figure 3-17 shows data-delay controls in a single-beat write operation. Bidirectional signals are three-stated between bus tenures. Data transfers are delayed in the following ways:

- TA is held negated to insert wait states in clocks 3 and 4.
- In clock 6, DBG is held negated, delaying the start of the data tenure.

The last access is not delayed (DRTRY is valid only for read operations).

Figure 3-17. Single-Beat Writes Showing Data Delay Controls
[Timing diagram: same signals as Figure 3-16 over cycles 1-12; three CPU A single-beat writes (SBW).]

Figure 3-18 shows the use of data-delay controls with burst transfers. All bidirectional signals are three-stated between bus tenures. Note the following:

- The first data beat of bursted read data (clock 3) is the critical double word (quad word for 601).
- The write burst shows the use of TA signal negation to delay the third data beat.
- The final read burst shows the use of DRTRY on the third data beat.
- The address for the third transfer is delayed until the first transfer completes.

Figure 3-18. Burst Transfers with Data Delay Controls
[Timing diagram: same signals as Figure 3-16 over cycles 1-20; a CPU A read burst, write burst, and read burst.]

Figure 3-19 shows the use of the TEA signal. Note that all bidirectional signals are three-stated between bus tenures.
Note the following:

- The first data beat of the read burst (in clock 3) is the critical double word (quad word for 601).
- The TEA signal cancels the burst write transfer on the third data beat.
- The processor eventually causes an exception to be taken on the TEA event.

Figure 3-19. Use of Transfer Error Acknowledge (TEA)
[Timing diagram: BR, BG, ABB, TS, A[0-31], TBST, GBL, AACK, ARTRY, DBG, DBB, D[0-63], TT[0-4], TA, DRTRY, and TEA over cycles 1-17; a CPU A read burst, write burst, and read burst.]

Chapter 4
Memory Coherency

This chapter describes hardware resources defined by the 60x bus definition that maintain memory coherency such that all devices that share memory in a system using a PowerPC processor have an accurate view of memory. Although the PowerPC architecture memory model requires memory to be kept coherent, it does not define either the snooping protocol or the use of MESI coherency states commonly used on PowerPC processors.

The 60x processors provide resources that support memory coherency by snooping bus transactions. This chapter provides an overview of how the 60x processors implement the MESI protocol and the bus operations implemented by the 60x processors that ensure cache coherency. Note that there are unique characteristics to the cache implementations of each of the PowerPC processors, which are summarized in the following sections.

4.1 Overview of Cache Implementations

To support a wide variety of processor implementations, the cache model defined by the PowerPC architecture is very flexible. Although it supports Harvard architecture caches, that is, separate instruction and data caches, this is not required. Processor caches can vary greatly with respect to size, organization, and set-associativity; however, these considerations do not affect the bus design in a substantial way.
For example, the number of sets in a processor's cache implementation determines the number of cache set element (CSEn) signals that must be implemented.

Major areas where a processor's cache structure affects the bus design are L2 cache support and the level of support given to multiprocessing concerns such as snooping, coherency-related bus operations, and MESI state logic. For example, the PowerPC 603 processor is not optimized for use in multiprocessor systems and therefore does not support the SHD bus signal or the shared (S) MESI state. Differences in how processors implement coherency-related bus operations are described in Section 4.10, "Overview of Implementation Differences."

The following section provides an overview of the cache implementations in the PowerPC 601, 603, and 604 processors.

4.1.1 PowerPC 601 Processor Cache Organization

The 601 implements a single unified cache that is configured as eight sets of 64 lines, each consisting of two sectors, four state bits (two per sector), an address tag, and several bits to maintain the LRU function. The two state bits implement the four-state MESI (modified-exclusive-shared-invalid) protocol. Each sector contains eight 32-bit words. Note that the PowerPC architecture defines the cacheable unit as a block, which is a sector in the 601.

To maintain the flow of instructions through the instruction queue, the instruction unit accesses the cache frequently. The queue is eight words (one sector) long, so an entire sector can be loaded into the instruction unit on a single clock cycle. The cache organization is shown in Figure 4-1.

Replacement strictly follows an LRU algorithm; that is, the least-recently used sector is used, which may mean that a modified sector is replaced on a miss if it is the least-recently used, even if invalid sectors are available.
However, for performance reasons, certain conditions (for example, the execution of some cache instructions) generate accesses to the cache without modifying the bits that perform the LRU function.

Figure 4-1. PowerPC 601 Processor Cache Organization
[Diagram: 8 sets of 64 lines (line 0 through line 63); each line holds an address tag and two 8-word sectors (sector 0 and sector 1), 16 words per line.]

Each cache line contains 16 contiguous words from memory that are loaded from a 16-word boundary (that is, bits A26-A31 of the logical (effective) addresses are zero); as a result, cache lines are aligned with page boundaries.

Note that address bits A20-A25 provide an index to select a line. Bits A26-A31 select a byte within a line. The tags consist of bits PA0-PA19. Address translation occurs in parallel, such that higher-order bits (the tag bits in the cache) are physical.

4.1.2 PowerPC 603 Processor Cache Organization

The 603 has separate instruction and data caches. The organization of the 603 data and instruction caches is shown in Figure 4-2.

Figure 4-2. PowerPC 603 Processor Cache Organization

Each cache block has eight contiguous words from memory that are loaded from an eight-word boundary, that is, bits A27-A31 of the logical (effective) addresses are zero. As a result, cache blocks are aligned with page boundaries.

Address bits A20-A26 provide an index to select a set. Bits A27-A31 select a byte within a block. The tags consist of bits PA0-PA19. Address translation occurs in parallel, such that higher-order bits (the tag bits in the cache) are physical.

Replacement strictly follows an LRU algorithm; that is, the least-recently used block is updated on a cache miss. The organization of the 603 instruction cache is like that of the data cache, although bits are not provided to maintain MEI cache coherency.
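The tag, index, and byte-offset fields described for the 601 and 603 caches can be extracted from a 32-bit address as follows. This is a sketch for illustration: the function names are hypothetical, a single address value stands in for both the effective address (index and offset) and the physical address (tag), and PowerPC big-endian bit numbering is assumed, so A31 is the least-significant bit.

```python
def split_address(addr, index_bits, offset_bits):
    """Split a 32-bit address into (tag, index, byte offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = (addr & 0xFFFFFFFF) >> (offset_bits + index_bits)
    return tag, index, offset


def split_601(addr):
    # 601: tag = bits 0-19, line index = A20-A25, byte in line = A26-A31
    return split_address(addr, index_bits=6, offset_bits=6)


def split_603(addr):
    # 603: tag = bits 0-19, set index = A20-A26, byte in block = A27-A31
    return split_address(addr, index_bits=7, offset_bits=5)


tag, index, offset = split_601(0x12345678)
print(hex(tag), index, offset)
tag, index, offset = split_603(0x12345678)
print(hex(tag), index, offset)
```

In both cases the index and offset fields together cover the low 12 address bits, which is why the index can be taken from the untranslated address of a 4-Kbyte page while translation proceeds in parallel.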
4.1.3 PowerPC 603e Processor Cache Enhancements

The 603e provides the following enhancements to the 603 cache implementation:

- The instruction cache is blocked only until the critical load completes (hit under reloads allowed).
- The critical double word is simultaneously written to the cache and forwarded to the requesting unit, thus minimizing stalls due to load delays.
- An optional data cache operation broadcast feature (enabled by the HID0[ABE] bit) allows for correct system management using an external copy-back L2 cache.
- Optional broadcast of cache control instructions dcbi, dcbf, and dcbst through configuration of the HID0[ABE] bit.

[Figure 4-2 diagram: 128 sets, each with four blocks (block 0 through block 3); each block has an address tag, state bits, and words 0-7 (8 words/block).]

4.1.4 PowerPC 604 Processor Cache Organization

The 604 cache implementation consists of separate 16-Kbyte instruction and data caches (Harvard architecture). The 604 instruction and data cache organization is shown in Figure 4-3.

Figure 4-3. PowerPC 604 Processor Cache Organization

Both caches are four-way set associative and implement an LRU replacement algorithm within each set. The cache directories are physically addressed with the physical (real) address tag stored in a cache directory. Both the instruction and data caches have 32-byte cache blocks. The coherency state bits for each block of the data cache allow encoding for all four possible MESI states. The coherency state bit for each cache block of the instruction cache allows encoding for two possible states:

- Invalid (INV)
- Valid (VAL)

Each cache can be invalidated or locked by setting appropriate bits in the hardware implementation dependent register 0 (HID0).

The 604 uses eight-word burst transactions to transfer cache blocks to and from memory.
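For an eight-word (32-byte) block moved as four double-word beats, the 604 burst read order starts at the critical double word and wraps within the block, while copy-back writes start at the beginning of the block. The helper names below are hypothetical, and the sketch assumes a 64-bit data bus with 8-byte beats.

```python
def burst_read_order(addr, block_bytes=32, beat_bytes=8):
    """Return the beat addresses of a critical-double-word-first burst read."""
    block_base = addr & ~(block_bytes - 1)
    beats = block_bytes // beat_bytes
    first = (addr & (block_bytes - 1)) // beat_bytes  # critical double word
    # Increasing addresses, wrapping back to the start of the block.
    return [block_base + ((first + i) % beats) * beat_bytes for i in range(beats)]


def copy_back_write_order(addr, block_bytes=32, beat_bytes=8):
    """Return the beat addresses of a copy-back burst write (block start first)."""
    block_base = addr & ~(block_bytes - 1)
    return [block_base + i * beat_bytes for i in range(block_bytes // beat_bytes)]


print([hex(a) for a in burst_read_order(0x1014)])
print([hex(a) for a in copy_back_write_order(0x1014)])
```

For a miss at address 0x1014, the read returns the double words at 0x1010, 0x1018, 0x1000, and 0x1008, in that order.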
When requesting burst reads, the 604 presents a double-word-aligned address. Memory controllers are expected to transfer this double word of data first, followed by double words from increasing addresses, wrapping back to the beginning of the eight-word block as required. Burst misses can be buffered into two eight-word line-fill buffers before being loaded into the cache.

[Figure 4-3 diagram: 128 sets, each with four blocks (block 0 through block 3); each block has an address tag, state bits, and words 0-7 (8 words/block).]

Cache block writes for copy-back operations always present the first address of the block and transfer data beginning at the start of the block. However, this does not keep other masters from transferring critical double words first on the bus for writes.

4.1.5 PowerPC 604e Processor Cache Enhancements

The 604e has separate 32-Kbyte data and instruction caches. This is double the size of the 604 caches. The 604e caches are logically organized as four-way set-associative caches with 256 sets, compared to the 604's 128 sets. The physical address bits that determine the set are 19 through 26, with 19 being the most-significant bit of the index. If bit 19 is zero, the block of data is an even 4-Kbyte page that resides in sets 0-127; otherwise, bit 19 is one and the block of data is an odd 4-Kbyte page that resides in sets 128-255. Because the caches are four-way set-associative, the cache set element (CSE[0-1]) signals remain unchanged from the 604. Figure 4-4 shows the organization of the 604e caches.

Figure 4-4. PowerPC 604e Processor Cache Organization

4.2 Cache Coherency Overview

A coherent memory system provides the same image of memory to all devices that share a system's memory. This is important for multiprocessor systems because it allows for synchronization, task migration, and the cooperative use of shared resources.
An incoherent memory system could easily produce unreliable results depending on when and which processor executed a task. Maintaining coherency is a concern primarily for data cache implementations. For example, if a processor does not have exclusive access to an addressed block before performing a store operation, another processor could have a copy of the old (or stale) data. Two processors reading from the same memory location would get different data.

[Figure 4-4 diagram: sets 0-127 (even pages) and sets 128-255 (odd pages); each set has four blocks (block 0 through block 3), each with an address tag, state bits, and words 0-7 (8 words/block).]

To maintain a coherent memory system, each processor follows simple rules for managing the cache state, such as broadcasting its intention to read a cache block not in the cache and its intention to write into a block not owned exclusively. Other devices respond by snooping the broadcast addresses and reporting cache status back to the originating processor. The status returned includes a shared indicator (the SHD signal) and an address retry indicator (the ARTRY signal). The snooping processor asserts SHD if it has a copy of the addressed block; it asserts ARTRY if it has a modified copy of the addressed cache block that must be written back to memory or if another processor had a problem that kept it from snooping the address. For additional information about snooping, see Section 4.7.1, "General Comments on 60x Snooping."

To maximize performance, the 601 and 604 provide a second path into the data cache directory for snooping that allows the mainstream instruction processing to operate concurrently with snooping. Instruction processing is affected only when snoop-control logic requires a snoop push of modified data to maintain memory coherency.

4.3 Memory Coherency - MESI Protocol

Each cache block is in one of the four MESI states.
Addresses presented to the cache are indexed into the cache directory and are compared against the cache directory tags. If no tags match, the result is a cache miss. If a tag match occurs, a cache hit has occurred and the directory indicates the state of the block through three state bits kept with the tag. The four possible states for a cache block are invalid (I), shared (S), exclusive (E), and modified (M), which are defined in Table 4-1.

Table 4-1. MESI State Definitions

MESI State | Definition
Modified (M) | The addressed block is valid in the cache and in only this cache. The block is modified with respect to system memory; that is, the modified data in the block has not been written back to memory. Note that some documentation identifies this as XM (exclusive modified) state.
Exclusive (E) | The addressed block is in this cache only. The data in this block is consistent with system memory. Note that some documentation identifies this as XU (exclusive unmodified) state.
Shared (S) | The addressed block is valid in the cache and in at least one other cache. This block is always consistent with system memory. That is, the shared state is shared-unmodified; there is no shared-modified state. The 603, which is not optimized for multiprocessor implementations, does not support the shared (S) state.
Invalid (I) | This state indicates that the addressed block is not resident in the cache and/or any data contained is considered not useful.

Figure 4-5 illustrates the basic relationships of the MESI states.

Figure 4-5. MESI States

Although memory space designated for instructions is rarely updated, data in memory space designated for data is continually being changed as the results from instruction execution are stored in memory. Therefore, maintaining coherency in data caches requires greater hardware support.
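The snoop response rule from Section 4.2 (assert SHD when the snooper holds a copy, assert ARTRY when that copy is modified and must be written back) can be expressed against the four states of Table 4-1. This is a simplified sketch with hypothetical names: it covers only a snooped read, ignores the case where ARTRY is asserted because a device could not snoop the address, and does not apply to the 603, which has no SHD signal.

```python
from enum import Enum


class Mesi(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def snoop_response(state):
    """Return (shd, artry) driven by a snooper that holds the block in `state`."""
    shd = state is not Mesi.INVALID    # snooper has a copy of the addressed block
    artry = state is Mesi.MODIFIED     # modified copy must be pushed to memory first
    return shd, artry


for state in Mesi:
    print(state.name, snoop_response(state))
```

Only the modified state forces a retry, which is why marking few pages as global keeps retry traffic on the bus low.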
[Figure 4-5 (MESI states): four panels (modified, shared, exclusive, and invalid in cache A), each showing the validity of the data in cache A, cache B, and system memory.]

The 604 and 601 have dedicated hardware to provide memory coherency by snooping bus transactions. The address retry capability enforces the four-state MESI cache coherency protocol (see Figure 4-6).

Figure 4-6. MESI Cache Coherency Protocol (601/604): State Diagram (WIM = 001)

[Figure 4-6: state diagram over the invalid, shared, exclusive, and modified states. Legend: RH = read hit; RMS = read miss, shared; RME = read miss, exclusive; WH = write hit; WM = write miss; SHR = snoop hit on a read; SHW = snoop hit on a write or read-with-intent-to-modify. Bus transactions shown: snoop push, invalidate transaction, read-with-intent-to-modify, cache block fill. Note in figure: on a miss, the old line is first invalidated and copied back if M.]

The global (GBL) output signal indicates whether the current transaction must be snooped by other devices. Address bus masters assert GBL to indicate that the current transaction is a global access (that is, an access to memory shared by more than one device) and should be snooped. If GBL is not asserted for the transaction, that transaction is not snooped. Normally, GBL reflects the M-bit value specified for the memory reference in the corresponding translation descriptor(s). Care must be taken to minimize the number of pages marked as global, because the retry protocol discussed in the previous section is used to enforce coherency and can require significant bus bandwidth.

When a processor is not the address bus master, GBL is an input.
The 604 snoops a transaction if TS and GBL are asserted together in the same bus clock cycle (this is a qualified snooping condition). No snoop update to the 604 cache occurs if the snooped transaction is not marked global. This includes invalidation cycles.

When the processor detects a qualified snoop condition, the address associated with the TS is compared against the data cache tags through a dedicated cache tag port. Snooping completes if no hit is detected. If, however, the address hits in the cache, the processor reacts according to the MESI protocol shown in Figure 4-6, assuming the WIM bits are set to write-back mode, caching allowed, and coherency enforced (WIM = 001).

Write hits to modified cache blocks of nonglobal pages do not generate invalidate broadcasts. Several bus transactions involve moving data that can no longer access the TLB M bit (for example, replacement cache block copy-back or a snoop push). In these cases, because hardware cannot determine whether the cache block was originally marked global, the processor marks these transactions as nonglobal to avoid retry deadlocks. See Table 4-2 for the CSE[0-1] encodings for the 604.

4.4 Coherency Timing

60x processors communicate the results of their snooping over the snoop response lines, ARTRY and SHD. These signals are defined to be valid at least the cycle after assertion of AACK. A 60x that tries to acquire a memory block is considered to have acquired it after it has successfully completed the address tenure requesting the data (no-ARTRY indication). After that cycle, it snoops for that address. Likewise, a 60x processor that is flushing data from its cache is considered to have completed the transfer from the standpoint of memory coherency after it has successfully completed the address tenure for the push or copy-back. Once this has occurred, it no longer snoops for this address. Note that this has implications for system design.
For example, after a 60x pushes a cache block, it may be some time before the block is actually stored in memory. If a read of the same block occurs after the push address tenure is completed, it is not snooped by the 60x performing the push. The system must ensure that proper ordering is maintained so that the correct data is read.

4.5 Coherency Protocol

The 60x bus supports a four-state (MESI) cache coherency protocol through the use of address retry (see Figure 4-6). The 601 and 604 implement the protocol to the extent required to support multiprocessor systems. Because it does not support the shared state, the 603 supports a three-state subset of the MESI protocol, the MEI protocol, which assures coherency in a single-processor system. No reference to the shared state applies to the 603.

When the 60x is not the bus master, it monitors the bus. If GBL is asserted, the 601/604 snoop address transfers. Due to the 603 lwarx/stwcx. implementation, which is based on the MEI protocol, the 603 snoops the bus regardless of the state of GBL. See Section 8.8, "lwarx/stwcx. Considerations," for more information.

The 604 snoops its own nonglobal or global transfers (internally, not across the bus) in the case of the address-only operations icbi, sync, tlbie, and tlbsync, and can assert ARTRY in response.

Normally, GBL reflects the value of the M bit provided by the translation mechanism in the master processor. See Section 4.8, "External WIM Bit Settings," for details of the conditions under which GBL does not reflect the state of the M bit.

Figure 4-6 shows the MESI protocol implemented by the 601 and 604; Figure 4-7 shows the MEI cache coherency protocol for the 603.

Figure 4-7. MEI Cache Coherency Protocol (603): State Diagram (WIM = 001)

[Figure 4-7: state diagram over the invalid, exclusive, and modified states. Legend: SH = snoop hit; RH = read hit; RM = read miss; WH = write hit; WM = write miss; SH/CRW = snoop hit, cacheable read/write; SH/CIR = snoop hit, cache-inhibited read. Bus transactions shown: snoop push, cache line fill.]

4.5.1 PowerPC 603 Processor lwarx/stwcx. Implementation

Due to the 603's three-state MEI protocol and absence of address-only broadcast transfers, the lwarx/stwcx. instruction pair is implemented differently from the 601 and 604. All global reads snooped by the 603 (except for an RWNITC) invalidate a cache block, and all reads originating from the 603 are RWITMs (except for reads from cache-inhibited pages). Therefore, a deadlock could occur if RWITMs canceled a reservation as they do in the 601 and 604. The following operations are required for the 603 lwarx/stwcx. implementation:

- RWITM invalidates the cache, but does not clear the reservation.
- Only writes on the bus can clear a reservation.
- stwcx. is treated as a write-through bus operation.
- Clearing the reservation on all writes, including castouts and snoop pushes (nonglobal), requires snooping for all global and nonglobal address transfers for reservation address register monitoring; however, a snoop on a nonglobal address transfer does not change any cache states.

4.5.2 Cache Set Element Signals

The cache set element signals, CSEn, are output signals that indicate which set member of the cache is involved for cache block reads and writes. Note that because of the different sizes and structures of the caches, the number of signals required to identify a cache set varies from processor to processor. There are three cache set element signals on the 601 (CSE[0-2]), one on the 603 (CSE), and two on the 604 (CSE[0-1]). Table 4-2 defines these signals for a four-way set-associative cache, such as is implemented in the 604.
For more information, see Section 2.4.12, "Cache Set Element (CSEn): Output."

Table 4-2. CSE[0-1] Signals

CSE[0-1] | Cache set element
00 | Block 0
01 | Block 1
10 | Block 2
11 | Block 3

4.5.3 Address Retry Sources

Snooping devices use SHD and ARTRY to respond to snoop requests. Because these signals are wire-ORed among many potential snoopers that can have different snoop responses, little significance can be attached to the particular combination of the two bits that appears on the bus. An assertion of ARTRY, regardless of whether SHD is asserted, indicates that at least one snooper had a pipeline collision or a snoop hit to a modified block and that the address must be retried. Assertion of SHD alone indicates that at least one snooper had a snoop hit on a shared cache block. The 603 does not implement SHD.

4.6 Memory Coherency Actions: PowerPC 60x Processor-Initiated Operations

Table 4-3 roughly describes the behavior of the 60x with respect to cacheable load operations. All reads originating from the 603, except those where caching is inhibited, are RWITMs. Also, all reads on the bus that the 603 snoops, except RWNITC, invalidate a cache block.

Table 4-4 roughly describes the behavior of the 60x with respect to store operations to cacheable, write-back memory. Note that the state of the SHD signal is not important in this table. For detailed descriptions of these operations, see Section E.1, "Load Operations," and Section E.2, "Store Operations."

Table 4-3. Memory Coherency Actions on Load Operations

Cache state | Bus operation | Snoop response | Action
I | Read | ¬ARTRY, ¬SHD | Load data and mark E
I | Read | ¬ARTRY, SHD | Load data and mark S
I | Read | ARTRY | Retry read operation
M, E, S | None | Don't care | Read from cache

Table 4-4. Memory Coherency Actions on Store Operations

Cache state | Bus operation | Snoop response | Action
I | RWITM | ¬ARTRY | Load data, modify it, mark M
I | RWITM | ARTRY | Retry the RWITM
S | Kill | ¬ARTRY | Modify cache, mark M
S | Kill | ARTRY | Retry the kill
E | None | Don't care | Modify cache, mark M
M | None | Don't care | Modify cache

4.6.1 Cache Control Instructions

Table 4-5 lists bus operations performed by the 601 and 604 when they execute cache control instructions. Table 4-6 shows the operations for the 603.

Table 4-5. PowerPC 601 and 604 Processor Bus Operations Initiated by Cache Control Instructions

Instruction | Current cache state | Next cache state | Bus operation | Comment
sync | Don't care | No change | sync | First clears memory queue
icbi | Don't care | I | 604: icbi; 601: kill |
dcbi | Don't care | I | Kill |
dcbf | E, S, I | I | Flush |
dcbf | M | I | Write w/kill | Marked as WT
dcbst | E, S, I | No change | Clean |
dcbst | M | E | Write w/kill | Marked as WT
dcbz | I | M | Kill | A write-back may be required
dcbz | S | M | Kill |
dcbz | E, M | M | None | Write over modified data
dcbt, dcbtst | I | E, S | Read | State change on reload
dcbt, dcbtst | M, E, S | No change | None |

Table 4-6. PowerPC 603 Bus Operations Initiated by Cache Control Instructions

Instruction | Current cache state | Next cache state | Bus operation | Comment
sync | Don't care | No change | None | First clears memory queue
dcbi, icbi | Don't care | I | None |
dcbf | E, I | I | None |
dcbf | M | I | Write w/kill |
dcbst | E, I | No change | None |
dcbst | M | E | Write w/kill |
dcbz (note 1) | I | M | RWITM | Possible castout on cache miss
dcbz (note 1) | E, M | M | RWITM | Write over modified data
dcbt | I | E | RWITM | State change on reload (note 2)
dcbt | E, M | No change | None |

Note 1: RWITM on dcbz serves both as a substitute for a dcbz broadcast and as a mechanism to zero out the cache block (data from RWITM ignored).
Note 2: The 603 has a touch load buffer. A cache reload is delayed until a hit in the touch load buffer occurs.

Table 4-5 and Table 4-6 give a general sense of the basic behavior of the processor.
For example, they do not address noncacheable or write-through cases, nor do they completely describe the exact mechanisms for the operations described. For a complete listing of cache coherency operations, see Appendix E, "Coherency Action Tables."

4.6.2 TLB Invalidate Entry Instruction Processing

Executing a tlbie instruction causes a processor to invalidate any TLB entry that corresponds to that instruction's effective address. It also causes a tlbie operation to be broadcast onto the bus (except on the 603).

4.6.2.1 tlbie Bus Operation

The tlbie bus operation is an address-only transaction. The address that is transmitted contains at least bits EA[12-19] in their correct bit positions. Processors that receive this transaction use the address to index into their TLB(s) and invalidate an entire congruence class. Any other device that implements its own TLB must process the tlbie bus operation. To avoid system deadlock conditions, devices that process tlbie bus operations must start the operation only after the bus operation has been completed without an ARTRY response. Because participating devices take an unspecified amount of time to perform their invalidations, completion of the entire invalidation sequence is not guaranteed until completion of a synchronization operation, as described in Section 4.7, "Descriptions of Bus Transactions and Snoop Responses." The 601 uses the sync instruction to synchronize tlbie operations; the 604 uses tlbsync.

4.7 Descriptions of Bus Transactions and Snoop Responses

This is a summary of bus transactions and snoop responses. Causes and effects of these operations are given in Appendix E, "Coherency Action Tables."

4.7.1 General Comments on 60x Snooping

When 60x processors are not bus master, they monitor bus traffic and perform cache and memory queue snooping as appropriate. Snooping is triggered by the receipt of a qualified snoop request, as indicated by the simultaneous assertion of TS and GBL.
Processors drive two snoop status signals, ARTRY and SHD, in response to qualified snoop requests. These signals provide information about the state of the addressed block with respect to the 60x for the current bus operation. These signals are described in more detail earlier in this document. The following additional comments apply:

- Any bus transaction that does not have GBL asserted can be ignored by all bus snoopers. Such transactions are ignored by 60x processors (except the 603). For more information, refer to Chapter 8, "System Interface Operation," in the PowerPC 603e RISC Microprocessor User's Manual.
- Several bus transactions (write with flush, read, and read with intent to modify) are defined twice, once with TT0 clear and once with it set (for atomic operations). These operations behave in the same manner with respect to bus snooping.
- The receiving processor may assert ARTRY in response to any bus transaction due to internal conflicts that prevent the appropriate snooping.

4.7.2 Clean Block

Clean block is an address-only transaction that a 60x processor issues after executing a dcbst instruction. If GBL is asserted, a clean block transaction causes 60x processors to respond as follows:

- If the addressed block is in the I, S, or E state, no further action is taken.
- If the addressed block is in the M state, the modified block is copied back to memory and the state of the block is changed to E.

The 603 does not broadcast or snoop clean block operations.

4.7.3 Flush Block

Flush block is an address-only transaction that a processor issues after executing a dcbf instruction. If GBL is asserted, a flush block transaction causes 60x processors to respond as follows:

- If the addressed block is in the S or E state, the state of the addressed block is changed to I.
- If the addressed block is in the M state, the snooping device asserts ARTRY and SHD, the modified block is pushed out of the cache, and its state is changed to I.
The 603 does not broadcast or snoop flush block operations.

4.7.4 Write with Flush, Write with Flush Atomic

Write with flush and write with flush atomic are issued by a processor after executing stores or stwcx., respectively, to memory in a variety of different states, particularly noncacheable and write-through. 60x processors do not use this transaction code for burst transfers, but system use for bursts is not precluded. If these transactions appear on the bus and the GBL signal is asserted, the 60x processors have the same snoop response as for flush block, except that a hit on the reservation address causes loss of the reservation.

4.7.5 Kill Block

Kill block is an address-only transaction that a processor generates by executing a dcbi instruction (or an icbi instruction in a 601), a dcbz to an I or S line, or a write to an S line. If GBL is asserted when the transaction appears on the bus, an addressed block in the cache is forced to the I state. The 603 does not broadcast or snoop kill operations.

4.7.6 Write with Kill

A processor typically issues a write-with-kill operation whenever it performs a cache block write-back. 60x processors use this transaction code for burst transfers. If these transactions appear on the bus and the GBL signal is asserted, the 60x processors have the same snoop response as for kill block.

4.7.7 Read, Read Atomic

Read is used by most single-beat or burst bus operations. If GBL is asserted, 60x processors respond to read operations as follows:

- If the addressed block is present and in the I state, the 60x takes no action.
- If the addressed block is present and in the S state, the 60x asserts SHD.
- If the addressed block is present and in the E state, the 60x asserts SHD and changes the cache state from E to S.
- If the addressed block is present in the cache in the M state, the 60x asserts both ARTRY and SHD. In addition, it changes the state of that cache block from M to S.
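The snoop responses listed above for read and flush block lend themselves to a small lookup table. The following Python sketch models only the four-state behavior of the 601 and 604 as described in Sections 4.7.3 and 4.7.7; the names `SNOOP_RESPONSE` and `snoop` are hypothetical. Where the text does not say whether a signal is asserted (flush block hitting an S or E block, or a block in the I state), the sketch assumes that neither ARTRY nor SHD is asserted.

```python
# (ARTRY asserted, SHD asserted, next state) for a snooped transaction,
# keyed by (transaction, current state of the addressed block).
SNOOP_RESPONSE = {
    ("read", "I"): (False, False, "I"),   # no action
    ("read", "S"): (False, True, "S"),    # assert SHD
    ("read", "E"): (False, True, "S"),    # assert SHD, E -> S
    ("read", "M"): (True, True, "S"),     # assert ARTRY and SHD, M -> S
    ("flush", "I"): (False, False, "I"),  # assumption: nothing to do
    ("flush", "S"): (False, False, "I"),  # assumption: no signals asserted
    ("flush", "E"): (False, False, "I"),  # assumption: no signals asserted
    ("flush", "M"): (True, True, "I"),    # assert ARTRY and SHD, push, M -> I
}


def snoop(transaction, state, gbl_asserted=True):
    """Return (artry, shd, next_state) for a 601/604-style snooper."""
    if not gbl_asserted:
        # Transactions without GBL asserted are ignored by these snoopers.
        return (False, False, state)
    return SNOOP_RESPONSE[(transaction, state)]
```

A table keeps each row directly comparable with the bullet lists, which makes it easy to check the model against the text.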
Read atomic operations appear in response to an lwarx instruction and receive the same snooping treatment as a read operation.

4.7.8 Read with Intent to Modify (RWITM)

The RWITM transaction is issued to acquire exclusive use of a memory location for the purpose of modifying it. One example is a processor that writes to a block that is not currently in its cache. RWITM transactions on the bus, when GBL is asserted, cause 60x processors to respond as follows:

- If the addressed block is not present in the cache, the 60x takes no action.
- If the addressed block is present in the cache in the S or E state, the 60x changes the state of that cache block to I.
- If the addressed block is present in the cache in the M state, the 60x asserts both ARTRY and SHD, pushes the modified block out of the cache, and changes the state of that cache block from M to I.

RWITM atomic appears on the bus in response to an stwcx. instruction and receives the same snooping treatment as RWITM.

4.7.9 TLB Invalidate

TLB invalidate is issued by a processor that executes a tlbie instruction. This operation sends at least certain bits of an effective address (EA) across the bus. Receiving processors invalidate the entire congruence class in any TLBs associated with that effective address. The address transmitted with the tlbie instruction contains EA[12-19] in their correct respective bit positions (see Figure 4-8).

Figure 4-8. Effective Address Bits in Bus Address

[Figure 4-8: bus address, bits 0-31, with bits 12-19 taken from the EA.]

When the tlbie appears on the bus, attached 60x processors invalidate the congruence class of the TLB that corresponds to the transmitted bits of the effective address.

4.7.10 Sync

Sync is an address-only transaction that a 60x processor places onto the bus as the result of execution of a sync instruction. If a processor has other snooped cache operations pending when it detects a sync on the bus, it asserts ARTRY.
A 601 detecting a sync on the bus also asserts ARTRY for any pending operations based on an invalidated TLB. The 603 does not broadcast or snoop sync.

4.7.11 tlbsync

tlbsync is an address-only transaction placed on the bus by execution of a tlbsync instruction or a pending tlbie bus operation. A 604 seeing tlbsync asserts ARTRY if any pending operations are based on an invalidated TLB. The 603 does not broadcast or snoop tlbsync operations. The 601 does not implement the tlbsync instruction and does not generate this bus operation.

4.7.12 eieio

The eieio bus operation is generated by executing an eieio instruction, which acts as a fence in the instruction flow to enforce ordered execution of accesses to noncacheable memory. The 60x processors internally enforce ordering of such accesses with respect to the eieio, in the sense that noncacheable accesses due to instructions that occur before the eieio in the program order are placed on the bus before any noncacheable accesses that result from instructions that occur after the eieio, with the eieio bus operation separating the two sets of bus operations. If the system implements any mechanism that allows reordering of noncacheable requests, then the appearance of an eieio should cause it to force ordering between accesses that occurred before and those that occur later. The 603 does not broadcast or snoop eieio operations.

4.7.13 icbi

This operation is issued by a processor that executes an instruction cache block invalidate (icbi) instruction. All copies of the addressed block in bus-attached instruction caches are invalidated. The 603 does not broadcast or snoop icbi operations. The icbi causes the 601 to broadcast a kill operation to the bus.

4.7.14 Read with No Intent to Cache (RWNITC)

Read with no intent to cache (RWNITC) operations are issued by a bus-attached device as TT[0-4] = 0b01011 (like a read, but with TT4 = 1). The 603 and 604 snoop this and, if they get a cache hit on a block marked M, push the block and mark it E (the ordinary response would be to push and mark it S in the 604 and to push and invalidate in the 603). For a graphics adapter that reads display data from memory, this data may be in the processor's cache and the subject of frequent updates. Because the adapter does not cache the data, there is no reason for the processor to leave the block in the S state, requiring a bus operation to regain E access. Because the 603 has no S state, it must also reread the data.

4.7.15 XFERDATA

XFERDATA read and write bus transactions result from execution of the eciwx or ecowx instructions, respectively. These instructions help certain adapter types (especially displays) make high-speed data transfers with memory by calculating an effective address, translating it, and presenting the resulting physical address to the adapter. The XFERDATA read and write transfer a word of data to or from the processor, respectively. They also present the 4-bit resource ID (RID) field, which is stored in the processor's transfer control register (TCR), to the bus, using the concatenation of the bits TBST || TSIZ[0-2].

These transactions are unique in the sense that the address that is transferred does not select the slave device; it is simply being passed to the slave device for use in a subsequent transaction. Rather, the RID field is used to select the appropriate slave device. Although it is intended that the slave device selected by the RID bits use the address transferred in a subsequent data transfer, the exact nature of this data transfer is not defined by 60x bus architecture. It is a private transfer that can be defined by the system like any other direct-memory access.
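The TBST || TSIZ[0-2] concatenation that carries the RID can be illustrated with a short Python sketch. The function names `rid_to_signals` and `signals_to_rid` are hypothetical. The sketch works with logical bit values only and assumes that the leftmost bit of the concatenation (TBST) is the most-significant RID bit, following the PowerPC convention that bit 0 is the most significant; the electrical polarity of the signals is not modeled.

```python
def rid_to_signals(rid):
    """Split a 4-bit RID into (tbst, tsiz), where RID = TBST || TSIZ[0-2].

    Assumption: TBST carries the most-significant RID bit; values are
    logical bits, not electrical signal levels.
    """
    if not 0 <= rid <= 0xF:
        raise ValueError("RID is a 4-bit field")
    tbst = (rid >> 3) & 0b1     # leftmost bit of the concatenation
    tsiz = rid & 0b111          # remaining three bits, TSIZ[0-2]
    return tbst, tsiz


def signals_to_rid(tbst, tsiz):
    """Rebuild the RID a slave device would decode from TBST and TSIZ[0-2]."""
    return ((tbst & 0b1) << 3) | (tsiz & 0b111)
```

A slave device under these assumptions would compare the value returned by `signals_to_rid` against its own resource ID to decide whether the transferred address is meant for it.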
4.8 External WIM Bit Settings

The write-through (WT), cache-inhibit (CI), and global (GBL) signals generally correspond to the W, I, and M bits supplied by the translation mechanism (page table or BAT); however, there are exceptions:

- In real mode, load and store operations bypass the translation mechanism and are implicitly WIM = 001, or WIM = 011 if the cache is disabled or locked.
- Write-back and snoop push operations do not involve the translation mechanism and are sent out as WIM = 000.
- TLB reloads are placed onto the bus as WIM = 001, or WIM = 011 if the cache is disabled or locked.
- The dcbst and dcbf instructions to non-write-through memory are placed on the bus indicating write-through (write-with-kill bus operation) if the cache block is in the M state.
- The XFERDATA bus operations are always placed on the bus with WIM = 010, regardless of the state of the WIM bits supplied by the translation mechanism.
- For sync, tlbsync, tlbie, and eieio, WIM = xx1, where x is not defined.
- For icbi, WIM is as provided by the translation mechanism.

4.9 Direct-Memory Access and Memory Coherency

When system devices perform direct-memory accesses, they may choose to assert or negate GBL. 60x processors never snoop a request for which this bit is negated. It is therefore a system design decision whether a device that accesses memory should work from memory that is guaranteed to be coherent. The trade-off is that snooped accesses, while convenient, generally reduce system performance.

Another option available to system designers is to define different burst length transfers, using the reserved code points defined in the TSIZ[0-2] field. 60x processors snoop regardless of the state of the TSIZ signals, provided GBL is asserted. Note that coherency cannot be maintained if the system defines a transfer that crosses a cache block boundary.
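A system that defines its own burst lengths can check the block-boundary restriction with simple address arithmetic. The Python sketch below assumes a 32-byte cache block (8 words per block, as shown in the cache organization figure, with 4-byte words) and byte addressing; `BLOCK_BYTES` and `crosses_block_boundary` are hypothetical names introduced for the example.

```python
BLOCK_BYTES = 8 * 4  # assumption: 8 words per block, 4 bytes per word


def crosses_block_boundary(address, length):
    """Return True if a transfer of `length` bytes starting at `address`
    touches more than one cache block."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_block = address // BLOCK_BYTES
    last_block = (address + length - 1) // BLOCK_BYTES
    return first_block != last_block
```

Under these assumptions a 32-byte transfer is safe only when it starts on a 32-byte boundary, while shorter transfers are safe as long as they end within the block in which they start.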
4.10 Overview of Implementation Differences

Table 4-7 summarizes the basic differences in how the various PowerPC processors implement the bus operations defined in this chapter. This is a brief overview of those differences and does not describe the more subtle differences in the logic that is used to ensure cache coherency, which are described in Appendix E, "Coherency Action Tables."

Table 4-7. Differences in Implementation of Bus Operations

Bus operation | Differences
Clean block | The 603 does not broadcast or snoop clean block operations.
Flush block | The 603 does not broadcast or snoop flush block or implement SHD.
Write with flush, write with flush atomic | In the 601, HID0[31] controls whether HP_SNP_REQ specifies high-priority operations.
Kill block | The 603 does not broadcast or snoop kill.
Write with kill | -
Read, read atomic | The 603 does not implement SHD and the S state.
Read with intent to modify (RWITM) | -
TLB invalidate | A snooping 604 also asserts ARTRY when it has a pending TLB invalidate operation and a second TLB invalidate operation is detected. The 601 uses the sync instruction to synchronize tlbie operations; the 603 and 604 use tlbsync.
Sync | The PowerPC architecture permits data accesses from more than one instruction to be combined for cache-inhibited operations, except when the accesses are separated by a sync instruction, or by an eieio instruction when the page or block is also designated as guarded. This combined access capability is not implemented on the 603e. The 603 does not broadcast or snoop sync. A sync operation is also generated by the eieio instruction on the 601.
tlbsync | A 604 seeing tlbsync asserts ARTRY if any pending operations are based on an invalidated TLB. The 603 does not broadcast or snoop tlbsync. The 601 does not implement the tlbsync instruction and does not generate this bus operation.
eieio | The 603 does not broadcast or snoop eieio. The eieio is treated as a no-op by the 603e.
icbi | The 603 does not broadcast or snoop icbi. The icbi causes the 601 to broadcast a kill operation to the bus.
Read with no intent to cache (RWNITC) | RWNITC operations are issued by a bus-attached device as TT[0-4] = 0b01011. The 603 and 604 snoop this and, if they get a cache hit on a block marked M, push the block and mark it E (the ordinary response would be to push and mark it S in the 604 and to push and invalidate in the 603). For a graphics adapter that reads display data from memory, this data may be in the processor's cache and the subject of frequent updates. Because it has no S state, the 603 handles some operations differently.
XFERDATA | -
I/O reply | The I/O reply operation serves as the final bus operation in the series of bus operations that service direct-store interface operation. The 603e processors do not perform these operations because they do not implement the direct-store facility.

Chapter 5
System Status Signals

This chapter further describes the operation of the system status signals (interrupt, checkstop, and reset signals), which are described in Section 2.9, "System Status Signals." Most of these are input signals that are used to generate asynchronous exceptions either as a function of normal system operations or as the result of an error. This chapter also briefly discusses asynchronous exceptions described in the Programming Environments Manual, with particular attention given to differences in how 60x processors implement those exceptions.

5.1 Overview

The PowerPC 601, 603, and 604 processors implement asynchronous exceptions that are triggered by signals. Asynchronous exceptions can be either maskable or nonmaskable. Table 5-1 lists the signal-triggered operations implemented in the 60x processors. (The only other architecture-defined asynchronous exception, the decrementer exception, is triggered internally.)
The table shows the event priority and indicates whether a related exception is maskable and precise.

Table 5-1. Resets, Interrupts, and Their Sources

Resets/interrupts | Maskable/nonmaskable | Precise/imprecise | Priority | Source (601) | Source (603) | Source (604)
Hard reset | Nonmaskable | Imprecise | Highest priority | HRESET | HRESET | HRESET
Machine check | Nonmaskable | Imprecise | Second-highest priority | TEA | TEA, MCP, APE, DPE | TEA, MCP, APE, DPE
Soft reset | Nonmaskable | Imprecise | Third-highest priority | SRESET | SRESET | SRESET
System management * | Maskable | Precise | Lower priority than synchronous exceptions; higher than external interrupt. | - | SMI | SMI
External interrupt | Maskable | Precise | Lower priority than a system management exception; higher than decrementer exception. | INT | INT | INT

* The system management exception is not defined by the PowerPC architecture, but is implemented similarly in several PowerPC processors.

Table 5-2 describes general differences in how the 601, 603, and 604 implement the system status signals.

Table 5-2. Processor Bus Signal Differences

Signal(s) | Related exception | Difference
Interrupt (INT) | External interrupt (0x00500) | For the 601, this signal may be negated after the minimum pulse width of three processor clock cycles.
System management interrupt (SMI) | System management interrupt (0x01400) | The system management interrupt exception is not defined by the PowerPC architecture and not implemented on the 601.
Machine check (MCP) | Machine check (0x00200) | This signal is not defined for the 601.
Checkstop input (CKSTP_IN) | Machine check (0x00200) | Early versions of the 603 identified this signal as CKSTP.
Checkstop output (CKSTP_OUT) | Machine check (0x00200) | Early versions of the 603 identified this signal as CHECKSTOP.
Hard reset (HRESET) | System reset (0x00100) | Output drivers are released to high impedance within five SYSCLK pulses (three for the 601) after the assertion of HRESET.
Soft reset (SRESET) | System reset (0x00100) | Negation may occur any time after the minimum soft reset pulse width of 2 (10 for the 601) bus cycles has been met.

5.2 Resets

There are two types of resets:

- Hard resets: These occur with the proper assertion of the HRESET signal, as part of a system's power-on reset, or due to other system-dependent occurrences. After a hard reset occurs, registers and other resources are initialized and instruction fetching begins at the system reset exception vector at 0xFFF00100.
- Soft resets: These occur with the proper assertion of the SRESET signal. When the SRESET is detected, the machine state is saved in the SRR0 and SRR1 registers, the MSR is reset, and exception processing continues from the system reset exception handler, which resides at vector offset 0x00100.

5.2.1 Hard Reset and Power-On Reset

The 60x processors are reset by asserting the HRESET input for a minimum period of time after VDD is stable. On the 603 and 604, this includes time for the phase-locked loop to lock. Refer to the respective hardware specifications for the duration requirement.

While HRESET is asserted, all 60x outputs are placed in the high-impedance state and bus content has no relevance. This state may continue after the HRESET signal has been negated as the processor performs various internal initializations and tests. The system must not perform any activity based on interpretation of floating lines. Most control lines require pull-up resistors to be negated because they are multidrop. The BR signal must also be pulled up so it is not recognized as asserted during processor initialization.

The following is also true when a hard reset occurs:

- External checkstops are enabled.
- The on-chip test interface has given control of the I/Os to the rest of the chip for functional use.
- since the reset exception has data and instruction translation disabled (msr[dr] and msr[ir] both cleared), the chip operates in direct address translation mode (referred to as the real addressing mode in the architecture specification).
- because msr[ip] is set by a hard reset, the first instruction is fetched from address 0xfff0_0100.

5.2.1.1 hard reset settings
note that a hard reset operation should be performed on power-on to appropriately reset the processor. table 5-3 shows the state of the machine just before it fetches the first instruction after a hard reset.

table 5-3. hard reset settings
resource | 601 (1) | 603 | 603e | 604/604e (2)
bats | all 0s | unknown | unknown | undefined
cache | all 0s | all cache blocks invalidated | all cache blocks invalidated | undefined and disabled
cr | all 0s | all 0s | all 0s | undefined
ctr | all 0s | all 0s | all 0s | undefined
dar | all 0s | all 0s | all 0s | undefined
dec | all 0s | ffff_ffff | ffff_ffff | undefined
dsisr | all 0s | all 0s | all 0s | undefined
ear | all 0s | all 0s | all 0s | e cleared; rid undefined
fprs | all 0s | all 0s | unknown | undefined
fpscr | all 0s | all 0s | all 0s | cleared
gprs | all 0s | all 0s | unknown | undefined
hid0 | 8001_0080 | all 0s | all 0s | all 0s
iabr | all 0s | all 0s | all 0s | breakpoint is disabled; address is undefined
lr | all 0s | all 0s | all 0s | undefined
msr | 0000_1040 | 0000_0040 | 0000_0040 | 0000_0040 (only ip set)
sdr1 | all 0s | all 0s | all 0s | undefined
sprgs | all 0s | all 0s | all 0s | undefined
srr0 | all 0s | all 0s | all 0s | undefined
srr1 | all 0s | all 0s | all 0s | undefined
srs | all 0s | unknown | unknown | undefined
tlbs | all 0s | unknown | unknown | undefined
xer | all 0s | all 0s | all 0s | undefined

the remaining resources have entries for only some of the processors; values are listed in column order (601, 603, 603e, 604/604e) with empty columns omitted:
dabr | breakpoint disabled; address undefined
dcmp/icmp | all 0s | all 0s
dmiss/imiss | all 0s | all 0s
hash1 | all 0s
hash2 | all 0s
hid1 | all 0s | all 0s
hid2 | see iabr
hid5 | all 0s
hid15 | all 0s
mq | all 0s
pir | undefined
pvr | version-dependent
rpa | all 0s | all 0s
rtcl | all 0s
rtcu | all 0s
tag directory | all 0s (however, lru bits are initialized so each side of the cache has a unique lru value)
tbl | all 0s | all 0s | undefined
tbu | all 0s | all 0s | undefined

(1) 601 notes: master checkstop enabled; internal power-on reset checkstops enabled. note that if an external clock is connected to rtc for the 601, the rtcl, rtcu, and dec registers can change from their initial value of 0s without receiving instructions to load those registers. all internal arrays and registers are cleared during the hard reset process.

(2) 604/604e notes: both hreset and trst signals should be asserted during power up and must remain asserted according to the values provided in the powerpc 604 risc microprocessor hardware specifications. the 604 internal state after the hard reset interval is defined below. if hreset is asserted for less than this amount of time, results are not predictable. if hreset is asserted during normal operation, all operations stop and the machine state is lost. the processor automatically begins operations by issuing an instruction fetch. because caching is inhibited at start-up, this generates a single-beat load operation on the bus.
the following output signals are placed in high impedance during hard reset: abb, ts, xats, a[0-31], ap[0-3], tt[0-4], tsiz[0-2], tbst, tcn, ci, wt, gbl, csen, artry, shd, dbb, dh[0-31], dl[0-31], and dp[0-7].
the following output signals are negated during hard reset: br, ape, dpe, rsrv, and chkstp_out.

the 604e's bus interface can be configured into one of two modes during a hard reset, as described in table 5-4.

table 5-4. powerpc 604e processor modes configurable during hreset
604e mode | input signal timing requirements | notes
normal bus mode | drtry must be negated throughout hreset assertion. | after hreset negation, drtry can be used normally.
fast-l2 mode | drtry must be asserted and negated coincidentally with hreset and remain negated during normal operation. | can be done by tying drtry to hreset.
no-drtry mode (604 only) | drtry must be asserted coincidentally with hreset and remain asserted during normal operation. | can be done by tying drtry asserted.

5.2.2 soft reset
a soft reset is generated by the proper assertion of the sreset signal. when the signal is recognized as asserted, the system reset exception is generated as described in the following section.

5.2.2.1 system reset exception (0x00100)
the system reset exception is defined by the powerpc architecture (operating environment architecture, or oea) as a nonmaskable, asynchronous exception signaled to the processor typically through the assertion of a system-defined signal.

table 5-5 shows how the machine state is saved and the msr settings after the system reset exception is invoked. when a system reset exception is taken, instruction execution continues at offset 0x00100 from the physical base address indicated by msr[ip].

if the exception is recoverable, the value of the msr[ri] bit is copied to the corresponding srr1 bit. the exception functions as a context synchronizing operation. the exception is not recoverable if a reset exception causes the loss of any of the following:
- an asynchronous precise exception (interrupt, system management, or decrementer)
- direct-store error type dsi
- floating-point enabled type program exception

if the srr1 bit corresponding to msr[ri] is cleared, the exception is context synchronizing only with respect to subsequent instructions.
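the rule that a handler starts at a fixed offset from the base selected by msr[ip] can be sketched as follows. this is a minimal illustration, not part of the original text: the function and constant names are hypothetical, the msr[ip] mask is taken from the reset value 0000_0040 ("only ip set") in table 5-3, and the base of 0x00000000 when msr[ip] is cleared is an assumption drawn from the powerpc oea rather than from this chapter.

```python
# hypothetical names; offsets are the vector offsets quoted in this chapter.
MSR_IP = 0x0000_0040  # msr[ip]: the only bit set in the 603/604 reset msr value

SYSTEM_RESET = 0x00100
MACHINE_CHECK = 0x00200
EXTERNAL_INTERRUPT = 0x00500
SYSTEM_MANAGEMENT = 0x01400


def vector_address(msr, offset):
    """Return the physical address at which the handler for `offset` begins."""
    # assumption: ip set selects base 0xfff00000, ip cleared selects base 0
    base = 0xFFF0_0000 if msr & MSR_IP else 0x0000_0000
    return base + offset


# after a hard reset msr[ip] is set, so the first fetch is at 0xfff0_0100
first_fetch = vector_address(0x0000_0040, SYSTEM_RESET)
```

with msr[ip] set, `first_fetch` equals the hard reset fetch address 0xfff0_0100 given earlier in this section.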
note that each implementation provides a means for software to distinguish between power-on reset and other types of system resets (such as soft reset).

table 5-5. system reset exception: register settings
register | setting description
srr0 | set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present.
srr1 | bit 0: loaded with the equivalent bit from the msr (cleared in the 601). bits 1-4: cleared. bits 5-9: loaded with equivalent bits from the msr (cleared in the 601). bits 10-15: cleared. bits 16-29: loaded with equivalent bits from the msr. bit 30: loaded from the equivalent msr bit, msr[ri] (1), if the exception is recoverable; otherwise cleared. bit 31: loaded with the equivalent bit from the msr. note that depending on the implementation, reserved bits in the msr may not be copied to srr1. if the processor state is corrupted to the extent that execution cannot resume reliably, the bit corresponding to msr[ri] (1), srr1[30], is cleared.
msr | pow (1) 0; tgpr (2) 0; ee 0; pr 0; fp 0; fe0 0; se 0; be 0; fe1 0; ir (4) 0; dr (5) 0; ri (1) 0; le (6) set to value of ile. no setting is listed for ile (1), me, or ip (3).
(1) not implemented on the 601
(2) 603e only
(3) identified as ep on the 601
(4) identified as it on the 601
(5) identified as dt on the 601
(6) not implemented on the 601. control of little-endian mode on the 601 is provided by hid0[28], the lm bit.

5.2.2.2 soft reset on the powerpc 601 microprocessor
because the 601 does not implement the msr[ri] bit, it does not support restarting the interrupted process; however, to perform diagnostic operations it attempts to save the processor state.

5.2.2.3 soft reset on the powerpc 603 microprocessor
when sreset is asserted, the processor attempts to reach a recoverable state by allowing the next instruction to either complete or cause an exception, blocking the completion of subsequent instructions and allowing the completed store queue to drain.
a soft reset is recoverable provided that attaining the recoverable state does not cause a machine check exception.

5.2.2.4 soft reset on the powerpc 604 microprocessor
unlike hard reset, soft reset does not directly affect the states of output signals. attempts to use system reset during a hard reset sequence or while the jtag logic is nonidle cause unpredictable results. processing interrupted by a system reset can be restarted.

5.3 machine check and checkstops
the powerpc architecture defines a machine check exception, which is used for diagnostics. generally, when a condition that generates a machine check is present, whether the exception is taken is determined by the value of the machine check enable bit, msr[me]. if it is cleared, the machine check exception is disabled and the processor instead enters checkstop state, which is described in the following section. for a detailed discussion of the machine check exception, see section 5.3.2, "machine check exception (0x00200)."

5.3.1 checkstop state (msr[me] = 0)
when a processor is in the checkstop state, instruction processing is suspended and generally cannot be restarted without resetting the processor. the contents of all latches (except any associated with the bus clock) are frozen within two cycles upon entering checkstop state so that the state of the processor can be analyzed as an aid in problem determination.

a machine check exception may result from referencing a nonexistent physical address, either directly (with msr[dr] = 0), or through an invalid translation. on such a system, for example, execution of a data cache block set to zero (dcbz) instruction that introduces a block into the cache associated with a nonexistent physical address may delay the machine check exception until an attempt is made to store that block to main memory.

note that not all powerpc processors provide the same level of error checking. the reasons a processor can enter the checkstop state are implementation-dependent.
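the general rule in section 5.3 (msr[me] set: take the machine check exception; msr[me] cleared: enter checkstop state) can be sketched as below. this is an illustrative model only. the function name is hypothetical, the msr[me] mask assumes the oea bit position (bit 19 of the 32-bit msr, bit 0 being the most-significant bit), and implementation-specific controls such as the 601 hid0 enable bits are deliberately left out.

```python
# assumption: msr[me] is msr bit 19 (bit 0 = most-significant bit), mask 0x1000
MSR_ME = 1 << (31 - 19)


def machine_check_outcome(msr):
    """Model the architectural response to a machine check condition."""
    if msr & MSR_ME:
        # enabled: the exception is taken at vector offset 0x00200
        return "machine check exception"
    # disabled: instruction processing is suspended until the processor is reset
    return "checkstop"
```

for example, the 601 reset msr value 0000_1040 from table 5-3 has this bit set, while the 603 and 604 reset value 0000_0040 does not.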
5.3.2 machine check exception (0x00200)
if no higher-priority exception is pending (namely, a hard reset), the processor initiates a machine check exception when the appropriate condition is detected. note that the causes of machine check exceptions are implementation- and system-dependent, and are typically signalled to the processor by the assertion of a specified signal on the processor interface.

when a machine check condition occurs and msr[me] = 1, the exception is recognized and handled. if msr[me] = 0 and a machine check occurs, the processor generates an internal checkstop condition. when a processor is in checkstop state, instruction processing is suspended and generally cannot continue without resetting the processor. some implementations may preserve some or all of the internal state of the processor when entering the checkstop state, so that the state can be analyzed as an aid in problem determination.

in general, it is expected that a bus error signal would be used by a memory controller to indicate a memory parity error or an uncorrectable memory ecc error. note that the resulting machine check exception has priority over any exceptions caused by the instruction that generated the bus operation.

if a machine check exception causes an exception that is not context synchronizing, the exception is not recoverable. also, if a machine check exception causes the loss of one of the following exceptions, the exception is not recoverable:
- an external exception (interrupt or decrementer)
- direct-store error type dsi exception
- floating-point enabled type program exception

if the srr1 bit corresponding to msr[ri] is cleared, the exception is context synchronizing only with respect to subsequent instructions. if the exception is recoverable, the srr1 bit corresponding to msr[ri] is set and the exception is context synchronizing.
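a handler can test the srr1 bit that corresponds to msr[ri] to decide whether resuming the interrupted context is even a candidate. the sketch below assumes the 32-bit numbering used throughout this chapter (bit 0 is the most-significant bit) and bit 30 for ri, as given in the srr1 rows of the register-settings tables; the helper names are hypothetical, and the 601 is excluded because it does not implement msr[ri].

```python
def bit_mask(bit):
    """Mask for one bit of a 32-bit register, with bit 0 as the most-significant bit."""
    return 1 << (31 - bit)


SRR1_RI = bit_mask(30)  # srr1 bit corresponding to msr[ri]


def is_recoverable(srr1):
    """True if the saved srr1 marks the exception as recoverable (not valid on the 601)."""
    return bool(srr1 & SRR1_RI)
```

a set bit only indicates that the processor was in a recoverable state; as noted for the machine check exception, execution may still be unable to resume in the original context.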
on some implementations, a machine check exception may be caused by referring to a nonexistent physical (real) address, either because translation is disabled (msr[ir] or msr[dr] = 0) or through an invalid translation. on such a system, execution of the dcbz instruction can cause a delayed machine check exception by introducing a block into the data cache that is associated with an invalid physical (real) address. a machine check exception could eventually occur when and if a subsequent attempt is made to store that block to memory.

when a machine check exception is taken, registers are updated as shown in table 5-6. if msr[ri] is set, the machine check exception may still be unrecoverable in the sense that execution cannot resume in the same context that existed before the exception.

when a machine check exception is taken, instruction execution resumes at offset 0x00200.

5.3.2.1 machine check exception (0x00200): powerpc 601 processor
the 601 conditionally initiates a machine check exception after detecting the assertion of the tea signal, which indicates that a bus error occurred and the system terminates the current transaction. one clock cycle after tea is asserted, the data bus signals go to the high-impedance state; however, data entering the gpr or the cache is not invalidated.

if the msr[me] bit is set, the exception is recognized and handled; otherwise, the 601 attempts to enter an internal checkstop condition. this may not lead to a checkstop, depending upon the state of the various checkstop enable control bits in the hid0 register. these are described in section 5.3.2.2.1, "checkstop sources and enables register: hid0." if msr[me], hid0[ce], and hid0[em] bits are cleared (that is, when both the master checkstop and the machine check checkstops are disabled), the machine check exception is taken.
in general, it is expected that the tea signal would be used by a memory controller to indicate a memory parity error or an uncorrectable memory ecc error. note that the resulting machine check exception is imprecise and has priority over any exceptions caused by the instruction that generated the bus operation.

table 5-6. machine check exception: register settings
register | setting description
srr0 | on a best-effort basis, implementations can set this to an ea of some instruction that was executing or about to be executing when the machine check condition occurred.
srr1 | bit 30 is loaded from msr[ri] (1) if the processor is in a recoverable state; otherwise cleared. the setting of all other srr1 bits is implementation-dependent.
msr | pow (1) 0; tgpr (2) 0; ee 0; pr 0; fp 0; fe0 0; se 0; be 0; fe1 0; ir (5) 0; dr (6) 0; ri (1) 0; le (7) set to value of ile. no setting is listed for ile (1), me (3), or ip (4).
(1) not implemented on the 601
(2) 603 only
(3) note that when a machine check exception is taken, the exception handler should set msr[me] as soon as it is practical to handle another machine check exception. otherwise, subsequent machine check exceptions cause the processor to automatically enter the checkstop state.
(4) identified as ep on the 601
(5) identified as it on the 601
(6) identified as dt on the 601
(7) not implemented on the 601. control of little-endian mode on the 601 is provided by hid0[28], the lm bit.

5.3.2.2 checkstop state (msr[me] = 0): powerpc 601 processor
when a processor is in checkstop state, instruction processing is suspended and generally cannot be restarted without resetting the processor. the contents of all latches are frozen within two cycles upon entering checkstop state so that the state of the processor can be analyzed as an aid in problem determination.

a machine check exception may result from referring to a nonexistent physical address.
in some implementations, for example, execution of a data cache block set to zero (dcbz) instruction that introduces a block into the cache associated with a nonexistent physical address may delay the machine check exception until an attempt is made to store that block to main memory. checkstop sources and enables for the 601 are described in the following section.

5.3.2.2.1 checkstop sources and enables register: hid0
the checkstop sources and enables register (hid0), shown in figure 5-1, is a supervisor-level register that defines enable and monitor bits for each of the checkstop sources in the 601. the spr number for hid0 is 1008.

figure 5-1. hid0: checkstop sources and enables register (601)
bit layout, bit 0 first: ce, s, m, td, cd, sh, dt, ba, bd, cp, iu, pp, reserved (bits 12-14, read as 0), es, em, etd, ecd, esh, edt, eba, ebd, ecp, eiu, epp, drf, drl, lm, par, emc, ehp (bit 31).

table 5-7 defines the bits in hid0. the enable bits (bits 15-31) can be used to mask individual checkstop sources, although these are provided primarily to mask off any false reports of such conditions for debugging purposes. bit 0 (hid0[ce]) is a master checkstop enable; if it is cleared, all checkstop conditions are disabled; if it is set, individual conditions can be enabled separately. hid0[em] (bit 16) enables and disables machine check checkstops; clearing this bit masks machine check checkstop conditions that occur when msr[me] is cleared. bits 1-11 are the checkstop source bits, and can be used to determine the specific cause of a checkstop condition.

all enable bits except 15 and 24 are disabled at start up. the operating system should enable these checkstop conditions before the power-on reset sequence is complete.

table 5-7. hid0: checkstop sources and enables register (601)
bit | name | description
0 | ce | master checkstop enable. enabled if set. if this bit is cleared and the tea signal is asserted, a machine check exception is taken, regardless of the setting of msr[me].
1 | s | microcode checkstop detected if set.
2 | m | double machine check detected if set.
3 | td | multiple tlb hit checkstop if set.
4 | cd | multiple cache hit checkstop if set.
5 | sh | sequencer time out checkstop if set.
6 | dt | dispatch time out checkstop if set.
7 | ba | bus address parity error if set.
8 | bd | bus data parity error if set.
9 | cp | cache parity error if set.
10 | iu | invalid microcode instruction if set.
11 | pp | direct-store interface access protocol error if set.
12-14 | (none) | reserved
15 | es | enable microcode checkstop. enabled by hard reset. enabled if set.
16 | em | enable machine check checkstop. disabled by hard reset. enabled if set. if this bit is cleared and the tea signal is asserted, a machine check exception is taken, regardless of the setting of msr[me].
17 | etd | enable tlb checkstop. disabled by hard reset. enabled if set.
18 | ecd | enable cache checkstop. disabled by hard reset. enabled if set.
19 | esh | enable sequencer time out checkstop. disabled by hard reset. enabled if set.
20 | edt | enable dispatch time out checkstop. disabled by hard reset. enabled if set.
21 | eba | enable bus address parity checkstop. disabled by hard reset. enabled if set.
22 | ebd | enable bus data parity checkstop. disabled by hard reset. enabled if set.
23 | ecp | enable cache parity checkstop. disabled by hard reset. enabled if set.
24 | eiu | enable for invalid ucode instruction checkstop. enabled by hard reset. enabled if set.
25 | epp | enable for direct-store protocol checkstop. disabled by hard reset. enabled if set.
26 | drf | 0: optional reload of alternate sector on instruction fetch miss is enabled. 1: optional reload of alternate sector on instruction fetch miss is disabled.
27 | drl | 0: optional reload of alternate sector on load/store miss is enabled. 1: optional reload of alternate sector on load/store miss is disabled.
28 | lm | 0: big-endian mode is enabled. 1: little-endian mode is enabled.
29 | par | 0: precharge of the artry and shd signals is enabled. 1: precharge of the artry and shd signals is disabled.
30 | emc | 0: no error detected in main cache during array initialization. 1: error detected in main cache during array initialization.
31 | ehp | 0: the hp_snp_req signal is disabled. use of the associated queue position is restricted to a snoop hit that occurs when a read is pending; that is, its address tenure is complete but the data tenure has not begun. 1: the hp_snp_req signal is enabled. use of the associated queue position is restricted to a snoop hit on an address tenure that had hp_snp_req asserted.

checkstop enable bits can be set or cleared without restriction. if a checkstop source bit is set, it can be cleared; however, if the corresponding checkstop condition is still present on the next clock, the bit will be set again. a checkstop source bit can only be set when the corresponding checkstop condition occurs and the checkstop enable bit is set; it cannot be set via an mtspr instruction. that is, you cannot manually cause a checkstop.

the hid0 register is set to 0x80010080 by the hard reset operation. however, the state of the emc bit depends on the results of the power-on diagnostics for the main cache array. this bit is set if the cache fails the built-in self test during the power-on sequence.

5.3.2.3 machine check exception: powerpc 603 processor
the 603 conditionally initiates a machine check exception after detecting the assertion of the tea or mcp signals on the 603 bus (assuming the machine check is enabled, msr[me] = 1). the assertion of one of these signals indicates that a bus error occurred and the system terminates the current transaction. one clock cycle after the signal is asserted, the data bus signals go to the high-impedance state; however, data entering the gpr or the cache is not invalidated. note that if hid0[emcp] is cleared, the processor ignores the assertion of the mcp signal.

register settings when the 603 takes a machine check exception are described in table 5-6.

note that the 603 makes no attempt to force recoverability; however, it does guarantee the machine check exception is always taken immediately upon request, with a nonpredicted address saved in srr0, regardless of the current machine state. any pending stores in the completed store queue are cancelled when the exception is taken. software can use the machine check exception in a recoverable mode for checking bus configuration. for this case, a sync, load, sync instruction sequence is used. a subsequent machine check exception at the load address indicates a bus configuration problem and the processor is in a recoverable state.

if msr[me] is set, the exception is recognized and handled; otherwise, the 603e attempts to enter an internal checkstop. note that the resulting machine check exception has priority over any exceptions caused by the instruction that generated the bus operation.

5.3.2.4 checkstop state (msr[me] = 0): powerpc 603 processor
when the 603 enters checkstop state, it asserts the checkstop output signal, ckstp_out. the following events will cause the 603e to enter the checkstop state:
- machine check exception occurs with msr[me] cleared.
- external checkstop input, ckstp_in, is asserted.
- a direct-store protocol error occurs.

when a processor is in checkstop state, instruction processing is suspended and generally cannot be restarted without resetting the processor. the contents of all latches are frozen within two cycles upon entering the checkstop state so that the state of the processor can be analyzed as an aid in problem determination.

note that not all powerpc processors provide the same level of error checking.
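returning to the 601 hid0 register of table 5-7: its hard reset value 0x80010080 can be decoded to confirm which checkstop enables are active at start-up. the sketch below is illustrative only; the dictionary and function names are hypothetical, and the bit positions are the enable bits listed in table 5-7, numbered with bit 0 as the most-significant bit.

```python
# enable bits of the 601 hid0 register, keyed by bit number (bit 0 = msb)
HID0_601_ENABLES = {
    0: "ce", 15: "es", 16: "em", 17: "etd", 18: "ecd", 19: "esh",
    20: "edt", 21: "eba", 22: "ebd", 23: "ecp", 24: "eiu", 25: "epp",
}


def enabled_checkstops(hid0):
    """List the names of the checkstop enable bits that are set in a 601 hid0 value."""
    return [
        name
        for bit, name in sorted(HID0_601_ENABLES.items())
        if hid0 & (1 << (31 - bit))
    ]


reset_enables = enabled_checkstops(0x80010080)
```

for the reset value, `reset_enables` contains only ce, es, and eiu (bits 0, 15, and 24), which agrees with the statement that all enable bits except 15 and 24 are disabled at start up while the master enable is set.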
the reasons a processor can enter checkstop state are implementation-dependent.

5.3.2.5 machine check exception: powerpc 604 processor
the 604 implements the machine check exception as defined in the powerpc architecture (oea). it conditionally initiates a machine check exception after an address or data parity error occurred on the bus or in a cache, after receiving a qualified transfer error acknowledge (tea) indication on the 604 bus, or after the machine check interrupt (mcp) signal had been asserted. as defined in the oea, the exception is not taken if msr[me] is cleared.

machine check conditions can be enabled and disabled using bits in the hid0 register described in table 5-8.

table 5-8. machine check enable bits
hid0 bit | description
0 | enable machine check input pin
1 | enable cache parity checking
2 | enable machine check on address bus parity error
3 | enable machine check on data bus parity error

a tea indication on the bus can result from any load or store operation initiated by the processor. in general, the tea signal is expected to be used by a memory controller to indicate that a memory parity error or an uncorrectable memory ecc error has occurred. note that the resulting machine check exception is imprecise and unordered with respect to the instruction that originated the bus operation.

if the msr[me] bit and the appropriate bits in hid0 are set, the exception is recognized and handled; otherwise, the processor generates an internal checkstop condition. when a processor is in checkstop state, instruction processing is suspended and generally cannot continue without restarting the processor. note that many conditions may lead to the checkstop condition; the disabled machine check exception is only one of these.
machine check exceptions are enabled when msr[me] = 1; this is described in section 5.3.2.5.1, "machine check exception enabled (msr[me] = 1)." if msr[me] = 0 and a machine check occurs, the processor enters the checkstop state. checkstop state is described in section 5.3.1, "checkstop state (msr[me] = 0)."

5.3.2.5.1 machine check exception enabled (msr[me] = 1)
when a machine check exception is taken, registers are updated as shown in table 5-6.

the machine check exception is usually unrecoverable in the sense that execution cannot resume in the same context that existed before the exception. if the condition that caused the machine check does not otherwise prevent continued execution, msr[me] is set to allow the processor to continue execution at the machine check exception vector address. typically earlier processes cannot resume; however, the operating system can then use the machine check exception handler to try to identify and log the cause of the machine check condition.

5.3.2.5.2 checkstop state (msr[me] = 0)
when a processor is in checkstop state, instruction processing is suspended and generally cannot resume without the processor being reset. the contents of all latches are frozen within two cycles upon entering checkstop state.

a machine check exception may result from referencing a nonexistent physical address, either directly (with msr[dr] = 0), or through an invalid translation. on such a system, for example, execution of a data cache block set to zero (dcbz) instruction that introduces a block into the cache associated with a nonexistent physical address may delay the machine check exception until an attempt is made to store that block to main memory.

5.4 external interrupt exception (0x00500)
the powerpc architecture defines an external interrupt exception, which in the 60x processors is signaled to the processor by the assertion of the external interrupt signal, int.
the exception may be delayed by other higher-priority exceptions or if the msr[ee] bit is zero when the exception is detected. note that the occurrence of this exception does not cancel the external request.

the register settings for the external interrupt exception are shown in table 5-9.

table 5-9. external interrupt: register settings
register | setting description
srr0 | set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present. on the 603, note that in the rare case when the next instruction is not in the completion queue, the 603 searches elsewhere to provide the appropriate restart instruction address to srr0.
srr1 | bit 0: loaded with the equivalent bit from the msr (cleared in the 601 and 603). bits 1-4: cleared. bits 5-9: loaded with equivalent bits from the msr (cleared in the 601 and 603). bits 10-15: cleared. bits 16-31: loaded with equivalent bits from the msr. note that depending on the implementation, reserved bits in the msr may not be copied to srr1.
msr | pow (1) 0; tgpr (2) 0; ee 0; pr 0; fp 0; fe0 0; se 0; be 0; fe1 0; ir (4) 0; dr (5) 0; ri (1) 0; le (6) set to value of ile. no setting is listed for ile (1), me, or ip (3).
(1) not implemented on the 601
(2) 603e only
(3) identified as ep on the 601
(4) identified as it on the 601
(5) identified as dt on the 601
(6) not implemented on the 601. control of little-endian mode on the 601 is provided by hid0[28], the lm bit.

note that the processor recognizes the interrupt condition (int asserted) only if msr[ee] is set. to guarantee that the external interrupt is taken, int must remain asserted until the processor takes the interrupt; otherwise, the processor is not guaranteed to take an external interrupt.

after int is detected asserted, the processor stops dispatching instructions and waits for executing instructions to complete. therefore, exceptions caused by instructions in progress are taken before the external interrupt exception is taken. after all instructions complete, the processor takes the external interrupt exception. the interrupt handler must send a command to the device that asserted int, acknowledging the interrupt and instructing the device to negate int.

when an external interrupt exception is taken, instruction execution resumes at offset 0x00500 from the physical base address indicated by msr[ip].

5.4.1 external interrupt: powerpc 601 processor
in early versions of the 601 (processor revision level 0x0000), the external interrupt is a level-sensitive signal and should be held active until reset by the interrupt service routine. phantom interrupts due to phenomena such as crosstalk and bus noise should be avoided.

5.4.2 external interrupt: powerpc 603 processor
on the 603, note that in the rare case when the next instruction is not in the completion queue, the 603 searches elsewhere to provide the appropriate restart instruction address to srr0.

5.5 system management interrupt exception (0x01400)
the system management interrupt, which is implemented on the 603 and 604, but not on the 601, behaves like an external interrupt except for the signal asserted and the vector taken. a system management interrupt is signaled to the processor by the assertion of the smi signal. the interrupt may not be recognized if a higher-priority exception occurs simultaneously or if the msr[ee] bit is cleared when smi is asserted. note that smi takes priority over int if they are recognized simultaneously.

after the assertion of smi is detected (and provided that msr[ee] is set), the processor waits for the next instruction (and any exceptions associated with that instruction) to complete before taking the system management interrupt. note that in the rare case when the next instruction is not in the completion queue, the processor searches elsewhere to provide the appropriate restart instruction address to srr0.
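the recognition rules for the two maskable interrupts (both gated by msr[ee], with smi taking priority over int when both are recognized at once) can be modeled with a few lines. this is a simplified sketch that ignores higher-priority exceptions; the function name is hypothetical, and the msr[ee] mask assumes the oea bit position (msr bit 16, with bit 0 as the most-significant bit).

```python
# assumption: msr[ee] is msr bit 16 (bit 0 = most-significant bit), mask 0x8000
MSR_EE = 1 << (31 - 16)


def next_maskable_interrupt(msr, smi_asserted, int_asserted):
    """Return (name, vector offset) of the interrupt to take, or None if none is recognized."""
    if not msr & MSR_EE:
        return None  # neither smi nor int is recognized while msr[ee] is cleared
    if smi_asserted:
        return ("system management interrupt", 0x01400)  # smi has priority over int
    if int_asserted:
        return ("external interrupt", 0x00500)
    return None
```

because a masked request is simply not recognized, the signal has to stay asserted until msr[ee] is set again and the interrupt is taken.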
the register settings for the system management interrupt exception are the same as those for the external interrupt, as shown in table 5-10. when a system management interrupt is taken, instruction execution for the handler begins at offset 0x01400 from the physical base address indicated by msr[ip].

table 5-10. system management interrupt: register settings
register | setting description
srr0 | set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.
srr1 | bit 0: loaded with the equivalent bit from the msr (cleared in the 601). bits 1-4: cleared. bits 5-9: loaded with equivalent bits from the msr (cleared in the 601). bits 10-15: cleared. bits 16-31: loaded with equivalent bits from the msr. note that depending on the implementation, reserved bits in the msr may not be copied to srr1.
msr | pow 0; tgpr (1) 0; ee 0; pr 0; fp 0; fe0 0; se 0; be 0; fe1 0; ir 0; dr 0; ri 0; le set to value of ile. no setting is listed for ile, me, or ip.
(1) 603e only

the processor recognizes the interrupt condition (smi asserted) only if msr[ee] is set; it ignores the interrupt condition if the msr[ee] bit is cleared. to guarantee that the system management interrupt is taken, the smi signal must be held active until the processor takes the exception. if the smi signal is negated before the interrupt is taken, the processor is not guaranteed to take a system management interrupt. the interrupt handler must send a command to the device that asserted smi, acknowledging the interrupt and instructing the device to negate smi.

chapter 6 additional bus configurations
chapters 2 through 5 describe basic 60x bus operations.
however, some processors support additional bus functionality, including the following:
- no-data retry mode (referred to as no-drtry mode): this mode allows drtry to be disabled in the 603 and 604e, which in turn allows data to be forwarded one bus cycle sooner than if drtry is enabled. (no-drtry mode is implemented on the 604e, but not on the 604; see data streaming mode below.)
- data streaming mode: data streaming is the ability to begin a data tenure after a previous data tenure with no dead cycles between. data streaming is implemented on the 604. (note that in 604 documentation, this was called fast-l2/data streaming mode and no-drtry/data streaming mode, although there is no relation to the no-drtry mode described above.)
- 32-bit data bus mode: the 603 supports an optional 32-bit data bus mode, in which the processor uses only byte lanes 0-3 for a data transfer, therefore allowing a maximum of 32 bits of data to be transferred per bus clock.
- reduced-pinout mode: the 603 provides an optional reduced-pinout mode that disables dl[0-31], dp[0-7], ap[0-3], ape, dpe, and rsrv for reduced power consumption. the 32-bit data bus mode is implicitly selected when reduced-pinout mode is enabled.

direct-store mode, which provides an alternative method for i/o bus operations, is described in chapter 7, "direct-store interface."

6.1 no-drtry mode (603 and 604e)
the 603 family and 604e processors provide a way to disable the use of the data retry function. no-drtry mode allows data to be forwarded during load operations to the internal processor one bus cycle sooner than with normal bus protocol.

the 60x bus protocol specifies that, during load operations, the memory system normally can cancel data that the master read on the bus cycle after ta was asserted. on 603 and 604e processors, this late cancellation requires any data loaded at the bus interface to be held one additional bus clock to verify that it is valid before forwarding it to the internal cpu.
for systems that do not use the drtry function, no-drtry mode eliminates this one-cycle stall and allows data to pass to the internal cpu immediately when ta is recognized.

when the processor is in no-drtry mode, data can no longer be cancelled the cycle after it is acknowledged by an assertion of ta. data is immediately forwarded to the cpu internally, and any attempt at late cancellation by the system may cause improper operation by the processor.

when the 603e uses normal bus protocol, data can be cancelled the bus cycle after ta by either drtry or artry. when no-drtry mode is selected, both cancellation cases must be disallowed in the system design for the bus protocol. no-drtry mode requires the system to ensure that drtry is never asserted to the processor, because asserting it may cause improper operation of the bus interface. the system must also ensure that any assertion of artry by a snooping device occurs no later than the first assertion of ta to the processor, and never on the cycle after the first assertion of ta.

apart from the inability to cancel data that was read by the master on the bus cycle after ta was asserted, the 603 bus protocol is identical to that for the basic transfer bus protocols, as well as for 32-bit data bus mode.

the processor selects the desired drtry mode at start-up by sampling drtry at the negation of hreset. if drtry is negated, normal operation is selected; if it is asserted, no-drtry mode is selected.

6.1.1 no-drtry mode in powerpc 604e processor
in no-drtry mode, the system must define the beginning of the window in which the snoop response is valid and ensure that no data is transferred before the first cycle of that window. for example, if the system defines a snoop response window that begins the second cycle after ts, ta can be asserted no sooner than the second cycle after ts.
this timing constraint on the earliest allowable assertion of ta with respect to artry is identical to that constraint in data streaming mode.

to upgrade a 604-based system to the 604e and use no-drtry, the following should be observed:
- the system uses the 604 in normal bus mode, described earlier in this section.
- drtry must be tied negated and never used.
- the system must never assert ta before the first cycle of the system's snoop response window.

this system would then see a performance improvement due to the shorter effective latency seen by the 604e on read operations. this improvement is equal to one bus cycle (three processor cycles in 3:1 bus mode).

6.2 data streaming mode (604)
data streaming is the ability to start a data tenure after a previous data tenure with no dead cycles between. (note that in 604 documentation, this was called fast-l2/data streaming mode and no-drtry/data streaming mode, although there is no connection to the no-drtry mode described above.)

the 604 supports data streaming only for consecutive burst-read data transfers. this includes data streaming of consecutive burst-read data transfers between two separate masters. for instance, in a multiple-604 system, data streaming is allowed on consecutive burst-read data transfers from different 604s.

to cause data streaming, the system asserts dbg during the last data transfer of the first data tenure, as shown in figure 6-1. to fully realize the performance gain of data streaming, the system should be prepared to, but is not required to, supply an uninterrupted sequence of ta assertions.

figure 6-1. data transfer in data streaming mode

6.2.1 data valid window in the data streaming mode
standard bus mode operations allow data to be transferred no earlier than the cycle before the artry window that the system defines.
in some cases, an asserted artry invalidates the data that was transferred the previous cycle, in the same way drtry cancels data from the previous cycle. in data streaming mode, the data buffering that allows late cancellation of a data transfer does not exist, so late cancellation with artry is also impossible. therefore, the earliest that data can be transferred in data streaming mode is the first cycle of the artry window, not the cycle before that.

6.2.3 design practices for data streaming mode
it is recommended that use of data streaming mode be accompanied by two other system design practices:
- do not use abb. if the system is designed so that an address tenure is defined by ts and aack assertion (which the 604 is designed to support), abb is unnecessary and should be pulled high at the 604. because abb has a short restore-high time, abb should not be used in systems that try to achieve a short cycle time.
- do not use dbb, which is restored high in the same way as abb and therefore has the same problems in a system with short cycle times.
  to avoid using dbb, the system arbiter must assert dbg for a single cycle, one cycle before the 604 is supposed to begin its data tenure. the dbb signal should be pulled high. the additional system cost of operating in this manner is that data transfers must be counted and dbg can be asserted only on the last cycle in a data tenure.

6.3 32-bit data bus mode (603)
the 603 supports an optional 32-bit data bus mode, which operates like the 64-bit data bus mode but uses only byte lanes 0-3, corresponding to dh[0-31] and dp[0-3]. byte lanes 4-7 (dl[0-31] and dp[4-7]) are never used in this mode. unused bus signals are ignored during read operations and are driven low for write operations.

in 32-bit bus mode, data tenures can be one, two, or eight beats depending on the size of the program transaction and the cache mode for the address. data transactions of one or two data beats are performed for caching-inhibited load/store or write-through store operations. note that two-beat burst transactions do not assert tbst (having the same tbst and tsiz[0-2] encodings as the 64-bit data bus mode). single-beat data transactions transfer four bytes or less, and two-beat data transactions are performed for eight-byte operations only. the 603 generates an eight-byte operation only for a double-word-aligned load double or store double operation to or from the floating-point registers (fprs). eight-beat burst data transactions load data into or store data from the 603's internal caches. these transactions transfer 32 bytes in the same way as in 64-bit data bus mode, asserting tbst and signalling a transfer size of 2 (tsiz[0-2] = 0b010).

the same bus protocols apply for arbitration, transfer, and termination of the address and data tenures for both 32- and 64-bit bus modes.
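the beat counts described above for 32-bit data bus mode can be expressed as a small lookup. the following python sketch is illustrative only; the function name and constants are hypothetical, and it rejects sizes that the text does not describe (for example, five to seven bytes).

```python
BYTES_PER_BEAT_32BIT = 4  # only byte lanes 0-3 carry data in 32-bit mode
BURST_BYTES = 32          # a burst moves 32 bytes to or from the caches


def beats_in_32bit_mode(num_bytes, burst=False):
    """Return the number of data beats in a 32-bit-mode data tenure.

    Hypothetical helper that mirrors the one-, two-, and eight-beat
    cases described in the text.
    """
    if burst:
        if num_bytes != BURST_BYTES:
            raise ValueError("burst transactions transfer 32 bytes")
        return BURST_BYTES // BYTES_PER_BEAT_32BIT  # eight beats
    if 1 <= num_bytes <= 4:
        return 1  # single-beat: four bytes or less
    if num_bytes == 8:
        return 2  # two-beat: eight-byte operations only
    raise ValueError("size not described for 32-bit data bus mode")
```

a word access therefore takes one beat, a double-word floating-point load or store takes two, and a cache-line burst takes eight.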
for word or smaller transactions, late artry cancellation of the data tenure applies on the bus clock after the first data beat is acknowledged (after the first ta); for double-word or burst operations, this may occur on the bus clock after the second data beat is acknowledged (after the second ta, or coincident with the respective ta if no-drtry mode is selected).

an example of an eight-beat data transfer while the 603 is in 32-bit data bus mode is shown in figure 6-2. in this example, ta remains asserted for the entire burst transaction.

figure 6-2. 32-bit data bus transfer (eight-beat burst)

figure 6-3 shows an example of a two-beat data transfer (with drtry asserted during each data tenure).

figure 6-3. 32-bit data bus transfer (two-beat burst with drtry)

the 603 selects the data bus mode at start-up by sampling tlbisync at the negation of hreset. if tlbisync is asserted, the bus runs in 32-bit data mode; otherwise, it runs in 64-bit mode. if the tlbisync input function is not used, it can be connected to hreset to place the processor in 32-bit bus mode. otherwise, it should be connected to a pull-up resistor to select 64-bit mode. for systems using the tlbisync input function, hreset must be logically combined with tlbisync to select a data bus mode.

6.4 reduced-pinout mode (603)
the 603 has an optional reduced-pinout mode. this mode idles the switching of numerous signals for reduced power consumption. the dl[0-31], dp[0-7], ap[0-3], ape, dpe, and rsrv signals are disabled when the reduced-pinout mode is selected. note that the 32-bit data bus mode is implicitly selected when the reduced-pinout mode is enabled.

when the 603 is in reduced-pinout mode, the disabled bidirectional and output pins are always driven low during the periods when they would normally have been driven by the 603.
the open-drain outputs (ape and dpe) are always three-stated. bidirectional inputs are always turned off at the input receivers of the 603 and are not sampled.

the 603 selects either full-pinout or reduced-pinout mode at start-up by sampling the state of qack at the negation of hreset. if qack is low at the negation of hreset, full-pinout mode is selected by the 603. if qack is high at the negation of hreset, reduced-pinout mode is selected.

chapter 7 direct-store interface

accesses to direct-store segments, as defined in the powerpc architecture, are executed on the bus using the extended transfer protocol (etp), an extension to the basic transfer protocol described in previous chapters. except for one signal, xats, this protocol uses the same signal set as the basic transfer protocol, although some signals are redefined. the powerpc 601 processor documentation refers to the direct-store interface as the i/o controller interface. direct-store operations are no longer required by the powerpc architecture. some processors, such as the powerpc 603e processor, do not support this feature.

powerpc architecture defines the following characteristics for direct-store accesses:
- the extended address delivered to the i/o system includes a bus unit id (buid) to address one of several bus devices and a 32-bit address to be delivered to each.
- a transaction error can be detected and associated with the original instruction.

to satisfy the requirements of powerpc architecture for direct-store segments, the following extensions are implemented:
- a new set of bus operations is provided that redefines how the transfer type (ttn), transfer burst (tbst), and transfer size (tsizn) signals are used. these signals together generate the extended address transfer code (xatc), as shown in table 7-4.
- each direct-store address transfer takes two beats.
  the first transmits the buid and several control bits from the segment register, and the second transfers a complete 32-bit address to the slave device.
- explicit sender/receiver tags are provided.
- a split-response protocol is enforced; that is, the sender must wait for a reply from the receiver before considering a transaction complete.
- the 60x does not burst direct-store transactions, but a type of streaming is permitted. streaming (in this context) allows multiple single-beat transactions to occur before a reply from the direct-store receiver is required.

direct-store transactions are like memory-mapped accesses, as shown in figure 7-1. they use most of the same signals. they use separate arbitration for the split address and data buses, and also define address-only and single-beat transactions. the address retry vehicle is identical, although there is no hardware coherency support. a given 60x processor processes one direct-store transaction at a time, but may perform other bus transactions for the duration of the transfer. a direct-store cycle does not inhibit other bus traffic within its envelope.

in addition to the extensions noted above, there are fundamental differences between the basic transfer protocol and the extensions. for example, use of drtry is undefined. also, only four bytes of the eight-byte data path are available (transmitted on dh[0-31]). this facilitates lower pin-count direct-store interfaces but also offers substantially less bandwidth than memory accesses. additionally, load/store instructions to direct-store addresses cannot retire until an error-free reply is received, which likely further degrades performance compared to access to normal segments.

figure 7-1. direct-store interface protocol tenures

the 601 supports an additional mode, memory-forced direct-store mode, that is not defined by the powerpc architecture and is dependent on the value of buid.
this is described in section 7.6, "memory-forced direct-store interface (powerpc 601 processor only)."

7.1 direct-store transaction protocol details
as mentioned previously, there are two address-bus beats corresponding to two packets of information about the address. the two packets contain the sender and receiver tags, the address and extended address bits, and extra control and status bits. the two beats of the address bus (plus attributes) are shown at the top of figure 7-2 as two packets. packet 0 is then expanded to reflect the xatc and address bus information in detail.

7.1.1 packet 0
figure 7-2 shows the organization of the first packet in a direct-store transaction. the xatc contains the transfer code. the address bus contains the following:

key bit || segment register || sender tag

figure 7-2. direct-store operation: packet 0

the contents of the address bus are described in table 7-1.

table 7-1. address bits for packet 0
bits     description
0-1      reserved. these bits should be cleared for compatibility with future powerpc microprocessors.
2        key bit: either sr[kp] or sr[ks]. kp indicates user-level access and ks indicates supervisor-level access. the processor multiplexes the correct key bit into this position according to the operating context.
3-27     address bits 3-27 correspond to bits 3-27 of the selected segment register. a[3-11] form the receiver tag (buid). software must initialize these sr bits to the id of the buc to be addressed. the 601 supports an additional mode, memory-forced direct-store mode, not defined by the powerpc architecture, and dependent on the value of buid. see section 7.6, "memory-forced direct-store interface (powerpc 601 processor only)."
28-31    pid (sender tag): allows a maximum of 16 processor ids to be defined for a given system. if more bits are needed for a very large multiprocessor system, the l2 cache (or equivalent logic) can append a larger processor tag. the buc addressed by the receiver tag should latch the sender address required by the subsequent i/o reply operation. the 601 and 604 pid comes from pid[28-31]. the 603 pid is always driven as 0b0000.

7.1.2 packet 1
the second address beat, packet 1, transfers byte counts and the physical address for the transaction, as shown in figure 7-3.

figure 7-3. direct-store operation: packet 1

for packet 1, the xatc is defined as follows:
- load request operations: xatc contains the total number of bytes to be transferred (128 bytes maximum for the 601, 603, and 604).
- immediate/last (load or store) operations: xatc contains the current transfer byte count (1 to 4 bytes).

the processor gives the physical address, a[0-31], a concatenation of sr[28-31] with ea[4-31], to the buc, which must keep a valid address pointer for the reply.

7.1.3 i/o reply operations
bucs respond to direct-store transactions with an i/o reply operation, shown in figure 7-4, which informs the processor of the success or failure of the operation. this requires a system to have bus mastership capability, a substantially more complex design task than bus slave implementations that use memory-mapped i/o access. replies from the buc to the processor are address-only transactions. as with packet 0 of the address bus on direct-store operations, the xatc has the transfer code (see table 7-4). additionally, an i/o reply operation transfers the sender/receiver tags in the first beat.

figure 7-4.
i/o reply operation

the address bits are described in table 7-2. the second beat of the address bus is reserved; the xatc and address buses should be driven to zero to preserve compatibility with future protocol enhancements.

the following sequence occurs when a processor detects an error bit set on an i/o reply:
1. the processor completes the instruction that initiated the access.
2. if the instruction is a load, the data is forwarded onto the register file(s)/sequencer.
3. a direct-store error exception is generated, which transfers processor control to the direct-store error exception handler to recover from the error.

if the error bit is not set, the instruction that caused the access completes and instruction execution resumes.

system designers should note the following:
- on the 601 and 603, reply operations that match the processor tag but arrive unexpectedly cause a checkstop condition. the 604 ignores these operations.
- external logic must assert the aack input for the processor, even though it is the receiver of the reply operation.
- when enabled by software, the processor monitors address parity on xats and reply operations (load or store).

table 7-2. address bits for i/o reply operations
bits     description
0-1      reserved. these bits should be cleared for compatibility with future powerpc microprocessors.
2        error bit. it is set if the buc records an error in the access.
3-11     buid. sender tag of a reply operation. corresponds with bits 3-11 of one of the segment registers.
12-27    address bits 12-27 are buc-specific and are ignored by the processor.
28-31    pid (receiver tag).
         the processor effectively snoops operations on the bus and, on reply operations, compares this field to pid[28-31] (601 and 604) to determine if it should recognize this i/o reply.

7.2 direct-store operations
table 7-3 lists the seven bus operations that the 60x defines for the direct-store interface. a single load or store instruction to a direct-store segment generates one or more direct-store operations (two or more direct-store operations for loads) from the 60x and one reply operation from the addressed device.

for the first address beat, the xatc contains the direct-store transfer code shown in table 7-4. the xatc is formed as follows:

xatc = tt0-tt3 || tbst || tsiz0-tsiz2

tt4 is not used. the basic transfer protocol definitions of these signals are irrelevant to direct-store transfers.

table 7-3. direct-store bus operations
operation          type            direction
load request       address only    60x --> i/o
load immediate     address/data    60x --> i/o
load last          address/data    60x --> i/o
store immediate    address/data    60x --> i/o
store last         address/data    60x --> i/o
load reply         address only    i/o --> 60x
store reply        address only    i/o --> 60x

table 7-4. extended address transfer code definitions
operation          tt[0-3]    tbst    tsiz[0-2]
load request       0 1 0 0    0       0 0 0
load immediate     0 1 0 1    0       0 0 0
load last          0 1 1 1    0       0 0 0
store immediate    0 0 0 1    0       0 0 0
store last         0 0 1 1    0       0 0 0
load reply         1 1 0 0    0       0 0 0
store reply        1 0 0 0    0       0 0 0
note: the values in the tbst column are the logical values seen on the signal.

7.3 store operations
store operations are defined for the 60x as follows:
- store immediate and store last operations transfer up to 32 bits of data to the device.
- a store reply from the slave device indicates the success or failure of that access.
a direct-store access consists of one or more data transfer operations followed by a store reply operation from the slave device. if the data can be transferred in one 32-bit data transaction, it is marked as a store last operation followed by the store reply operation; no store immediate operation is involved in the transfer, as shown in the following:

store last (from 60x).....store reply (from slave device)

if more data is involved, there are one or more store immediate operations. the slave device detects the last transfer by looking for the store last transfer code, as shown in the following:

store immediate(s).....store last.....store reply

7.4 load operations
direct-store load accesses are like stores, except that the 60x receives data instead of transmitting it. as with the basic transfer protocol, the 60x is master on both load and store operations. the system must grant the data bus to the 60x when the device is ready to provide data.

direct-store load requests have no analogous store operation; these address-only operations inform the addressed device of the number of bytes required on the subsequent load immediate/load last operations.

the simplest direct-store load, of 32 bits or less, is as follows:

load request.....load last.....load reply (from slave device)

if more data is involved, there are one or more load immediate operations. the device detects the last data transfer by looking for the load last transfer code, as shown in the following:

load request.....load immediate(s).....load last.....load reply

three of the seven defined operations are address-only transactions, which, like those of the basic transfer protocol, do not use the data bus. unlike the basic transfer protocol, however, these transactions are not broadcast from one master to all snooping devices; rather, they pass control information between the processor and a specific slave device.
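the load and store sequences above, together with the transfer codes of table 7-4, can be sketched in python as follows. this is a simplified illustration rather than a description of any processor's implementation: the names `XATC`, `pack_xatc`, and `direct_store_sequence` are hypothetical, and the sketch assumes that every data transfer carries up to four bytes and that the data is aligned so that no extra transfer is needed.

```python
def pack_xatc(tt, tbst, tsiz):
    """Concatenate tt0-tt3 || tbst || tsiz0-tsiz2 into one 8-bit code."""
    return (tt << 4) | (tbst << 3) | tsiz


# Transfer codes transcribed from table 7-4 (first address beat).
XATC = {
    "load request":    pack_xatc(0b0100, 0, 0b000),
    "load immediate":  pack_xatc(0b0101, 0, 0b000),
    "load last":       pack_xatc(0b0111, 0, 0b000),
    "store immediate": pack_xatc(0b0001, 0, 0b000),
    "store last":      pack_xatc(0b0011, 0, 0b000),
    "load reply":      pack_xatc(0b1100, 0, 0b000),
    "store reply":     pack_xatc(0b1000, 0, 0b000),
}

MAX_BYTES = 128         # maximum byte count for one load or store instruction
BYTES_PER_TRANSFER = 4  # assumption: each data transfer moves up to 4 bytes


def direct_store_sequence(kind, num_bytes):
    """Return the ordered bus operations for one direct-store access.

    kind is "load" or "store". The final reply comes from the slave
    device; every other operation is issued by the 60x.
    """
    if kind not in ("load", "store"):
        raise ValueError("kind must be 'load' or 'store'")
    if not 1 <= num_bytes <= MAX_BYTES:
        raise ValueError("byte count must be between 1 and 128")
    transfers = (num_bytes + BYTES_PER_TRANSFER - 1) // BYTES_PER_TRANSFER
    ops = []
    if kind == "load":
        ops.append("load request")  # address-only, announces the byte count
    ops.extend([kind + " immediate"] * (transfers - 1))
    ops.append(kind + " last")      # marks the final data transfer
    ops.append(kind + " reply")     # address-only reply from the device
    return ops
```

for instance, `direct_store_sequence("load", 4)` yields a load request, a load last, and a load reply, and `XATC[op]` gives the code driven with each operation.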
7.5 direct-store operation timing
the figures in this section show timings for typical load and store accesses to direct-store segments. all arbitration signals except for abb and dbb have been omitted for clarity. note that for either case, the number of immediate operations depends on the amount of data to be transferred. if fewer than four bytes of data are transferred and the data does not straddle a double-word address, there is no immediate operation. the 60x can transfer up to 128 bytes of data with a load or store instruction.

figure 7-5 shows xats asserted with the same timing as ts in the basic transfer protocol. however, the address bus (and xatc) change on the next bus cycle. the first beat of the two-beat address bus operation is valid for one bus cycle window only, as defined by the assertion of xats, and cannot be extended. address bus beat two can be extended by delaying assertion of aack until the system latches the address.

figure 7-5. direct-store interface load access example

the load request and load reply operations in figure 7-5 are address-only. other types of bus operations can occur between individual direct-store operations on the bus. in this best-case example (no wait states), up to eight bytes of data are transferred in 13 bus cycles.

figure 7-6 shows a store operation to a direct-store segment, consisting of a store immediate, a store last, and a store reply. data is transferred on dh[0-31]. unlike the load case, there is no request operation because the 60x has the data ready for the slave device.

figure 7-6.
direct-store interface store access example

if tea is asserted during a direct-store access, the resulting action is delayed until all data transfers from the direct-store access complete. the device asserting tea must keep it asserted until the last direct-store data tenure is complete. the direct-store reply, in cases of tea assertion, is not required and is ignored by the processor. the processor does not recognize the assertion of tea until the last direct-store data tenure completes.

7.6 memory-forced direct-store interface (powerpc 601 processor only)
the 601 defines two types of direct-store segments (segment register t bit set) based on the value of the buid, as follows:
- direct-store interface (buid ≠ 0x07f): normal direct-store accesses include all transactions between the 601 and bucs mapped through direct-store address space.
- memory-forced direct-store interface (buid = 0x07f): memory-forced direct-store interface operations access memory space. they do not use the extensions to the memory protocol described for direct-store accesses, and they bypass the page- and block-translation and protection mechanisms. the physical address is found by concatenating bits 28-31 of the respective segment register with bits 4-31 of the effective address. this address is marked noncacheable, write-through, and global.

because memory-forced direct-store accesses address memory space, they are subject to the same coherency control as other memory reference operations. more generally, accesses to memory-forced direct-store segments are considered to be cache-inhibited, write-through, and memory-coherent operations with respect to the 601 cache and bus interface.
chapter 8 system considerations

this chapter describes general considerations for system design with the 60x bus. the following topics are included:
- arbitration
- write data reordering
- aack generation
- use of sync and tlbsync
- pull-up resistors
- features for improved bus performance
- ieee 1149.1-compliant interface
- using dbwo
- lwarx/stwcx. considerations

8.1 arbitration
depending on the system implementation, the system arbiter may have various functions. as a minimum, it performs arbitration for access to the address bus and grants access to the data bus. it connects to each bus master with at least three unique signals: two for address bus control, bus request (br) and bus grant (bg), and one for data bus granting, dbg.

apart from negating bus requests the cycle after artry is asserted, the 60x bus protocol offers no inherent fairness in determining bus mastership. therefore, system designers must consider system needs as a whole when choosing an arbitration strategy.

8.2 using the data bus write-only mechanism
some processors support a limited out-of-order capability for their own pipelined transactions through the data bus write only (dbwo) signal. when the assertion of dbwo is recognized on the clock of a qualified data bus grant, the processor is directed to perform the next pending data write tenure (if any) even if a pending read tenure would normally have been performed. the dbwo signal only allows a write tenure to be performed ahead of a pending read tenure from the same processor, not ahead of another write tenure.

in general, an address tenure is followed immediately by its associated data tenure.
transactions pipelined by a processor complete in strict order except when the system uses dbwo to allow a processor to perform a snoop push-out operation (or other write transaction pending in the write queues) between the address and data tenures of a read operation. this effectively envelops the write operation within the read operation. figure 8-1 shows how dbwo supports enveloped write transactions.

figure 8-1. data bus write only transaction

the enveloped write feature should be used with care. for systems that do not implement this capability, dbwo should remain negated. in systems where this capability is needed, dbwo should be asserted under the following scenario:
1. the processor initiates a read transaction (either single-beat or burst) by completing the read address tenure with no address retry (artry negated).
2. then, the processor initiates a write transaction by completing the write address tenure with no artry.
3. at this point, if dbwo is asserted with a qualified data bus grant to the processor, the processor asserts dbb and drives the write data onto the data bus, out of order with respect to the address pipeline. the write transaction ends with the processor negating dbb.
4. the next qualified data bus grant signals the processor to complete the outstanding read transaction by latching the data on the bus. this assertion of dbg should not be accompanied by an asserted dbwo.

any number of bus transactions by other bus masters can be attempted between any of these steps.

note the following regarding dbwo:
- the dbwo signal can be asserted if no read operation is pending; it does not affect write ordering.
- ordering and presence of data bus writes is determined by the writes in the write queues when bg is asserted for the write address (not dbg).
- A snoop push-out operation takes priority over other queued write operations.
- Because more than one write can be in the write queue when DBG is asserted for the write address, more than one data bus write can be enveloped by a pending read.

The arbiter must monitor bus operations and coordinate masters and slaves with respect to the use of the data bus when DBWO is used. Individual DBG signals associated with each bus device should allow the arbiter to synchronize both pipelined and split-transaction bus organizations. Individual DBG and DBWO signals provide a primitive form of source-level tagging for the granting of the data bus.

The ability to perform a snoop push before completion of a read transaction that has been started by the processor prevents certain deadlock conditions. Consider a case where a 60x processor shares a bus with a memory controller and a bus converter. Assume that the bus converter produces an XYZ bus and that the following two requests appear simultaneously:

- A request from the processor on the 60x bus that requires an XYZ bus transaction.
- A request on the XYZ bus that should cause a data transfer with memory on the 60x bus.

The bus converter queues the processor request until the XYZ bus transaction completes. The XYZ bus transaction causes a request on the 60x bus and, unfortunately, a snoop hit that requires a push. To avoid deadlock, this enveloped push must complete before the data transfer for the processor request. This is a problem with bus converters using certain styles of buses.

In such cases, the system should assert DBWO and DBG together for write operations identified as snoop pushes. This may be a difficult determination because a system might assert this signal for all write data transfers, effectively reordering all write data ahead of outstanding reads.

The arbiter must monitor all bus operations in progress and synchronize masters and slaves with respect to the use of the data bus.
Each master's DBG allows the arbiter to synchronize pipelining and supports split-transaction bus organizations.

8.3 AACK Generation

Systems can use the signals provided by 60x processors to implement a simple, single-envelope bus in which data and address tenures are always together. They can also implement a bus that provides limited pipelining, in which subsequent addresses are sent out before the completion of the current data transfer. The signals even allow creation of a bus that provides split address and data transfers. The degree to which each processor may support such operations depends on processor design, namely the depth and logic associated with the read and write buffers in the processor's bus interface unit (BIU).

The system designer must determine how AACK signals the completion of an address transfer and allows other address transfers to occur. Following are some possibilities:

- The system arbiter may assert AACK the cycle after it sees an asserted TS. This allows requests to be placed onto the bus at the maximum rate of one every three cycles. System design must ensure that these requests do not exceed the rate at which slave devices can process them. One alternative is for the arbiter to limit the number of outstanding requests, using the bus grant mechanism. Another possibility would be to collect busy status from individual bus devices, and use this to pace the arbitration mechanism or to delay the AACK response.
- Individual devices might generate AACK, based on their decode of the address. This approach is somewhat limited in performance, however. If the bus clock is slow, it might be permissible to latch the address, decode it, and then drive AACK on the next cycle. Note that it would be necessary to prevent false transitions of AACK.
  For a system with a fast clock rate, devices would need to latch the address, take a cycle to decode it, and then issue AACK on the second cycle following TS, or later.

The 60x processors do not provide a graceful way to recover from an operation that receives no AACK; however, they perform all the address checking they are required to perform before placing addresses on the bus. In general, a processor considers all requests complete when they are placed on the bus, and there is no recoverable error reporting on the bus.

8.4 sync vs. tlbsync and System Design

The 601 and 604 handle tlbie, sync, and tlbsync bus operations differently. In a 601 system, tlbie operations are followed by a sync operation. In a 604 system, tlbie operations are followed by tlbsync operations, which may affect devices that maintain TLBs. If processors perform the tlbie operation immediately, or if no pending operations are queued, they may require no extra steps to ensure compatibility. However, if they maintain queues of pending operations, and these queues contain translated addresses, they may need to participate in the synchronization operation, and they may need to implement different modes for the 601 and 604.

8.5 Pull-Up Resistors

The following signals are driven by various bus devices: TS, XATS, ABB, TA, DBB, ARTRY, SHD, and DRTRY. When control of these signals is passed from one device to another, the device that is releasing control always deasserts them before release. The signals are then left in a high-impedance state before being driven by another device. Pull-up resistors are required on these signals to keep them in the negated state for this interval. Note that a fairly high-value resistor can be used because it does not cause a transition and only retains a value.
8.6 Features for Improved Bus Performance

The following 60x processor features help improve bus performance:

- Disabling the DRTRY feature (603 and 604e) decreases read latency by one cycle.
- The use of ABB and DBB is optional on 60x processors. Because of the fractional-cycle restoration to the high state, this helps achieve shorter cycle time.
- Consecutive read data transfers can be sent without a dead cycle (604). This increases maximum bandwidth by 25%.

8.7 IEEE 1149.1-Compliant Interface

The 604 boundary-scan interface is a fully compliant implementation of the IEEE 1149.1 standard. This section describes the 604 IEEE 1149.1 (JTAG) interface.

8.7.1 IEEE 1149.1 Interface Description

Table 8-1 describes the 604's five dedicated JTAG signals. The TDI and TDO scan ports are used to scan instructions and data into the various scan registers for JTAG operations. The scan operation is controlled by the test access port (TAP) controller, which in turn is controlled by the TMS input sequence. Scan data is latched at the rising edge of TCK. TRST is a JTAG-optional signal used to reset the TAP controller asynchronously; it ensures that JTAG logic does not interfere with normal chip operation. It can be asserted coincident with HRESET.

Table 8-1. IEEE Interface Signal Descriptions

Signal Name | Input/Output | Weak Pullup Provided | IEEE 1149.1 Function
----------- | ------------ | -------------------- | --------------------
TDI         | Input        | Yes                  | Serial scan input signal
TDO         | Output       | No                   | Serial scan output signal
TMS         | Input        | Yes                  | TAP controller mode signal
TCK         | Input        | Yes                  | Scan clock
TRST        | Input        | Yes                  | TAP controller reset

8.8 lwarx/stwcx. Considerations

The lwarx and stwcx. instructions are used to synchronize multiple processors. Operation of these instructions is described in the following sections.

8.8.1 Coherency Participation

This section describes the 604 MESI coherency mechanism.
There are three legal WIM encodings that define coherency-required regions:

- x11: noncacheable
- 001: write-back
- 101: write-through

This discussion assumes that any semaphore (the address used for an lwarx/stwcx. operation) addressed by different processors has the same WIM encodings regardless of which processor accesses it. Additionally, some of the discussion of write-back cacheable and write-through cacheable semaphores is combined because they have similar requirements.

8.8.1.1 Noncacheable Reservations

Regardless of whether they are associated with reservations, load and store operations to noncacheable semaphores must access main memory. Loads for noncacheable semaphores occur as read atomic bus operations.

Typically, noncacheable writes (write-with-flush operations) can be buffered at various stages. However, these writes must be broadcast to all processors holding reservations so they can be compared against reservation addresses.

Note that this is not strictly true. If a memory system implemented a directory of reservations (one entry per processor), it would need to direct noncacheable writes only to the appropriate processor when a match is detected. It could directly reset the reservation if another input signal existed, although there is not one present on the 604. Because it appears to be required for memory coherence for these writes to be broadcast (rather than for reservation reasons), it can be assumed that lwarx/stwcx. will follow this requirement. Noncacheable writes would not be required to be broadcast if no processor could have a cached copy of the data. This is not specified by the PowerPC architecture.

Snooping by the processor for write-with-flush (normal and atomic) operations to the reservation address must begin as soon as the lwarx address is acknowledged. This is because a write to that address can occur between the address and data phase of an lwarx instruction. If a snooped write operation matches, the reservation is cleared.
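The snooping rule for noncacheable reservations can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not a description of the 604 hardware: the class and method names are hypothetical, and the 32-byte reservation granule is an assumption that this section does not state.

```python
from dataclasses import dataclass
from typing import Optional

BLOCK_BYTES = 32  # assumed reservation granule; not specified in this section


def granule(addr: int) -> int:
    """Return the granule-aligned address used for reservation compares."""
    return addr & ~(BLOCK_BYTES - 1)


@dataclass
class ReservationSnooper:
    """Hypothetical model of one processor's reservation logic."""
    reserved: Optional[int] = None  # granule address, or None if no reservation

    def lwarx_address_acknowledged(self, addr: int) -> None:
        # Snooping starts at address acknowledgment, before the lwarx data
        # phase, because a write can fall between the two phases.
        self.reserved = granule(addr)

    def snoop(self, op: str, addr: int) -> None:
        # A matching write-with-flush (normal or atomic) clears the reservation.
        is_write = op in ("write-with-flush", "write-with-flush atomic")
        if is_write and self.reserved == granule(addr):
            self.reserved = None

    def holds_reservation(self) -> bool:
        return self.reserved is not None
```

In this model a write to an unrelated address leaves the reservation intact, while a write anywhere inside the reserved granule clears it, even if it arrives before the lwarx data has been returned.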
Snooping for writes by an L2 cache must begin as soon as the lwarx address is acknowledged on its system bus. L2 snoop filtering for reservations may be simple or complex (see Section 8.8.2, "Filtering Options for Reservations," for alternatives). In either case, snooping must be able to start as soon as the L2 system bus side sees an acknowledgment, which may constrain when the processor can assert RSRV.

The stwcx. instruction cannot be allowed to complete until the operation (write-with-flush atomic) gains access to main memory. This requires any L2 cache to delay acknowledgment of completion of the operation until it is globally performed. Buffering cannot be provided unless it is after the completion point with respect to main memory.

8.8.1.2 Cacheable Reservations

If a read to a cacheable semaphore misses, it is fetched with a read atomic bus operation. This places the data in the cache as S or E, depending upon the state of the SHD signal. The read may hit in the cache with states M, E, or S for write-back cacheable space, or E or S for write-through cacheable space.

It is recommended that the processor notify the external world of the address of the reservation when setting a reservation on an address in the cache. See Section 8.8.2, "Filtering Options for Reservations."

8.8.1.3 Read Snooping Requirements

A processor with a reservation on a cacheable semaphore must ensure that any subsequent reads (both read and read atomic) by any other processor do not take the address into their cache in the exclusive state (E). This prevents semaphores that are write-back cacheable from being modified by a write that is invisible to the processor holding the reservation (that is, going from the E state to the M state within the other cache). For the MESI protocol used by the 604, this involves asserting SHD whenever another processor executes a read to the reservation address.
This assertion of SHD for reservation purposes is independent of whether the data associated with the address is in the cache.

This requirement also extends to an L2 cache. If a reservation is held, it must selectively or freely assert sys_SHD for read operations that may occur to the semaphore address. See Section 8.8.2, "Filtering Options for Reservations."

8.8.1.4 Write-Back Reservation-Canceling Snoops

In addition to snooping to ensure that reads do not take exclusive ownership of a reservation address, the processor must also snoop for operations that would cancel the reservation. This snooping is in addition to that required to maintain cache coherency.

The following operations cancel a reservation held on a semaphore that is write-back cacheable, as they involve transfer of ownership of the address to another processor:

- RWITM: another processor gains ownership before completing a store.
- RWITM atomic: another processor gains ownership before completing an stwcx.
- Kill block: another processor stores into a shared block or a dcbz instruction is executed.

A write-with-kill operation cannot occur since it would imply that another processor has gained ownership, in which case a reservation would have been lost.

8.8.1.5 Write-Through Reservation-Canceling Snoops

The following operations cancel a reservation held on a semaphore that is write-through cacheable, as they involve transfer of ownership of the address to some other processor:

- RWITM: another processor gains ownership before completing a store.
- RWITM atomic: another processor gains ownership before completing an stwcx.
- Write-with-flush: another processor stores into a shared block.

Because an address can be treated as both write-through and write-back by different processors, both of the previous sets of operations should be snooped for clearing reservations.
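As an illustration of the last point, the two lists can be combined into a single set that a snooper checks. This is a minimal sketch: the operation names are plain strings chosen here for readability, and `cancels_reservation` is a hypothetical helper, not part of any processor interface.

```python
# Operations that cancel a reservation on a write-back cacheable semaphore.
WRITE_BACK_CANCELING = frozenset({"rwitm", "rwitm atomic", "kill block"})

# Operations that cancel a reservation on a write-through cacheable semaphore.
WRITE_THROUGH_CANCELING = frozenset({"rwitm", "rwitm atomic", "write-with-flush"})

# Different processors may treat the same address as write-back or
# write-through, so the snooper watches for the union of both sets.
CANCELING = WRITE_BACK_CANCELING | WRITE_THROUGH_CANCELING


def cancels_reservation(op: str, address_matches: bool) -> bool:
    """Return True if a snooped operation should clear the reservation."""
    return address_matches and op.lower() in CANCELING
```

The union contains four distinct operations because RWITM and RWITM atomic appear in both lists.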
8.8.1.6 Noncanceling Bus Operations

Because the following bus operations do not transfer ownership to another processor, they do not cancel the reservation, regardless of the effect they may have on the state of the data in the cache:

- Clean block: another processor executing dcbst.
- Flush block: another processor executing dcbf.

These operations can be viewed as transferring ownership back to main memory. They are often followed by an attempt to gain ownership, but these operations in themselves do not transfer ownership to another processor or cancel the reservation.

8.8.2 Filtering Options for Reservations

An L2 cache must also participate in bus operations to ensure correct operation of reservations. There is a range of options for filtering reservations. The following sections describe two very different approaches and the hardware required for each. This section assumes that bus operations are passed on to support reservations; clearly an operation may be passed either for supporting reservations, coherency, or both.

8.8.2.1 Minimal Reservation Support

The simplest approach to reservation filtering relies only on an indication that a reservation exists; for example, assertion of the RSRV signal. In this case, when no reservation is indicated by the processor (RSRV negated), no reservation-influencing operation (or reservation-influenced operation, for example, read/read atomic operations that might need to have sys_SHD asserted) needs to be passed on to the processor.

When a reservation is held by the processor (RSRV asserted), all reservation-influencing operations are passed on to the processor. All reservation-influenced operations are responded to with sys_SHD asserted.
This approach requires no state in the L2 (or higher-level cache). It relies upon simply gathering the various reservation indications at any level and passing them down as a unified signal to lower levels. This allows reservation-influencing operations to propagate back up the tree, where they can be selected by a branch that is interested in such operations.

While the hardware requirements are simple in this approach, performance is affected in the following ways:

- The available system bus bandwidth is reduced while operations are retried pending a response from the top of the tree.
- Intermediate buses are tied up, so other processors cannot have higher-level misses serviced.
- Many read operations are cached as shared instead of exclusive, which generates unnecessary bus traffic later when stores are performed to those addresses.

For a large multiprocessor system, this could cause significant loss in total bandwidth.

Implementation of this scheme only requires timely assertion of the RSRV signal by a processor. If RSRV were asserted by the end of the cycle after AACK assertion for operations where a read-atomic operation is required, or the next three-state after setting the reservation for a cache hit, then there is adequate time for the L2 controller to prepare for future system bus operations.

8.8.2.2 Improved Reservation Snooping

A more hardware-intensive approach to filtering is to require L2 caches to contain registers and comparators for the address associated with a specific processor's reservation. The controller only passes on reservation-modifying cycles from the system bus side to the processor bus side and can participate directly in reservation-influenced cycles. Thus, only addresses with actual outstanding reservations cause accesses to be retried on the system bus, intermediate buses to be unavailable, and blocks to be placed in other caches as shared; these costs are incurred only when necessary to maintain a reservation.
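The difference between the two filtering approaches can be sketched as follows. This is an illustrative model only: the function and class names are hypothetical, and the 32-byte compare granule is an assumption rather than something stated in the text.

```python
from typing import Optional

BLOCK_BYTES = 32  # assumed compare granule; not specified in this section


def minimal_forward(rsrv_asserted: bool) -> bool:
    """Minimal support: with no address state in the L2, every
    reservation-influencing operation is forwarded while RSRV is asserted."""
    return rsrv_asserted


class ReservationAddressFilter:
    """Improved support: an L2-side register and comparator that hold
    the address of one processor's reservation."""

    def __init__(self) -> None:
        self.addr: Optional[int] = None

    def set(self, addr: int) -> None:
        self.addr = addr & ~(BLOCK_BYTES - 1)

    def clear(self) -> None:
        self.addr = None

    def forward(self, snoop_addr: int) -> bool:
        # Forward only operations that hit the registered reservation address.
        if self.addr is None:
            return False
        return (snoop_addr & ~(BLOCK_BYTES - 1)) == self.addr
```

With the minimal scheme an unrelated snoop is forwarded whenever any reservation exists; with the address filter it is forwarded only on a match, which is where the bandwidth saving described above comes from.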
To provide this level of support, a processor must always ensure that lower levels can snoop addresses on which a reservation is placed. In the case of either noncacheable or cacheable miss operations, the address is transmitted during the read-atomic operation that acquires the data. For cacheable snoop hits, an address-only bus operation should be performed, to allow the reservation address to be passed cleanly from the processor to any L2 caches.

This additional bus operation type is proposed because there are problems in using the current read-atomic operation in the face of a cache hit. While there are many ways of trying to use the current data-transferring read-atomic operation, both the L1 and higher-level caches have problems dealing with the case of modified data already resident in the cache. For these reasons it is cleaner to require a new bus operation type that transmits a reservation address down from one level in the hierarchy to the next below.

Additionally, the reservation address needs to be cleared so higher levels of the memory hierarchy can stop snooping for reservations. This stwcx. address-only operation is an optimization, and is not required. The cost of not clearing the reservation address is that a small number of unnecessary snoop operations are sent up the memory hierarchy to the processor assumed to be holding the reservation, and a small amount of system bus bandwidth is lost to unnecessary retries.

8.8.2.3 lwarx/stwcx. Address-Only Operation

An lwarx/stwcx. address-only operation should meet several criteria. Most importantly, it should not cause abnormal system behavior in systems designed around the 601 and only sampling TT[0-3]. For this reason they have been mapped to operations such as clean block and flush block that are innocuous from a system perspective. This yields the TT[0-4] encodings shown in Table 8-2.
8.8.2.4 Software Implications

Bus traffic should be considered when system software deals with semaphores. Noncacheable semaphores incur no additional overhead because all lwarx/stwcx. operations are broadcast anyway. However, if the semaphore was in the cache, cacheable semaphores may cause additional address-only bus cycles for each lwarx instruction executed. Likewise, write-back cacheable semaphores may cause additional address-only bus cycles for each stwcx. operation. This small overhead may dictate some software considerations if lwarx/stwcx. are used frequently. For example, to reduce bus bandwidth for heavily used semaphores, something like the following test-and-test-and-set operation may be needed:

    loop:   ld      rn,s
            cmpi    rn,val
            bcc     loop
            lwarx   rn,s
            ops as required
            stwcx.  rm,s
            bne     loop

The preceding operation may be more useful than the following test-and-set operation:

    loop:   lwarx   rn,s
            ops as required
            stwcx.  rm,s
            bne     loop

Table 8-2. Transfer Type Settings for lwarx/stwcx. Address-Only Operation

TT[0-4] | Cycle
------- | -----
00001   | Set lwarx address.
00010   | Clear reservation address.

Appendix A
Processor Summary

This section provides an overview of the different functionality of the PowerPC 601, 603, and 604 processors. The 603 supports coherent memory, but does not explicitly support multiprocessors or L2 caches. The 601 and 604 support both multiprocessing configurations and L2 caches. Table A-1 summarizes differences in bus and memory coherency behavior between the 60x processors.

Table A-1.
Bus and Memory Coherency Behavior Summary

Functionality | 601 | 603 | 604
------------- | --- | --- | ---
Cache set element (bits) | 3 | 1 | 2
Line-fill strategy | Critical quad word | Critical double word | Critical double word
Cache coherency protocol | MESI | MEI | MESI
Broadcast cache operations | Yes | No | Yes
TLBI on bus | Yes | No | Yes
TLBISYNC | sync bus operation | TLBISYNC input signal | tlbsync bus operation
icbi on bus | N/A | No | Yes (extra TT operation)
eieio on bus | sync bus operation | No | eieio bus operation
No-DRTRY mode | No | No-DRTRY mode | No-DRTRY/data streaming mode
Clocking | Direct, with phase inputs (PCLK_EN, BCLK_EN) | PLL | PLL
Window of opportunity usage | Varies | Snoop push only | Snoop push only
Snoop push buffering | Varies | Dedicated | Dedicated
Fast push after ARTRY if parked | Yes | No | No
High priority push (input signal) | Yes | No | No
Read with no intent to cache | No | Yes | Yes
Timing of ARTRY/SHD restore (see notes at end of signal tables) | | |
ABB | | |
Broadcast lwarx indicator on cache hit | No | No | Yes

Differences in programming model (for example, implementation-specific special-purpose registers) are described in the user's manual for each device.
Table A-1. Bus and Memory Coherency Behavior Summary (Continued)

Functionality | 601 | 603 | 604
------------- | --- | --- | ---
Slight differences in TC encodings | | |
Data bus disable signal | No | Yes | Yes
TA required during last DRTRY | No | Yes | Yes
Snoop response signals | ARTRY, SHD | ARTRY | ARTRY, SHD
Misaligned within double word that crosses word causes two accesses | No | Yes | Yes
Time base source | RTC input | System clock | System clock
Time base enable | RTC | TBEN | TBEN
Power management | | Several modes | Nap mode
Power management signals | | QREQ, QACK | RUN, HALTED
DBWO only with pending read | Nothing | Transfer read | Transfer read
Data mirroring on CI writes | Yes | No | Unspecified
Early ARTRY data tenure termination | No | Yes | Yes
Processor ID value in PIO | From PID register | Zero | From PID register
Snoop for misplaced PIO reply | Yes | No | No
32-bit data transfer mode | No | Yes | No
Reduced pinout mode | No | Yes | No
CKSTP_OUT asserted, outputs high impedance for CKSTP_IN assertion | No | Yes | Yes
Optional machine check for checkstop condition | No | Yes | Yes
Cancel reservation on snooped RWITM | Yes | No | Yes
Snoop nonglobal transactions for reservation cancellation | No | Yes | No
stwcx. treated as write-through | No | Yes | No
WT state on snoop push | See Section 2.4.7, "Transfer Code (TCn) Output" | See Section 2.4.7, "Transfer Code (TCn) Output" | See Section 2.4.7, "Transfer Code (TCn) Output"

Appendix B
Processor Clocking Overview

This appendix provides a short overview of clocking on the PowerPC 60x processors. Detailed information is provided in each processor's user's manual and hardware specifications.

B.1 PowerPC 601 Microprocessor Clocking

The 601 requires an input clock, 2X_PCLK, which operates at twice the processor rate. In addition, it requires the PCLK_EN signal, which defines the phase of the internal processor clock, and the BCLK_EN signal, which likewise determines the phase of the internal bus clock, both relative to positive edges of the input clock.
Figure B-1 illustrates the clocks for the 601, with the bus clock enable selected to run at half the processor frequency.

[Figure B-1. PowerPC 601 Processor Clocking. Waveforms shown: 2X_PCLK, PCLK_EN, PCLK (internal), BCLK_EN, bus clock cycles.]

See the PowerPC 601 RISC Microprocessor User's Manual and PowerPC 601 RISC Microprocessor Hardware Specifications for more information.

B.2 PowerPC 603 and PowerPC 604 Microprocessor Clocking

The 603 and 604 clocks are derived by internal phase-locked loops (PLLs), which lock onto the positive edge of the bus clock input. In a given system, the bus clock must operate at a constant frequency so the PLL can maintain its lock.

[Figure B-2. PowerPC 603 and PowerPC 604 Processor Clock Generation. Block diagram labels: BUS_CLK, PCLK_STRAP(0), PCLK_STRAP(1), PCLK_STRAP(2), PLL, PROC_CLK, BUS_CLK_INT, REG, 603/604.]

Clock selection inputs on the chips are sampled at reset to determine the clock ratio at which the part operates. The condition of these inputs also programs the VCO to operate within its proper range.

The 603 and 604 can operate with a variety of bus-and-processor clock frequency ratios. This functionality is described generally in the user's manuals and more specifically in the hardware specifications.

Appendix C
Processor Upgrade Suggestions

This appendix provides upgrade suggestions for the PowerPC 601, PowerPC 603, and PowerPC 604 processors.

C.1 PowerPC 601 Processor Upgrade to 60x

The recommended approach to disable the 601 when a 60x processor is plugged into an upgrade socket is as follows:

- Apply HRESET for at least 300 processor clock cycles. This causes all outputs to be placed in a high-impedance state and puts 0s in all internal latches and registers. The HRESET signal can be held active longer if desired, but the minimum is 300 cycles.
- Deactivate the 2X_PCLK or PCLK_EN. This stops the processor and minimizes any dynamic power.
- Hold TST16 low to ensure that all OCDs remain in a high-impedance state. All other test signals should be connected as specified in the PowerPC 601 RISC Microprocessor Hardware Specifications.

C.2 PowerPC 603 Processor Upgrade to 604 or 60x

Figure C-1 illustrates the recommended connection that allows a 603 to upgrade to a 604 (and potentially other future 60x chips).

[Figure C-1. PowerPC 603 to PowerPC 604 Processor Upgrade Option. Diagram labels: 603, upgrade socket, pin 275, UPGRADE_SENSE, CKSTP_IN, CKSTP_OUT, system checkstop, bus signals.]

A designer should consider the following:

- The upgrade socket has the 604 pinout.
- Pin 275 is ordinarily an OGND pin. Instead of connecting it to ground, use it as a negative-true output, UPGRADE_SENSE, to indicate the presence of an upgrade processor. Put a pull-up resistor on UPGRADE_SENSE.
- On a BGA module, use any ground as the upgrade sense.
- If the system does not use CKSTP_IN, use a pull-up resistor. A gate is not required.
- If the system does not sample CKSTP_OUT, multiplexing is unnecessary.
- Connect in parallel: BR, BG, TS, XATS, ABB, A[0-31], AACK, APn, APE, TTn, TCn, TSIZn, TBST, CI, WT, GBL, SHD, ARTRY, DBG, DBWO, DBB, DH, DL, DBDIS, DPn, DPE, TA, DRTRY, TEA, INT, SMI, MCP, SRESET, HRESET, RSRV, TBEN, and SYSCLK.
- The upgrade socket provides the RUN input. The 603 has no equivalent function.
- The upgrade socket provides the HALTED output. The 603 has no equivalent function.
- The 603 has QREQ and QACK. There are no corresponding signals on the upgrade.
- No connection is necessary on CLKOUT (test only).
- CKSTP_IN is referred to as CKSTP on earlier versions of the 603. Likewise, CKSTP_OUT is referred to as CHECKSTOP.
- Processors may have different strappings on PLL_CFG. Programmability on the upgrade socket is recommended.
- Analog VDD inputs on the two processors should each have a dedicated filter network.
- Connection of TRST, TDI, TDO, TMS, and TCK is a function of system JTAG testing requirements.
- Pull up L1_TEST_CLK, L2_TEST_CLK, and LSSD_MODE on the 603 and L1_TEST_CLK, L2_TEST_CLK, LSSD_MODE, and ARRAY_WR on the upgrade socket.

C.3 PowerPC 604 Processor Upgrade to 60x

The following describes the recommended connection to provide for upgrades from the 604 to future processors:

- Pin 275 is ordinarily an OGND signal. Instead of connecting it to ground, use it as a negative-true output, UPGRADE_SENSE, to indicate the presence of an upgrade processor. Put a pull-up resistor on UPGRADE_SENSE.
- On a BGA module, use any ground as the upgrade sense.
- If the system does not use the CKSTP_IN input, use a pull-up resistor. A gate is not required.
- Connect the following signals in parallel: BR, BG, TS, XATS, ABB, A[0-31], AACK, APn, APE, TTn, TCn, TSIZn, TBST, CI, WT, GBL, SHD, ARTRY, DBG, DBWO, DBB, DH, DL, DBDIS, DPn, DPE, TA, DRTRY, TEA, INT, SMI, MCP, SRESET, HRESET, RSRV, TBEN, and SYSCLK.
- No connection is necessary for CLKOUT (test only).
- Processors may have different strappings on PLL_CFG. Programmability on the upgrade socket is recommended.
- Analog VDD inputs on the two processors should each have a dedicated filter network.
- Connection of TRST, TDI, TDO, TMS, and TCK is a function of system JTAG testing requirements.
- Pull up L1_TEST_CLK, L2_TEST_CLK, LSSD_MODE, and ARRAY_WR on both processors.

Appendix D
L2 Considerations for the PowerPC 604 Processor

An L2 cache reduces the average memory access time for each processor and partitions bus traffic between the various buses. In addition to keeping most of the individual processor bus traffic off of the system bus, this arrangement can screen memory coherency snoop traffic, keeping it off of individual processor buses.
This section discusses the use of an L2 cache controller in the system configuration shown in Figure D-1.

[Figure D-1. L2 Cache Controller Organization. Block diagram labels: 604, 60x bus, L2, system bus.]

The system bus may use a 60x bus or a bus of some other design. Methods of designing a system with an L2 cache are as follows:

- No snoop filtering: The simplest approach to an L2 system design is to not filter snoop activity. This is not practical for multiprocessor systems.
- Keeping a copy of L1 tags: Keeping a copy of the L1 tags in the L2 cache allows a system address to be compared against the L1 tags and the L2 tags in parallel. If neither directory matches, the processor/L2 cache complex is not involved in the current bus transaction and does not need to intervene in the operation. Typically, intervention implies assertion of either sys-ARTRY or sys-SHD.
- Maintaining L1 state and tags: Keeping a copy of the L1 tags and recording whether the cache block is in the S or E state allows the L2 cache to filter more snoop traffic from the processor than by saving the L1 state alone.
- Simple L1 inclusion: L1 inclusion requires that an address cannot be in the L1 cache unless it is in the L2 cache. Ensure that the contents of the L1 cache are a subset of the contents of the L2 cache.
- Marked L1 inclusion: In addition to guaranteeing inclusion, marked inclusion keeps more information about when an L2 cache entry is also in the L1 cache. The advantages of marked L1 inclusion over simple L1 inclusion include better snoop filtering and fewer back invalidations.

For each of these approaches (except the simplest case of performing no snoop filtering), each description includes the following:

- Requirements for saving state information: Information about the kind and amount of state information that must be maintained.
- Operations required for processor bus operations: Information regarding operations that are necessary to maintain consistency between the L1 and L2 caches.
- System bus operation forwarding to the processor: A description of the system bus operations that must be passed to the processor for each configuration.

Note that the prefix sys distinguishes system bus signals from 60x signals with the same name. For example, the system bus counterpart to the 60x signal SHD is sys-SHD.

D.1 Unfiltered Snooping

The simplest way to design in an L2 cache is to not filter snoop operations. The L2 cache responds to snoop requests from the system bus after first passing the snoop request through to the L1 cache. The following design issues should be considered:

- If the processor's L1 cache has a second tag port dedicated to snooping, the processor is not stalled for unnecessary snoops. This is true for the 601 and 604 but not for the 603.
- The time the external address bus is busy with unnecessary snoops must not be a significant portion of the address bandwidth required by the processor. A multiprocessor system cannot meet this condition practically; however, a single-processor system can meet the condition if DMA address bandwidth is low.

D.2 Keeping a Copy of L1 Tags

Keeping a copy of the L1 tags in the L2 cache allows a system address to be compared against the L1 tags and the L2 tags in parallel. If neither directory matches, the processor/L2 cache complex is not involved in the current bus transaction and does not need to intervene. Typically, intervention implies assertion of either sys-ARTRY or sys-SHD. If only the L2 tag matches, the L2 cache must intervene. If an L1 tag matches, the system address must be passed to the processor so it can respond. After the processor responds and completes any necessary snoop response, the system bus operation can be rerun against the possibly changed state of the L2 cache.
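The three outcomes described for the L1 tag copy scheme can be summarized as a small decision function. This is an illustrative sketch only; the enumeration and function names are hypothetical, and the two directories are modeled as plain sets of block addresses rather than as tag arrays.

```python
from enum import Enum
from typing import Set


class SnoopAction(Enum):
    NOT_INVOLVED = "not involved"            # neither directory matches
    L2_INTERVENES = "L2 intervenes"          # only the L2 tag matches
    PASS_TO_PROCESSOR = "pass to processor"  # an L1 tag matches


def filter_snoop(block_addr: int,
                 l1_tag_copy: Set[int],
                 l2_tags: Set[int]) -> SnoopAction:
    """Decide how the processor/L2 complex handles a system bus snoop.

    Hardware compares both directories in parallel; the order of the
    checks here only encodes that an L1 match takes precedence.
    """
    if block_addr in l1_tag_copy:
        return SnoopAction.PASS_TO_PROCESSOR
    if block_addr in l2_tags:
        return SnoopAction.L2_INTERVENES
    return SnoopAction.NOT_INVOLVED
```

After a PASS_TO_PROCESSOR outcome, the text above calls for the system bus operation to be rerun against the L2 once the processor has responded; that rerun is outside the scope of this sketch.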

D.2.1 Requirements for Saving State Information

This approach requires the implementation of a set of cache tags and comparators to maintain the addresses in the primary cache. For separate 16-Kbyte instruction and data caches, each cache directory must have 4 by 128 entries of a valid bit and 20 tag bits, for a combined total of 21 Kbits and eight 21-bit comparators. The valid bit is assumed to be implemented as a seven-transistor cell, as it should be cleared at power-up for repeatability and testing. Because this resource is a small fraction of the total tag required for a 1-Mbyte L2 (approximately 256 Kbits), it can easily be placed in the same controller. This tag array must be able to read a tag and valid bit, to write a tag and valid bit, and to compare a tag and valid bit against an address (with the valid bit assumed to be one).

A second state requirement is a set of registers, each with an associated comparator (termed the copy-back address registers), to hold addresses displaced from the L1 tags that need snooping. One such register is needed for each copy-back buffer on the processor. These registers need only be able to write and compare. Reading them is unnecessary.

D.2.2 Operations Required for Processor Bus Operations

Apart from the memory required, the L2 cache must determine when a replacement operation has been performed by the L1 cache. This determination, together with the cache set information, allows the L2 to update the L1 directory copy so as to insert the new address information. Therefore, the following processor bus operations must be decoded and dealt with as indicated.

- Read, RWITM, read atomic, RWITM atomic: Provided the cache inhibit (CI) signal is not asserted, the tag is allocated in the L1 directory as indicated by the CSEn signals and address. Additionally, in the 604 these allocations must indicate whether they have caused a data and address pair to be transferred into a copy-back buffer.
  Note that the 601 position is that the cache directory model must be increased in associativity by as many buffers as exist. For example, the 604 would require a four-way instruction cache model and a "4+1" data cache model. It is referred to as "4+1" because it is not truly a five-way model: groups 0–3 are selected by CSE, and group 4 is replaced by the evicted tag.
- Kill block: TC0 asserted distinguishes kill block operations that deallocate cache entries (caused by dcbi cache operations) from kill block operations that allocate entries in the L1 cache (caused by dcbz cache operations) or retain a cache entry (a store to a shared entry). Likewise, kill block operations that result from a dcbz operation must also indicate (through TC2) whether an entry has been placed in a copy-back buffer.

Whenever an allocation is generated by the processor that uses a copy-back buffer, the previous L1 directory entry must be saved into a copy-back address register. The use of these registers is simple first-in/first-out. It is unnecessary to copy the valid bit from the tag directory into the address register, but for repeatability and testing, it is useful to have in the copy-back address registers a valid bit that is initialized to invalid at power-up and loaded with the valid bit from the tag array when an address is copied into one of these registers.

Strictly, the copy-back address registers could have their valid bit reset whenever a write-with-kill operation matches (which indicates that the castout operation is occurring), but this optimization is probably unnecessary. The copy-back address register is more likely to be reloaded with another displaced tag from the L1 than it is to detect a match on the system bus side.
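
The directory copy and its first-in/first-out copy-back address registers can be modeled in a few lines. The sketch below is a software illustration only, not the hardware design: the class name `L1DirectoryCopy` and its methods are hypothetical, 32-bit addresses with 32-byte blocks are assumed (giving the 7 index bits and 20 tag bits quoted in D.2.1), and the number of copy-back buffers is left as a parameter because the text does not state it.

```python
from collections import deque

SETS, WAYS = 128, 4          # "4 by 128 entries" per cache directory
BLOCK_BYTES = 32             # assumed block size
TAG_BITS = 20                # tag bits per entry, as stated in D.2.1

# Sizing check: two directories (instruction and data), each entry
# holding a valid bit plus the tag, gives the quoted 21 Kbits.
TOTAL_DIRECTORY_BITS = 2 * SETS * WAYS * (TAG_BITS + 1)


class L1DirectoryCopy:
    """Hypothetical model of one L1 directory copy kept beside the L2."""

    def __init__(self, copyback_buffers=1):
        # None marks an invalid entry (valid bit cleared at power-up).
        self.tags = [[None] * WAYS for _ in range(SETS)]
        # A bounded deque drops its oldest entry when full: simple FIFO.
        self.copyback = deque(maxlen=copyback_buffers)

    @staticmethod
    def _split(address):
        block = address // BLOCK_BYTES
        return block % SETS, block // SETS   # (set index, tag)

    def allocate(self, address, way, uses_copyback_buffer):
        """Record an L1 allocation in the group (way) named by CSEn."""
        index, tag = self._split(address)
        displaced = self.tags[index][way]
        if uses_copyback_buffer and displaced is not None:
            # The displaced address still needs snooping until castout.
            self.copyback.append((index, displaced))
        self.tags[index][way] = tag

    def matches(self, address):
        """Compare a snooped address against tags and copy-back registers."""
        index, tag = self._split(address)
        return tag in self.tags[index] or (index, tag) in self.copyback
```

The copy-back registers are only ever written and compared, never read back, which matches the requirement stated in D.2.1.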

D.2.3 Forwarding System Bus Operations to the Processor

When a system bus operation (with SYS-GBL asserted) occurs on the system bus, it is compared against both the copy of the L1 directory and the copy-back address registers. If there is a match, the system bus operation must be forwarded on to the processor to determine the final outcome. If no match occurs, the addressed data does not reside in the processor, so the operation can complete. Note that this does not address the case of a match in the L2, which is a separate issue.

However, instead of simply loading the valid bit of the L1 directory shadow with a one when an allocation is detected, it could be loaded with the value of the GBL signal. Comparisons against system bus operations (which were marked as SYS-GBL) would still compare the valid bit read from the tag arrays against one. This automatically maximizes use of the information supplied on the GBL signal.

Note that the following discussion of system bus operations is concerned with snoop filtering, and hence with memory-accessing operations. Operations such as tlbie and sync that do not involve memory accesses are not filtered and are passed to the processor unchanged.

D.3 Maintaining L1 State and Tags

An alternative to simply keeping the tags of an L1 cache is to keep state information about the L1 cache, namely whether the cache line is in the S or E state. Keeping this information allows the L2 cache to filter even more snoop traffic from the processor.

Because some transitions, such as E to M, are invisible outside the processor, maintaining an identical copy of the L1 cache block state is impossible. Thus, the L1 directory copy is restricted to keeping the following range of states: I, S, and EX. The EX state describes both cases of the processor having the data exclusively (M and E). EX simply implies valid but not shared and does not distinguish whether the data has been modified with respect to main memory or to the L2 cache.
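
The relationship between the four L1 states and the three states the directory copy can hold can be written out as a mapping. This is a minimal sketch; the enum and function names are hypothetical and chosen only for illustration.

```python
from enum import Enum


class L1State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"


class ShadowState(Enum):
    I = "invalid"
    S = "shared"
    EX = "valid but not shared (L1 is in E or M)"


def shadow_state(l1_state):
    """Map an actual L1 state to the state the L2's copy can record."""
    if l1_state in (L1State.M, L1State.E):
        # E and M collapse into EX because a silent E-to-M transition
        # inside the processor is never seen on the bus.
        return ShadowState.EX
    if l1_state is L1State.S:
        return ShadowState.S
    return ShadowState.I
```

Because E and M map to the same value, a store that silently upgrades a block from E to M leaves the directory copy correct without any bus activity.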

D.3.1 Requirements for Saving State Information

The only additional state required beyond the simple copy-of-tags structure is a single state bit. The tag entry for the data cache now consists of 4 by 128 entries of a valid bit, a shared/exclusive bit, and 20 tag bits, which requires only an extra 1/2 Kbit. However, it is likely that the increase would be 1 Kbit, since the same macrocell would probably be used in the instruction and data halves of the L1 copy.

D.3.2 Operations Required for Processor Bus Operations

Logic must detect not only whether a tag matches in the L1 directory copy for a system bus operation, but also the type of intervention required and whether the operation must be passed to the processor before it can complete. This logic is on the critical path for the snoop access, because it must be determined whether SYS-ARTRY or SYS-SHD needs to be asserted in response to the system bus operation. Along with the operations monitored for L1 tag maintenance, the cases in Table D-1 need to be distinguished.

Table D-1. Operations Required for Processor Bus Operations

Bus Operation | Allocate/Deallocate | Action
Read, read atomic | Allocate as per discussion above | State loaded as S if SYS-SHD was asserted or EX if SYS-SHD was negated. It is assumed that the value of SHD reflects the value of SYS-SHD sampled, which is practical in a single-processor system. In a multiprocessor system, it may be desirable to always assert SHD on a read regardless of the state of SYS-SHD.
RWITM, RWITM atomic | Allocate as per discussion above | State loaded as EX.
Write with kill | Allocate | State goes to EX (or S, see below).
Write with kill | Deallocate | State goes to I. It may not match in the L1 tag, and the address may already be transferred into the copy-back address register.
Kill block | Store into S cache block or allocate | Allocate tag if necessary and state goes to EX.
Kill block | Deallocate | State goes to I.
ICBI | | State goes to I.
Flush block | | State goes to I.

Write-with-kill operations are distinguished in the same way as kill block operations. If TC0 is asserted, the address is deallocated from the cache; if TC0 is negated, it is retained in the cache (there are no actual allocations associated with a write with kill). As a further optimization, TC1 can be used to determine the final L1 cache state for a write with kill (allocate). If TC1 is asserted, the cache state is S, and if it is negated, the cache state is E; the L1 state should be set to S or EX, respectively.

D.3.3 Forwarding System Bus Operations to the Processor

If an L1 tag entry is marked as S, for system read operations (a fairly common occurrence) the L2 controller can respond directly with the SYS-SHD signal without requiring an access to the processor's cache. This not only reduces processor-to-L2 address bus interference, it also improves the system bus bandwidth, as the system bus operation does not need to be retried during interrogation of the processor's L1 cache.

Whenever the L1 tag state is EX or whenever something other than a simple read operation is performed on the system bus, the operation passes to the processor to determine the final outcome.

D.4 Simple L1 Inclusion

L1 inclusion requires that when an address is not in the L2 cache, it is also not in the L1 cache. Although this is functionally the same as saying that an address cannot be in the L1 cache unless it is in the L2 cache, the first definition more closely reflects how L1 inclusion is implemented. The simplest approach to L1 inclusion in an L2 cache is to require that whenever something is discarded from the L2 cache, it is also discarded from the L1 cache through a back invalidation.

In this discussion, the L2 cache is assumed to use the four-state MESI protocol. Simplifications to a three-state protocol are trivial.
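
The back-invalidation rule can be illustrated with a toy model in which the L1 is forced to give up a block before the L2 drops it. Everything here is an assumption made for illustration: the class name `InclusiveL2`, the callback interface, and the oldest-first replacement policy are hypothetical and are not taken from this document.

```python
from collections import OrderedDict


class InclusiveL2:
    """Toy L2 directory that keeps the L1 contents a subset of its own."""

    def __init__(self, capacity_blocks, back_invalidate):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()        # block number -> True, oldest first
        # Callable that removes one block from the L1 (hypothetical hook
        # standing in for a flush block or RWITM sent to the processor).
        self.back_invalidate = back_invalidate

    def allocate(self, block):
        """Make room for, then record, a block the L1 is allocating."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return
        if len(self.blocks) >= self.capacity:
            victim = next(iter(self.blocks))
            # Back-invalidate first; only then remove the old L2 tag.
            self.back_invalidate(victim)
            del self.blocks[victim]
        self.blocks[block] = True

    def __contains__(self, block):
        return block in self.blocks
```

For example, with `l1 = set()` and `l2 = InclusiveL2(2, l1.discard)`, allocating blocks 1, 2, and 3 in turn (adding each to `l1` after `l2.allocate`) evicts block 1 from both levels, so every block left in `l1` is still present in `l2`.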

D.4.1 Requirements for Saving State Information

Simple L1 inclusion requires the same tags and state as are needed to implement the L2. Note that the state information can be kept across the L2 cache block or per coherency granule.

D.4.2 Operations Required for Processor Bus Operations

The operations performed and monitored to maintain L1 inclusion are like those required for maintaining L1 tags (see Section D.3.2, "Operations Required for Processor Bus Operations"). However, as is discussed below, there is no need to be concerned with the indication that a copy-back buffer is being used.

When an allocation is performed by the L1 cache, the L2 cache must also ensure that a tag is allocated. Before allocation of a tag in the L2, a back invalidation (flush block or RWITM) for each coherency granule removed must be sent to the processor. These back invalidations cause either a snoop miss or a snoop hit. For a snoop miss, the L1 cache has replaced that entry (either previously or for the current allocation) and no further work is needed. Only when the snoop push-backs required for all removed granules are completed can the old tag be removed from the L2 directory and the fetch for the new tag begin. The replacement of a tag from the L2 may itself require a copy-back to main memory. Whether the copy-back is buffered is independent of the maintenance of the inclusion.

Allocation operations for L2 inclusion are decoded like those required for maintaining L1 tag copies. With L1 inclusion, however, there is no need to monitor the L1 cache's use of its copy-back buffers, because the back invalidations force any modified data being replaced to be copied back to the L2 level (if not all the way to main memory) before the fetch operations can proceed, thereby reducing the benefit of copy-back buffers in such an environment.

The data cache block allocated in an L2 cache can be larger than that in the L1 cache, in which case multiple back-invalidation operations to the L1 cache may be required whenever a tag is deallocated in the L2 cache, depending upon whether subblocking is implemented. This increases the latency of such operations and must be weighed against the hit rate advantages of such a configuration.

D.4.3 Forwarding System Bus Operations to the Processor

If L1 inclusion is assured, the following scenarios can be considered:

- The system bus operation does not match in the L2 directory. In this case, there can be no copy in the L1 cache, so the system bus operation requires no intervention.
- The address matches in the cache and is in the E or M state. In this case, the system bus operation must be retried and the operation must be forwarded to the processor cache because it may have a more up-to-date copy of the data.
- The address matches in the cache and is in the S state. In this case, for the simple case of a read or read-atomic operation, only SYS-SHD needs to be asserted. Other operations may require retrying, even if the data is only in the S state, as it is reasonable to wait until the operation completes at the processor before letting it complete on the system bus.

D.5 Marked L1 Inclusion

In addition to guaranteeing inclusion, marked inclusion keeps more information about when an L2 cache entry is also in the L1 cache. Viewed narrowly, it is only necessary to require that when an entry is marked as not in the L1 cache, it is in fact not present. It is acceptable to assume an entry is in the L1 when in fact it is not. Without maintaining a structure that mimics the L1 directory, it is hard to closely match entries marked as included in the L1 with those that actually are. Marked L1 inclusion offers a reduction in back invalidations and more efficient snoop filtering than simple L1 inclusion.
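
One way to picture how a conservative inclusion mark combines with the simple-inclusion scenarios of D.4.3 is the sketch below. It is an illustration under assumptions: the function and enum names are hypothetical, the L2 state is passed as a one-letter string (or `None` for an L2 miss), and non-read operations that hit a shared entry are always retried and forwarded, which is the conservative reading of "may require retrying."

```python
from enum import Enum


class SnoopResult(Enum):
    NO_INTERVENTION = "L2 miss: inclusion guarantees no L1 copy"
    L2_ONLY = "L2 hit, not marked in L1: processor is not involved"
    ASSERT_SYS_SHD = "shared hit on a simple read: assert SYS-SHD only"
    RETRY_AND_FORWARD = "retry on the system bus and forward to the processor"


def filter_snoop(l2_state, marked_in_l1, is_simple_read):
    """Decide how a snooped system-bus operation is handled.

    l2_state: "M", "E", "S", or None when the L2 directory misses.
    marked_in_l1: the inclusion mark; True may be stale (the entry may
    already have left the L1), but False must always be accurate.
    """
    if l2_state is None:
        return SnoopResult.NO_INTERVENTION
    if not marked_in_l1:
        return SnoopResult.L2_ONLY
    if l2_state == "S" and is_simple_read:
        return SnoopResult.ASSERT_SYS_SHD
    # E or M (the L1 may hold newer data), or a non-read operation.
    return SnoopResult.RETRY_AND_FORWARD
```

A stale `True` mark costs only an unnecessary forward to the processor, whereas a wrong `False` would break coherency, which is why the mark is allowed to err in one direction only.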

D.5.1 Requirements for Saving State Information

The included state for a tag is typically independent of the coherency states supported by the L2 cache; for example, both a shared and an exclusive data entry can be present or not in the L1 cache. Inclusion information is most easily kept per 32-byte coherency granule (doing otherwise may complicate some mechanisms with no large benefit). For a 1-Mbyte L2 cache, the additional memory required is 32 Kbits (one bit for each of the 32K granules).

D.5.2 Operations Required for Processor Bus Operations

The operations performed to maintain marked L1 inclusion are like those required for simple L1 inclusion. When an allocation is performed by the L1 cache, the L2 cache must also ensure that a tag is allocated and that the inclusion bit for the accessed 32-byte granule is set (note that this bit might already be set if the processor discarded this block without being detected).

Before an L2 cache tag can be allocated, it must be inspected. If the tag contains address granules for which the inclusion bit is set, a back invalidation for each such granule must be sent to the processor. If no inclusion bit is set, the tag can be removed directly, assuming that the data is copied back to main memory as required by the state indicated for the cache blocks. The old tag can be removed from the L2 directory and the fetch for the new tag can begin only when the snoop push-backs required for all included granules are completed. As with simple inclusion, replacing a tag from the L2 may require a copy-back to main memory.

The inclusion bit is reset whenever a processor bus operation is performed that visibly removes an entry from the L1 cache. These operations are as follows:

- Write with kill (deallocate): A cache castout operation or a snoop response in which the L1 cache state goes to invalid, so the inclusion bit can be reset.
- Kill block (deallocate): The result of a dcbi instruction; the L1 cache state goes to invalid, so the inclusion bit can be reset.
- ICBI: The result of an icbi instruction; the L1 cache state goes to invalid, so the inclusion bit can be reset.

Other operations, in particular read operations, also indicate that an entry has been removed from the L1 cache. However, because it is not easy to determine for which L2 cache block to reset the inclusion bit, L2 inclusion bits can only approximate the actual L1 contents. The exact L2 set index to use can be determined only by creating a structure like the L1 directory that can track both the L1 set index and the L1 group entry information. This structure could be simpler than the L1 directory because it needs no comparator and because it requires only an L2 index entry rather than a tag. However, adding this structure to marked L1 inclusion requires more resources and is needlessly more complicated than simply implementing an L2 cache with a copy of the L1 tag and state information.

D.5.3 Forwarding System Bus Operations to the Processor

When a system bus operation is run, the cases of interest are as follows:

- Snoop miss: Because of L1 inclusion, there is no need to pass the operation on to the processor.
- Snoop hit with inclusion bit reset: There is also no need to pass the operation to the processor.
- Snoop hit with inclusion bit set: The operations are identical to simple L1 inclusion operations.

Appendix E Coherency Action Tables

The tables in this appendix describe the behavior of the 60x bus when certain operations are presented to the bus. These tables describe the difference in how the bus operates depending upon such factors as the WIM bit settings, the current MESI state, the setting of the transfer type signals (TT[0–4]), and the signals that are presented as the result of the operation (ARTRY and SHD) being snooped on the bus. The tables in this appendix also indicate the difference in how specific
60x processors respond to certain operations. Abbreviations used in these tables are described in Table E-1. For a description of these operations and others listed in these tables, refer to Section 4.7, "Descriptions of Bus Transactions and Snoop Responses."

Table E-1. Guide to Abbreviations

Abbreviation | Meaning
LRS | lwarx reservation set
RDA | Read atomic
RWITM | Read-with-intent-to-modify
RWITMA | Read-with-intent-to-modify-atomic
SBR | Single-beat read
SBRA | Single-beat read atomic
SBW | Single-beat write
WWF | Write-with-flush
WWFA | Write-with-flush-atomic
WWK | Write-with-kill

E.1 Load Operations

Table E-2 indicates the behavior and response of 60x processors when a load operation is presented to the system bus.

Table E-2. Coherency Actions: Load Operations

Cache WIM | Cache MESI | Processor | Bus operation | Bus WIM³ | Bus TT[0–4] | Snoop response | Processor response
000 | I | 60x | Read | 000 | 01010 | (none) | Load block into cache; forward data to perform load; mark cache block E
 | | 603 | RWITM | | 01110 | |
 | | 60x | Read | 000 | 01010 | SHD | Load block into cache; forward data to perform load; mark cache block S
 | | 603 | RWITM | | 01110 | | Load block into cache; load from cache; mark cache block E
 | | 60x | Read | 000 | 01010 | ARTRY or ARTRY & SHD | Release bus; retry operation
 | | 603 | RWITM | | 01110 | |
 | MES¹ | 60x | (none) | (n/a) | (n/a) | (n/a) | Load from cache
001 | I | 60x | Read | 001 | 01010 | (none) | Load block into cache; forward data to perform load; mark cache block E
 | | 603 | RWITM | | 01110 | | Load block into cache; load from cache; mark cache block E
 | | 60x | Read | 001 | 01010 | SHD | Load block into cache; forward data to perform load; mark cache block S
 | | 603 | RWITM | | 01110 | | Load block into cache; load from cache; mark cache block E
 | | 60x | Read | 001 | 01010 | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | | 603 | RWITM | | 01110 | |
 | MES¹ | 60x | (none) | (n/a) | (n/a) | (n/a) | Load from cache
x1x | I | 60x | SBR | w1m | 01010 | (none) or SHD | Load from main memory
 | | 603 | RWITM | | 01110 | |
 | | 60x | SBR | w1m | 01010 | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | | 603 | RWITM | | 01110 | |
 | ES¹ | 60x | SBR | w1m | 01010 | (none) or SHD | Load from main memory
 | E | 603 | RWITM | x1m | 01110 | (none) or SHD | Load from main memory
 | | 601 | (none) | (n/a) | (n/a) | (n/a) | Mark cache block I; cache retry the operation
 | ES¹ | 60x | SBR | w1m | 01010 | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | E | 603 | RWITM | x1m | 01110 | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | | 601 | (n/a) | (n/a) | (n/a) | (n/a) | (n/a)
 | M | 60x | SBR | w1m | 01010 | (none) or SHD | Paradox²: cache should be I; load from main memory
 | | 603 | RWITM | | 01110 | |
 | | 601 | WWK | | 00110 | (n/a) | Flush the block; mark cache block I; cache retry the operation
 | | 60x | SBR | w1m | 01010 | ARTRY or ARTRY & SHD | Paradox²: cache should be I; release the bus; retry the operation
 | | 603 | RWITM | | 01110 | |
 | | 601 | WWK | | 00110 | |
100 | I | 60x | Read | 100 | 01010 | (none) | Load block into cache; load from cache; mark the cache block E
 | | 603 | RWITM | | 01110 | |
 | | 60x | Read | 100 | 01010 | SHD | Load block into cache; load from cache; mark cache block S
 | | 603 | RWITM | | 01110 | | Load block into cache; load from cache; mark cache block E
 | | 60x | Read | 100 | 01010 | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | | 603 | RWITM | | 01110 | |
 | MES¹ | | (none) | (n/a) | (n/a) | (n/a) | Load from cache
101 | I | 60x | Read | 101 | 01010 | (none) | Load block into cache; load from cache; mark cache E
 | | 603 | RWITM | | 01110 | |
 | | 60x | Read | 101 | 01010 | SHD | Load block into cache; load from cache; mark cache block S
 | | 603 | RWITM | | 01110 | | Load block into cache; load from cache; mark cache block E
 | | 60x | Read | 101 | 01010 | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | | 603 | RWITM | | 01110 | |
 | MES¹ | 60x | (none) | (n/a) | (n/a) | (n/a) | Load from cache

Notes:
1. Because it does not implement the shared state, these entries are not applicable to the 603.
2. A coherency paradox to the processor may cause incoherent data to appear in the system. That is, there is a potential for data integrity errors in the system.
3. The WIM bits in this column are active-high representations of the active-low WT, CI, and GBL 60x bus signals, respectively. Thus, a WIM = 101 value corresponds to a 60x signal value of WT, CI, GBL = 010.

E.2 Store Operations

Table E-3 describes the behavior of the 60x bus in response to store operations.

Table E-3. Coherency Actions: Store Operations

Cache WIM | Cache MESI | Proc. | Bus operation | Bus WIM | Bus TT[0–4] | Snoop response | Processor response
000 | I | 60x | RWITM | 000 | 01110 | (none) or SHD | Load block into cache; store to cache; mark cache M
 | | | | | | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | S¹ | 60x | Kill | 000 | 01100 | (none) or SHD | After kill is successfully presented: store to cache; mark cache block M
 | | | | | | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | E | 60x | (none) | (n/a) | (n/a) | (n/a) | Store to cache; mark cache block M
 | M | 60x | (none) | (n/a) | (n/a) | (n/a) | Store to cache
001 | I | 60x | RWITM | 001 | 01110 | (none) or SHD | Load block into cache; mark cache block E; store to cache; mark cache block M
 | | | | | | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | S¹ | 60x | Kill | 001 | 01100 | (none) or SHD | After kill is successfully presented: mark cache block E; store to cache; mark cache block M
 | | | | | | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | E | 60x | (none) | (n/a) | (n/a) | (n/a) | Store to cache; mark cache block M
 | M | 60x | (none) | (n/a) | (n/a) | (n/a) | Store to cache
x1x | I | 60x | WWF | x1m | 00010 | (none) or SHD | Store to main memory
 | | 601 | | | | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | ES¹ | 60x | WWF | x1m | 00010 | (none) or SHD | Paradox²: cache should be I; store to main memory
 | | 601 | (none) | (n/a) | (n/a) | (n/a) | Mark cache block I; cache retry the operation
 | | 60x | WWF | x1m | 00010 | ARTRY or ARTRY & SHD | Paradox²: cache should be I; release the bus; retry the operation
 | | 601 | (n/a) | (n/a) | (n/a) | (n/a) | (n/a)
 | M | 60x | WWF | x1m | 00010 | (none) or SHD | Paradox²: cache should be I; store to main memory
 | | 601 | WWK | | 00110 | | Flush the block; mark cache block I; cache retry the operation
 | | 60x | WWF | x1m | 00010 | ARTRY or ARTRY & SHD | Paradox²: cache should be I; release the bus; retry the operation
 | | 601 | WWK | | 00110 | |
100 | I | 60x | WWF | 100 | 00010 | (none) or SHD | Store to main memory
 | | | | | | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | S¹ | | | | | (none) or SHD | Store to cache; store to main memory
 | | | | | | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | E | | | | | (none) or SHD | Store to cache; store to main memory
 | | | | | | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | M | 60x | WWF | | 00010 | (none) or SHD | Store into cache; store into main memory
 | | 601 | WWK | | 00110 | | Push the block; mark cache block E; cache retry the operation
 | | 60x | WWF | | 00010 | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | | 601 | WWK | | 00110 | |
101 | I | 60x | WWF | 101 | 00010 | (none) or SHD | Write to main memory (note: no reload on a store miss)
 | | | | | | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | S¹ | | | | | (none) or SHD | Store to cache; store to main memory
 | | | | | | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | E | | | | | (none) or SHD | Store to cache; store to main memory
 | | | | | | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | M | 60x | WWF | | 00010 | (none) or SHD | Store to cache; store to main memory
 | | 601 | WWK | | 00110 | | Push block; mark cache block E; cache retry the operation
 | | 60x | WWF | | 00010 | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | | 601 | WWK | | 00110 | |

Notes:
1. Because it does not implement the shared state, these entries are not applicable to the 603.
2. A coherency paradox to the processor may cause incoherent data to appear in the system; that is, there is a potential for data integrity errors in the system.

E.3 lwarx Operations

Table E-4 describes the behavior of the 60x bus in response to lwarx operations generated by the execution of an lwarx instruction. Note that the reservation entry in this table refers to reservations associated with the lwarx instruction.

Table E-4. Coherency Actions: lwarx Operations

Cache WIM | Cache MESI | Proc. | Bus operation | Bus WIM | Bus TT[0–4] | Reservation | Snoop response | Processor response
000 | I | 60x | RDA | 000 | 11010 | Set by this op | (none) | Load block into cache; set reservation; load from cache; mark cache block E
 | | 603 | RWITMA | | 11110 | | |
 | | 60x | RDA | 000 | 11010 | Set by this op | SHD | Load block into cache; set reservation; load from cache; mark cache block S
 | | 603 | RWITMA | | 11110 | | | Load block into cache; set reservation; load from cache; mark cache block E
 | | 60x | RDA | 000 | 11010 | (n/a) | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | | 603 | RWITMA | | 11110 | | |
 | MES¹ | 60x | LRS | 000 | 00001 | Set by this op | (none) or SHD | Set reservation; load from cache
 | | 601/603 | (n/a) | (n/a) | (n/a) | (n/a) | |
 | | 60x | LRS | 000 | 00001 | (n/a) | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | | 601/603 | (n/a) | (n/a) | (n/a) | (n/a) | (n/a) |
001 | I | 60x | RDA | 001 | 11010 | Set by this op | (none) | Load block into cache; mark cache block E; set reservation; load from cache
 | | 603 | RWITMA | | 11110 | | |
 | | 60x | RDA | 001 | 11010 | Set by this op | SHD | Load block into cache; set reservation; load from cache; mark cache block S
 | | 603 | RWITMA | | 11110 | | | Load block into cache; set reservation; load from cache; mark cache block E
 | | 60x | RDA | 001 | 11010 | (n/a) | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | | 603 | RWITMA | | 11110 | | |
 | MES¹ | 60x | LRS | 001 | 00001 | Set by this op | (none) or SHD | Set reservation; load from cache
 | | 601/603 | (n/a) | (n/a) | (n/a) | (n/a) | |
 | | 60x | LRS | 001 | 00001 | (n/a) | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | | 601/603 | (n/a) | (n/a) | (n/a) | | |
x1x | I | 60x | SBRA | x1m | 11010 | Set by this op | (none) or SHD | Set reservation; load from main memory
 | | 603 | RWITMA | | 11110 | | |
 | | 60x | SBRA | x1m | 11010 | (n/a) | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | | 603 | RWITMA | | 11110 | | |
 | ES¹ | 60x | RDA | x1m | 11010 | Set by this op | (none) or SHD | Set the reservation; load from main memory
 | | 603 | RWITMA | | 11110 | | |
 | | 601 | (none) | (n/a) | (n/a) | (n/a) | (n/a) | Mark cache block I; cache retry the operation
 | | 60x | RDA | x1m | 11010 | (n/a) | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | | 603 | RWITMA | | 11110 | | |
 | | 601 | (none) | (n/a) | (n/a) | (n/a) | (n/a) |
 | M | 60x | RDA | x1m | 11010 | Set by this op | (none) or SHD | Paradox²: cache should be I; set the reservation; load from main memory
 | | 603 | RWITMA | | 11110 | | |
 | | 601 | WWK | | 00110 | (n/a) | | Flush the block; mark cache block I; cache retry the operation
 | | 60x | RDA | x1m | 11010 | (n/a) | ARTRY or ARTRY & SHD | Paradox²: cache should be I; release the bus; retry the operation
 | | 603 | RWITMA | | 11110 | | |
 | | 601 | WWK | | 00110 | | |
100 | I | 601³ | RDA | 100 | 11010 | Set by this op | (none) | Load block into cache; set reservation; load from cache; mark cache block E
 | | | | | | | SHD | Load block into cache; set reservation; load from cache; mark cache block S
 | | | | | | (n/a) | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | MES | 601³ | RDA | (n/a) | (n/a) | Set by this op | (n/a) | Set reservation; load from cache
101 | I | 601³ | RDA | 101 | 11010 | Set by this op | (none) | Load block into cache; set reservation; load from cache; mark cache block E
 | | | | | | | SHD | Load block into cache; set reservation; load from cache; mark cache block S
 | | | | | | (n/a) | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | MES | 601³ | (n/a) | (n/a) | (n/a) | Set by this op | (n/a) | Set reservation; load from cache

Notes:
1. Because it does not implement the shared state, these entries are not applicable to the 603.
2. A coherency paradox to the processor may cause incoherent data to appear in the system. That is, there is a potential for data integrity errors in the system.
3. An lwarx to a page marked write-through causes a DSI exception; therefore, this transaction does not occur on the bus.

E.4 stwcx. Operations

Table E-5 describes the behavior of the 60x bus in response to stwcx. operations generated by the execution of a stwcx. instruction. Note that the reservation entry in this table refers to reservations set by the lwarx instruction and cleared either by the stwcx. instruction or by a snoop operation.

Table E-5. Coherency Actions: stwcx. Operations

Cache WIM | Cache MESI | Proc. | Bus operation | Bus WIM | Bus TT[0–4] | Res. | Snoop response | Processor response
000 | I | 60x | (none) | (n/a) | (n/a) | None | (n/a) | Update CR
 | | 60x | RWITMA | 000 | 11110 | Yes (and reset) | (none) or SHD | Load block into cache; release the reservation; update CR; store to cache; mark cache M
 | | 603 | WWFA | | 10010 | | | Issue WWF on the bus; release the reservation; update CR
 | | 60x | RWITMA | 000 | 11110 | Yes | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | | 603 | WWFA | | 10010 | | |
000 | S¹ | 60x | (none) | (n/a) | (n/a) | None | (n/a) | Update CR
 | | | Kill | 000 | 01100 | Yes (and reset) | (none) or SHD | After kill is successfully presented: release reservation; update CR; store to cache; mark cache block M
 | | | | | 01100 | Yes | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | E | 60x | (none) | (n/a) | (n/a) | None | (n/a) | Update CR
 | | 60x | (none) | (n/a) | (n/a) | Yes (and reset) | (n/a) | Release reservation; update CR; store to cache; mark cache block M
 | | 603 | WWFA | 000 | 10010 | | (none) or SHD | WWFA on the bus; wait for write to complete; release reservation; update CR; store to cache
 | | 60x | (none) | (n/a) | (n/a) | Yes (and reset) | (n/a) | (n/a)
 | | 603 | WWFA | 000 | 10010 | | ARTRY or ARTRY & SHD | Release the bus; retry the operation
 | M | 60x | (none) | (n/a) | (n/a) | None | (n/a) | Update CR
 | | 60x | (none) | (n/a) | (n/a) | Yes (and reset) | (n/a) | Release reservation; update CR; store to cache
 | | 603 | WWFA | 000 | 10010 | | (none) or SHD | WWFA on the bus; wait for write to complete; release reservation; update CR; store to cache
 | | 60x | (none) | (n/a) | (n/a) | Yes (and reset) | (n/a) | (n/a)
 | | 603 | WWFA | 000 | 10010 | | ARTRY or ARTRY & SHD | Release the bus; retry the operation
001 i 60x (none) (n/a) (n/a) none (n/a) update cr
60x rwitma 001 11110 yes (and reset) (none) or shd load block into cache; release the reservation; update the cr; store to cache; mark cache m
603 wwfa 10010 issue wwf on the bus; release the reservation; update the cr
60x rwitma 001 11110 yes artry or artry & shd release the bus; retry the operation
603 wwfa 10010
s 1 60x (none) (n/a) (n/a) none (n/a) update cr
kill 001 01100 yes (and reset) (none) or shd release reservation; update cr; mark cache block e; store to cache; mark cache block m
yes artry or artry & shd release the bus; retry the operation
e 60x (none) (n/a) (n/a) none (n/a) update cr
60x (none) (n/a) (n/a) yes (and reset) (n/a) release reservation; update cr; store to cache; mark cache block m
603 wwfa 001 10010 (none) or shd wwfa on bus; wait for write to complete; release reservation; update cr; store to cache
60x (none) (n/a) (n/a) yes (n/a) (n/a)
603 wwfa 001 10010 artry or artry & shd release the bus; retry the operation
001 m 60x (none) (n/a) (n/a) none (n/a) update cr
60x (none) (n/a) (n/a) yes (and reset) (n/a) release reservation; update cr; store to cache
603 wwfa 001 10010 (none) or shd wwfa on bus; wait for write to complete; release reservation; update cr; store to cache
60x (none) (n/a) (n/a) yes (n/a) (n/a)
603 wwfa 001 10010 artry or artry & shd release the bus; retry the operation
x1x i 60x (none) (n/a) (n/a) none (n/a) update cr
wwfa x1m 10010 yes (and reset) (none) or shd release reservation; update cr; store to main memory
yes artry or artry & shd release the bus; retry the operation
es 1 60x (none) (n/a) (n/a) none (n/a) paradox 2 cache should be i; update cr
601 mark cache block i; cache retry the operation
60x wwfa x1m 10010 yes (and reset) (none) or shd paradox 2 cache should be i; release reservation; update cr; store to main memory
601 (none) (n/a) (n/a) mark cache block i; cache retry the operation
60x wwfa x1m 10010 yes artry or artry & shd paradox 2 cache should be i; release the bus; retry the operation
601 (n/a) (n/a) (n/a) (n/a)
x1x m 60x (none) (n/a) (n/a) none (n/a) paradox 2 cache should be i; update cr
601 wwk x1m 00110 (none) or shd flush the block; mark cache block i; cache retry the operation
60x (n/a) (n/a) (n/a) none (n/a) (n/a)
601 wwk x1m 00110 artry or artry & shd release the bus; retry the operation
60x wwfa x1m 10010 yes (and reset) (none) or shd paradox 2 cache should be i; release reservation; update cr; store to main memory
601 wwk 00110 flush the block; mark cache block i; cache retry the operation
60x wwfa x1m 10010 yes artry or artry & shd paradox 2 cache should be i; release the bus; retry the operation
601 wwk 00110
100 i 601 3 (none) (n/a) (n/a) none (n/a) update cr
rwitma 100 11110 yes (and reset) (none) or shd load block of data into cache; release reservation; update the cr; store to cache; mark cache m
(n/a) artry or artry & shd release the bus; retry the operation
s 601 3 (none) (n/a) (n/a) (none) (n/a) update cr
kill 100 01100 yes (and reset) (none) or shd after kill is successfully presented: release reservation; update cr; store to cache; mark cache block m
60x 3 (n/a) (n/a) (n/a) yes (n/a) see footnote 3
601 kill 100 01100 (n/a) artry or artry & shd release the bus; retry the operation
100 e 601 3 (none) (n/a) (n/a) none artry or artry & shd update cr
yes (and reset) (n/a) release reservation; update cr; store to cache; mark cache block m
m 601 3 (none) (n/a) (n/a) none (n/a) update cr
yes (and reset) (n/a) release reservation; update cr; store to cache
101 i 601 3 (none) (n/a) (n/a) none (n/a) update cr
rwitma 101 11110 yes (and reset) (none) or shd load block of data into cache; release reservation; update cr; store to cache; mark cache m
(n/a) artry or artry & shd release the bus; retry the operation
s 601 3 (none) (n/a) (n/a) none (n/a) update cr
kill 101 01100 yes (and reset) (none) or shd after kill is successfully presented: release reservation; update cr; store to cache; mark cache block m
(n/a) 60x 3 (n/a) (n/a) (n/a) yes (and not reset) (n/a) see footnotes 2 and 3
s 601 kill 101 01100 (n/a) artry or artry & shd release the bus; retry the operation
101 e 601 (none) (n/a) (n/a) none (n/a) update cr
yes (and reset) release reservation; update cr; store to cache; mark cache block m
m 601 (none) (n/a) (n/a) none (n/a) update cr
yes (and reset) release reservation; update cr; store to cache
notes:
1 because it does not implement shared state, these entries are not applicable to the 603.
2 an stwcx. to a page marked write-through causes a dsi exception. therefore this bus transaction cannot occur. the state of the reservation is not changed by an stwcx. to a page marked write-through.
3 for all but 601s, an lwarx to a page marked write-through causes a dsi exception; therefore this transaction does not occur on the bus.
e.5 dcbt operations
table e-6 shows the coherency actions when a dcbt operation is generated by the execution of a dcbt instruction.
table e-6.
coherency actions: dcbt operations
columns: cache wim, cache mesi, processor, bus operation, bus wim, bus tt[0-4], snoop response, processor response
000 i 60x read 000 01010 (none) load block into cache; mark the cache e
603 rwitm 01110
60x read 000 01010 shd load block into cache; mark the cache s
603 rwitm 01110 load block into cache; mark the cache e
60x read 000 01010 artry or artry & shd release the bus; retry the operation
603 rwitm 01110
mes 1 60x (none) (n/a) (n/a) (n/a) no-op
001 i 60x read 001 01010 (none) load block into cache; mark the cache e
603 rwitm 01110
60x read 001 01010 shd load block into cache; mark the cache s
603 rwitm 01110 load block into cache; mark the cache e
60x read 001 01010 artry or artry & shd release the bus; retry the operation
603 rwitm 01110
mes 1 (none) (n/a) (n/a) (n/a) no-op
x1x i 60x (none) x1m (n/a) (n/a) no-op
601 sbr
es 1 60x (none) (n/a) (n/a) (n/a) no-op
601 mark cache block i; cache retry the operation
m 60x (none) (n/a) (n/a) (n/a) no-op
601 wwk x1m 00110 (none) or shd flush the block; mark cache block i; cache retry the operation
m 60x (n/a) (n/a) (n/a) (n/a) (n/a)
601 wwk x1m 00110 artry or artry & shd release the bus; retry the operation
100 i 60x read 100 01010 (none) load block into cache; mark the cache e
603 rwitm 01110
60x read 100 01010 shd load block into cache; mark the cache s
603 rwitm 01110 load block into cache; mark the cache e
60x read 100 01010 artry or artry & shd release the bus; retry the operation
603 rwitm 01110
mes 1 60x (none) (n/a) (n/a) (n/a) no-op
101 i 60x read 101 01010 (none) load block into cache; mark the cache e
603 rwitm 01110
60x read 101 01010 shd load block into cache; mark the cache s
603 rwitm 01110 load block into cache; mark the cache e
60x read 101 01010 artry or artry & shd release the bus; retry the operation
603 rwitm 01110
mes 1 (none) (n/a) (n/a) (n/a) no-op
note: 1 because it does not implement shared state, these entries are not applicable to the 603.
e.6 dcbtst operations
table e-7 describes the behavior of the 60x bus interface in response to the execution of a dcbtst instruction.
table e-7. coherency actions: dcbtst operations
columns: cache wim, cache mesi, processor, bus operation, bus wim, bus tt[0-4], snoop response, processor response
000 i 60x read 000 01010 (none) load the block of data into cache; mark the cache e
603 rwitm 01110
60x read 000 01010 shd load the block of data into cache; mark the cache s
603 rwitm 01110 load the block of data into cache; mark the cache e
60x read 000 01010 artry or artry & shd release the bus; retry the operation
603 rwitm 01110
s 1 60x (none) (n/a) (n/a) (n/a) no-op
me 000
001 i 60x read 001 01010 (none) load the block of data into cache; mark the cache e
603 rwitm 01110
60x read 001 01010 shd load the block of data into cache; mark the cache s
603 rwitm 01110 load the block of data into cache; mark the cache e
60x read 001 01010 artry or artry & shd release the bus; retry the operation
603 rwitm 01110
mes 1 60x (none) (n/a) (n/a) (n/a) no-op
x1x i 60x (none) x1m (n/a) (n/a) no-op
601 sbr
es 1 60x (none) (n/a) (n/a) (n/a) no-op
601 mark cache block i; cache retry the operation
m 60x (none) (n/a) (n/a) (n/a) no-op
601 wwk x1m 00110 (none) or shd flush the block; mark cache block i; cache retry the operation
60x (none) (n/a) (n/a) (n/a) (n/a)
601 wwk x1m 00110 artry or artry & shd release the bus; retry the operation
100 i 60x read 100 01010 (none) load the block of data into cache; mark cache e
603 rwitm 01110
60x read 100 01010 shd load the block of data into cache; mark cache as block s
603 rwitm 01110 (ignore the shared response) load the block of data into cache; mark cache e
60x read 100 01010 artry or artry & shd release the bus; retry the operation
603 rwitm 01110
mes 1 60x (none) (n/a) (n/a) (n/a) no-op
101 i 60x read 101 01010 (none) load the block of data into cache; mark cache block e
603 rwitm 01110
60x read 101 01010 shd load the block of data into cache; mark cache block s
603 rwitm 01110 load the block of data into cache; mark cache block e
60x read 101 01010 artry or artry & shd release the bus; retry the operation
603 rwitm 01110
mes 1 60x (none) (n/a) (n/a) (n/a) no-op
note: 1 because it does not implement shared state, these entries are not applicable to the 603.
e.7 dcbz operations
table e-8 describes the behavior of the 60x bus interface in response to the execution of a dcbz instruction.
table e-8. coherency actions: dcbz operations
columns: cache wim, cache mesi, processor, bus operation, bus wim, bus tt[0-4], snoop response, processor response
000 i 60x kill 000 01100 (none) or shd establish the block in data cache without fetching the block from main memory; set all bytes to zero; mark cache block m
603 rwitm rwitm, then write zeros instead of data; mark cache block m
60x kill 000 01100 artry or artry & shd release the bus; retry the operation
603 rwitm
s 1 60x kill 000 01100 (none) or shd clear all bytes in the block; mark cache block m
artry or artry & shd release the bus; retry the operation
e (none) 000 (n/a) (n/a) clear all bytes in the block; mark cache block m
m (none) (n/a) (n/a) (n/a) write zeros to all bytes in the cache block
001 i 60x kill 001 01100 (none) or shd establish the block in data cache without fetching the block from main memory; set all bytes to zero; mark cache block m
603 rwitm rwitm, then write zeros instead of data; mark cache block m
60x kill 001 01100 artry or artry & shd release the bus; retry the operation
603 rwitm
s 1 60x kill 001 01100 (none) or shd mark cache block e; set all bytes of the block to zero; mark the cache block m
artry or artry & shd release the bus; retry the operation
e (none) (n/a) (n/a) (n/a) write zeros to all bytes in the cache block; mark cache block m
m (none) (n/a) (n/a) (n/a) write zeros to all bytes in the cache block
all others mes 1 i (n/a) (n/a) (n/a) (n/a) (n/a) a dcbz to a cache-inhibited or write-through page causes an alignment exception; this bus transaction cannot occur.
note: 1 because it does not implement shared state, these entries are not applicable to the 603.
e.8 dcbst operations
table e-9 describes the behavior of the 60x bus interface in response to the execution of a dcbst instruction.
table e-9. coherency actions: dcbst operations
columns: cache wim, cache mesi, processor, bus operation, bus wim, bus tt[0-4], snoop response, processor response
000 i 60x clean 000 00000 (none) or shd no-op
601 100
601 100 (n/a) (n/a)
60x clean 000 00000 artry or artry & shd release the bus
601 100
603 (none) (n/a) (n/a) (n/a) (n/a)
s 1 60x clean 000 00000 (none) or shd no-op
601 100
60x 000 00000 artry or artry & shd release the bus; retry the operation
601 100
e 60x clean 000 00000 (none) or shd no-op
601 100
603 (none) (n/a) (n/a) (n/a)
60x clean 000 00000 artry or artry & shd release the bus; retry the operation
601 100
603 (none) (n/a) (n/a) (n/a) (n/a)
m 60x wwk 100 00110 (none) or shd write the block to main memory; mark cache block e
603 000
60x wwk 100 00110 artry or artry & shd release the bus; retry the operation
603 000
001 i 60x clean 001 00000 (none) or shd no-op
601 101
603 (none) (n/a) (n/a) (n/a)
60x clean 001 00000 artry or artry & shd release the bus; retry the operation
601 101
603 (none) (n/a) (n/a) (n/a) (n/a)
s 1 60x clean 001 00000 (none) or shd no-op
601 101
60x clean 001 00000 artry or artry & shd release the bus; retry the operation
601 101
e 60x clean 001 00000 (none) or shd no-op
601 101
603 (none) (n/a) (n/a) (n/a)
60x clean 001 00000 artry or artry & shd release the bus; retry the operation
601 101
603 (none) (n/a) (n/a) (n/a)
m 60x wwk 001 00110 (none) or shd write all bytes in the cache block to main memory; mark cache block e
601 101
603 001
60x 001 artry or artry & shd release the bus; retry the operation
601 101
603 001
x1x i 60x clean w1m 00000 (none) or shd no-op
601 11m
603 (none) (n/a) (n/a) (n/a)
60x clean w1m 00000 artry or artry & shd release the bus; retry the operation
601 11m
603 (none) (n/a) (n/a) (n/a)
x1x es 1 60x clean w1m 00000 (none) or shd no-op
603 (none) (n/a) (n/a) (n/a)
m 60x wwk 100 00110 (none) or shd write all bytes in the cache block to main memory; mark cache block e
601 11m flush the block; mark cache block i
60x 100 artry or artry & shd release the bus; retry the operation
601 11m
100 i 60x clean 100 00000 (none) or shd no-op
603 (none) (n/a) (n/a) (n/a)
60x clean 100 00000 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a)
s 1 60x clean 100 00000 (none) or shd no-op
60x 100 00000 artry or artry & shd release the bus; retry the operation
e 60x clean 100 00000 (none) or shd no-op
603 (none) (n/a) (n/a) (n/a)
60x clean 100 00000 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
m 60x wwk 100 00110 (none) or shd push the block; mark cache block e
artry or artry & shd release the bus; retry the operation
101 es 1 i 60x clean 101 00000 (none) or shd no-op
603 (none) (n/a) (n/a) (n/a)
60x clean 101 00000 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
101 m 60x wwk 100 00110 (none) or shd push the block; mark cache block e
601 101
60x 100 00110 artry or artry & shd release the bus; retry the operation
601 101
note: 1 because it does not implement shared state, these entries are not applicable to the 603.
e.9 dcbf operations
table e-10 describes the behavior of the 60x bus interface in response to the execution of a dcbf instruction.
table e-10.
coherency actions: dcbf operations
columns: cache wim, cache mesi, processor, bus operation, bus wim, bus tt[0-4], snoop response, processor response
000 i 60x flush 000 00100 (none) or shd no-op
601 100
603 (none) (n/a) (n/a) (n/a)
60x flush 000 00100 artry or artry & shd release the bus; retry the operation
601 100
603 (none) (n/a) (n/a) (n/a) (n/a)
s 1 60x flush 000 00100 (none) or shd mark cache block i
601 100
603 (none) (n/a) (n/a) (n/a)
60x flush 000 00100 artry or artry & shd release the bus; retry the operation
601 100
603 (none) (n/a) (n/a) (n/a)
e 60x flush 000 00100 (none) or shd mark cache block i
601 100
603 (none) (n/a) (n/a) (n/a)
60x flush 000 00100 artry or artry & shd release the bus; retry the operation
601 100
603 (none) (n/a) (n/a) (n/a) (n/a)
m 60x wwk 100 00110 (none) or shd write the block of data back to main memory; mark the cache block i
603 000
60x wwk 100 00110 artry or artry & shd release the bus; retry the operation
603 000
001 i 60x flush 001 00100 (none) or shd no-op
601 101
603 (none) (n/a) (n/a) (n/a)
60x flush 001 00100 artry or artry & shd release the bus; retry the operation
601 101
603 (none) (n/a) (n/a) (n/a) (n/a)
s 1 60x flush 001 00100 (none) or shd mark cache block i
601 101
60x 001 artry or artry & shd release the bus; retry the operation
601 101
e 60x flush 001 00100 (none) or shd mark cache block i
601 101
603 (none) (n/a) (n/a) (n/a)
60x flush 001 00100 artry or artry & shd release the bus; retry the operation
601 101
603 (none) (n/a) (n/a) (n/a) (n/a)
m 60x wwk 100 00110 (none) or shd write all bytes in the cache block to main memory; mark cache block i
601 101
603 001
60x wwk 100 00110 artry or artry & shd release the bus; retry the operation
601 101
603 001
x1x i 60x flush w1m 00100 (none) or shd no-op
601 11m
603 (none) (n/a) (n/a) (n/a)
60x flush w1m 00100 artry or artry & shd release the bus; retry the operation
601 11m
603 (none) (n/a) (n/a) (n/a)
x1x es 1 60x flush w1m 00100 (none) or shd mark cache block i
601 (none) (n/a) (n/a) (n/a) cache retry the operation
603 mark cache block i
60x flush w1m 00100 artry or artry & shd retry the operation
601 (none) (n/a) (n/a) (n/a) (n/a)
603
m 60x wwk 100 00110 (none) or shd flush the block; mark cache block i
601 11m
60x wwk 100 00110 artry or artry & shd release the bus; retry the operation
601 11m
100 i 60x flush 100 00100 (none) or shd no-op
603 (none) (n/a) (n/a) (n/a)
60x flush 100 00100 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
s 1 60x flush 100 00100 (none) or shd mark cache block i
artry or artry & shd release the bus; retry the operation
e 60x flush 100 00100 (none) or shd mark cache block i
603 (none) (n/a) (n/a) (n/a)
60x flush 100 00100 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
m 60x wwk 100 00110 (none) or shd push the block; mark cache block i
artry or artry & shd release the bus; retry the operation
101 i 60x flush 101 00100 (none) or shd no-op
603 (none) (n/a) (n/a) (n/a)
60x flush 101 00100 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
s 1 60x flush 101 00100 (none) or shd mark cache block i
artry or artry & shd release the bus; retry the operation
e 60x flush 101 00100 (none) or shd mark cache block i
603 (none) (n/a) (n/a) (n/a)
60x flush 101 00100 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
m 60x wwk 100 00110 (none) or shd flush the block; mark cache block i
601 101
60x wwk 100 00110 artry or artry & shd release the bus; retry the operation
601 101
note: 1 because it does not implement shared state, these entries are not applicable to the 603.
e.10 dcbi operations
table e-11 describes the behavior of the 60x bus interface in response to the execution of a dcbi instruction.
table e-11. coherency actions: dcbi operations
columns: cache wim, cache mesi, processor, bus operation, bus wim, bus tt[0-4], snoop response, processor response
000 i 60x kill 000 01100 (none) or shd no-op
603 (none) (n/a) (n/a) (n/a)
60x kill 000 01100 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
s 1 60x kill 000 01100 (none) or shd mark the cache block i
artry or artry & shd release the bus; retry the operation
me 60x kill 000 01100 (none) or shd mark cache block i
603 (none) (n/a) (n/a) (n/a)
60x kill 000 01100 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
001 i 60x kill 001 01100 (none) or shd no-op
603 (none) (n/a) (n/a) (n/a)
60x kill 001 01100 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
s 1 60x kill 001 01100 (none) or shd mark cache block i
artry or artry & shd release the bus; retry the operation
me 60x kill 001 01100 (none) or shd mark cache block i
603 (none) (n/a) (n/a) (n/a)
60x kill 001 01100 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
x1x i 60x kill w1m 01100 (none) or shd no-op
603 (none) (n/a) (n/a) (n/a)
60x kill w1m 01100 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
s 1 e 60x kill w1m 01100 (none) or shd mark cache block i
603 (none) (n/a) (n/a) (n/a)
601 mark cache block i; cache retry the operation
60x kill w1m 01100 artry or artry & shd release the bus; retry the operation
601/603 (none) (n/a) (n/a) (n/a) (n/a)
m 60x kill w1m 01100 (none) or shd mark cache block i
603 (none) (n/a) (n/a) (n/a)
601 wwk w1m 01100 (none) or shd flush the block; mark cache block i; cache retry the operation
60x kill w1m 01100 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
601 wwk w1m 01100 artry or artry & shd release the bus; retry the operation
100 i 60x kill 100 01100 (none) or shd no-op
603 (none) (n/a) (n/a) (n/a)
60x kill 100 01100 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
s 1 60x kill 100 01100 (none) or shd mark cache block i
artry or artry & shd release the bus; retry the operation
me 60x kill 100 01100 (none) or shd mark cache block i
603 (none) (n/a) (n/a) (n/a)
60x kill 100 01100 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
101 i 60x kill 101 01100 (none) or shd no-op
603 (none) (n/a) (n/a) (n/a)
60x kill 101 01100 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
s 1 60x kill 101 01100 (none) or shd mark cache block i
603 (none) (n/a) (n/a) (n/a) (n/a)
60x kill 101 01100 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
me 60x kill 101 01100 (none) or shd mark cache block i
603 (none) (n/a) (n/a) (n/a)
60x kill 101 01100 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
note: 1 because it does not implement shared state, these entries are not applicable to the 603.
e.11 icbi operations
table e-12 describes the behavior of the 60x bus interface in response to the execution of an icbi instruction.
table e-12. coherency actions: icbi operations
columns: cache wim, cache mesi, processor, bus operation, bus wim, bus tt[0-4], snoop response, processor response
000 i 60x icbi 000 01101 (none) or shd no-op
603 (none) (n/a) (n/a) (n/a)
601 (n/a)
60x icbi 000 01101 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
601 (n/a)
val 60x icbi 000 01101 (none) or shd mark i-cache block i
603 (none) (n/a) (n/a) (n/a)
601 (n/a) no-op (unified cache)
60x icbi 000 01101 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a)
601 (n/a) (n/a)
001 i 60x icbi 001 01101 (none) or shd no-op
603 (none) (n/a) (n/a) (n/a)
601 (n/a)
60x icbi 001 01101 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
601 (n/a)
val 60x icbi 001 01101 (none) or shd mark i-cache block i
603 (none) (n/a) (n/a) (n/a)
601 (n/a) no-op (unified cache)
60x icbi 001 01101 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a)
601 (n/a) (n/a)
x1x i 60x icbi x1m 01101 (none) or shd no-op
603 (none) (n/a) (n/a) (n/a)
601 (n/a) no-op (unified cache)
60x icbi x1m 01101 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
601 (n/a)
val 60x icbi x1m 01101 (none) or shd mark i-cache block i
603 (none) (n/a) (n/a) (n/a)
601 (n/a) no-op (unified cache)
60x icbi x1m 01101 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
601 (n/a)
100 i 60x icbi 100 01101 (none) or shd no-op
603 (none) (n/a) (n/a) (n/a)
601 (n/a)
60x icbi 100 01101 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a) (n/a)
601 (n/a)
val 60x icbi 100 01101 (none) or shd mark i-cache block i
603 (none) (n/a) (n/a) (n/a)
601 (n/a) no-op (unified cache)
60x icbi 100 01101 artry or artry & shd release the bus; retry the operation
603 (none) (n/a) (n/a) (n/a)
601 (n/a) (n/a)
e-36 powerpc microprocessor family: the bus interface for 32-bit microprocessors e.12 sync operations table e-13 describes the behavior of the 60x bus interface in response to the execution of a sync instruction. this table does not fully describe the operation of the sync instruction. 101 i 60x icbi 101 01101 (none) or shd no-op 603 (none) (n/a) (n/a) (n/a) 601 (n/a) 60x icbi 101 01101 ar tr y or ar tr y &shd release the bus retry the operation 603 (none) (n/a) (n/a) (n/a) (n/a) 601 (n/a) val 60x icbi 101 01101 (none) or shd mark i-cache block i 603 (none) (n/a) (n/a) (n/a) 601 (n/a) no-op (uni?d cache) 60x icbi 101 01101 ar tr y or ar tr y &shd release the bus retry the operation 603 (none) (n/a) (n/a) (n/a) 601 (n/a) (n/a) table e-13. coherency actions?ync operations cache bus snoop response processor response wim mesi wim tt[0?] (n/a) (n/a) xx1 01000 (none) or shd the sync instruction has completed ar tr y or ar tr y &shd release the bus retry the operation table e-12. coherency actions?cbi operations (continued) cache processor bus snoop response processor response wim mesi operation wim tt[0?] appendix a. coherency action tables e-37 e.13 eieio operations table e-14 describes the behavior of the 60x bus interface in response to the execution of an eieio instruction.this table does not fully describe the operation of the eieio instruction. e.14 tlbie operations table e-15 describes the behavior of the 60x bus interface in response to the execution of a tlbie instruction. this table does not fully describe the operation of the tlbie instruction. e.15 tlbsync operations table e-16 describes the behavior of the 60x bus interface in response to the execution of a tlbsync instruction. this table does not fully describe the operation of the tlbsync instruction. table e-14. coherency actions?ieio operations cache bus snoop response processor response wim mesi wim tt[0?] (n/a) (n/a) xx1 10000 (none) or shd the eieio instruction completed. 
ar tr y or ar tr y &shd release the bus retry the operation table e-15. coherency actions?lbie operations cache bus snoop response processor response wim mesi wim tt[0?] (n/a) (n/a) xx1 11000 (none) or shd hold off any new memory instructions wait for completion of any outstanding memory instructions invalidate the requested tlb entry ar tr y or ar tr y &shd release the bus retry the operation table e-16. coherency actions?lbsync operations cache bus snoop response processor response wim mesi wim mesi (n/a) (n/a) xx1 01001 (none) or shd the tlbsync instruction has completed 01001 ar tr y or ar tr y &shd release the bus retry the operation e-38 powerpc microprocessor family: the bus interface for 32-bit microprocessors e.16 snoop-kill operations table e-17 describes the behavior of the 60x bus interface in response to a snoop-kill bus operation. table e-17. coherency actions?noop-kill operations cache processor bus reservation snoop response processor response wim mesi wim tt[0?] (n/a) i 60x xx1 01100 none (none) no-op 60x yes (and reset) release reservation 603 no-op s 1 em 60x none mark cache block i 603 no-op 60x yes (and reset) mark cache block i release reservation 603 no-op m 604 yes (and reset) ar tr y &shd try to write cache block back to main memory if successful, mark cache block i note: 1 because it does not implement shared state, these entries are not applicable to the 603. appendix a. coherency action tables e-39 e.17 snoop-read operations table e-18 describes the behavior of the 60x bus interface in response to a snoop-read bus operation. table e-18. coherency actions?noop-read operations cache proc. bus reservation snoop response processor response wim mesi wim tt[0?] 
(n/a) i 60x x11 01010 none (none) no-op 60x yes shd 603 (none) s 1 60x (n/a) shd no-op e 60x shd mark cache block s 603 (none) mark cache block i m 60x x01 ar tr y or ar tr y &shd try to write cache block back to main memory if successful, mark cache block s 603 try to write cache block back to main memory if successful, mark cache block i 60x x11 try to write cache block back to main memory if successful, mark cache block s 603 try to write cache block back to main memory if successful, mark cache block e note: 1 because it does not implement shared state, these entries are not applicable to the 603. e-40 powerpc microprocessor family: the bus interface for 32-bit microprocessors e.18 snoop-read-atomic operations table e-19 describes the behavior of the 60x bus interface in response to a snoop-read atomic bus operation. table e-19. coherency actions?noop-read atomic operations cache proc. bus res. snoop response processor response wim mesi wim tt[0?] (n/a) i 60x xx1 11010 none (none) no-op 60x yes shd 603 none s 1 60x none shd no-op e 60x (n/a) shd mark cache block s 603 none mark cache block i m 60x x01 ar tr y &shd try to write cache block back to main memory if successful, mark cache block s 603 try to write cache block back to main memory if successful, mark cache block i 60x x11 try to write cache block back to main memory if successful, mark cache block s 603 try to write cache block back to main memory if successful, mark cache block e note: 1 because it does not implement shared state, these entries are not applicable to the 603. appendix a. coherency action tables e-41 e.19 snoop-rwitm operations table e-20 describes the behavior of the 60x bus interface in response to a snoop-rwitm bus operation. e.20 snoop-rwitm-atomic operations table e-21 describes the behavior of the 60x bus interface in response to a snoop-snoop- rwitm atomic bus operation. table e-20. 
Coherency Actions: Snoop-RWITM Operations
Columns: Cache (WIM, MESI); Bus (WIM, TT[0-4]); Reservation; Snoop Response; Processor Response
(n/a) I xx1 01110 none (none) no-op yes (and reset) release reservation S [note 1] E none mark cache block I yes (and reset) mark cache block I; release reservation M none ARTRY & SHD try to write cache block back to main memory; if successful, mark cache block I yes (and reset) try to write cache block back to main memory; if successful: mark cache block I, release reservation
Note 1: Because it does not implement shared state, these entries are not applicable to the 603.

Table E-21. Coherency Actions: Snoop-RWITM-Atomic Operations
Columns: Cache (WIM, MESI); Bus (WIM, TT[0-4]); Reservation; Snoop Response; Processor Response
(n/a) I xx1 11110 none (none) no-op yes (and reset) release reservation S [note 1] E none mark cache block I yes (and reset) mark cache block I; release reservation M none ARTRY & SHD try to write cache block back to main memory; if successful, mark cache block I yes (and reset) try to write cache block back to main memory; if successful: mark cache block I, release reservation
Note 1: Because it does not implement shared state, these entries are not applicable to the 603.

E.21 Snoop-Flush Operations

Table E-22 describes the behavior of the 60x bus interface in response to a snoop-flush bus operation.

E.22 Snoop-Clean Operations

Table E-23 describes the behavior of the 60x bus interface in response to a snoop-clean bus operation.

Table E-22. Coherency Actions: Snoop-Flush Operations
Columns: Cache (WIM, MESI); Processor; Bus (WIM, TT[0-4]); Reservation; Snoop Response; Processor Response
(n/a) I 60x xx1 00100 none (none) no-op yes (none) no-op. Snoop-flush operation cannot clear the reservation.
S [note 1] E 60x (n/a) (none) mark cache block I 603 no-op M 60x ARTRY & SHD try to write cache block back to main memory; if successful, mark cache block I 603 (none) no-op
Note 1: Because it does not implement shared state, these entries are not applicable to the 603.

Table E-23. Coherency Actions: Snoop-Clean Operations
Columns: Cache (WIM, MESI); Processor; Bus (WIM, TT[0-4]); Snoop Response; Processor Response
(n/a) E S [note 1] I 60x xx1 00000 (none) no-op M 60x xx1 ARTRY & SHD try to write cache block back to main memory; if successful, mark cache block E 603 (none) no-op
Note 1: Because it does not implement shared state, these entries are not applicable to the 603.

E.23 Snoop-Write-with-Flush Operations

Table E-24 describes the behavior of the 60x bus interface in response to a snoop-write-with-flush bus operation.

Table E-24. Coherency Actions: Snoop-Write-with-Flush Operations
Columns: Cache (WIM, MESI); Bus (WIM, TT[0-4]); Reservation; Snoop Response; Processor Response
(n/a) I xx1 00010 none (none) no-op yes (and reset) release reservation S [note 1] none mark cache block I yes (and reset) mark cache block I; release reservation E none paradox [note 2]: no one else should be writing if this cache is E; mark cache block I yes (and reset) paradox [note 2]: no one else should be writing if this cache is E; mark cache block I; release reservation M none ARTRY & SHD paradox [note 2]: no one else should be writing if this cache is M; try to write cache block back to main memory; if successful, mark cache block I yes (and reset) paradox [note 2]: no one else should be writing if this cache is M; try to write cache block back to main memory; if successful: mark cache block I, release reservation
Note 1: Because it does not implement shared state, these entries are not applicable to the 603.
Note 2: A coherency paradox to the processor may cause incoherent data to appear in the system. That is, there is a potential for data integrity errors in the system.
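One way to read Table E-24 is as a lookup from the snooped block's MESI state and the reservation status to a snoop response and a list of processor actions. The sketch below is a minimal illustration in Python, not the processor's actual logic. The function name `snoop_write_with_flush`, the return shape, and the string labels are hypothetical. It also assumes that the "(none)" snoop response applies to the I, S, and E rows and that "ARTRY & SHD" applies to both M rows, which is an interpretation of the flattened table rather than something the table states cell by cell.

```python
def snoop_write_with_flush(mesi, reservation_hit):
    """Sketch of one reading of Table E-24 (hypothetical helper).

    mesi            -- MESI state of the snooped cache block: "M", "E", "S", or "I"
    reservation_hit -- True if the snooped address matches a reservation
    Returns (snoop_response, actions, paradox).
    """
    if mesi not in ("M", "E", "S", "I"):
        raise ValueError("unknown MESI state: %r" % (mesi,))

    # Per note 2, a write snooped while this cache holds the block E or M
    # is a coherency paradox: no one else should be writing.
    paradox = mesi in ("E", "M")

    # Assumption: only the M rows answer ARTRY & SHD; other rows answer (none).
    response = "ARTRY & SHD" if mesi == "M" else "(none)"

    actions = []
    if mesi == "M":
        actions.append("try to write cache block back to main memory")
        actions.append("if successful, mark cache block I")
    elif mesi in ("S", "E"):
        actions.append("mark cache block I")

    if reservation_hit:
        # The table lists this action on every "yes (and reset)" row.
        actions.append("release reservation")

    if not actions:
        actions.append("no-op")
    return response, actions, paradox
```

For example, `snoop_write_with_flush("S", True)` yields the `(none)` response with the actions "mark cache block I" and "release reservation". In the M row with a reservation, the table places "release reservation" after "if successful:", so whether the release depends on the write-back succeeding is left open here.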
E.24 Snoop-Write-with-Kill Operations

Table E-25 describes the behavior of the 60x bus interface in response to a snoop-write-with-kill bus operation.

Table E-25. Coherency Actions: Snoop-Write-with-Kill Operations
Columns: Cache (WIM, MESI); Bus (WIM, TT[0-4]); Reservation; Snoop Response; Processor Response
(n/a) I xx1 00110 none (none) no-op yes (and reset) release reservation S [note 1] none mark cache block I yes (and reset) mark cache block I; release reservation E none paradox [note 2]: no one else should be writing if this cache is E; mark cache block I yes (and reset) paradox [note 2]: no one else should be writing if this cache is E; mark cache block I; release reservation M none paradox [note 2]: no one else should be writing if this cache is M; mark cache block I yes (and reset) paradox [note 2]: no one else should be writing if this cache is M; mark cache block I; release reservation
Note 1: Because it does not implement shared state, these entries are not applicable to the 603.
Note 2: A coherency paradox to the processor may cause incoherent data to appear in the system. That is, there is a potential for data integrity errors in the system.

E.25 Snoop-Write-with-Flush-Atomic Operations

Table E-26 describes the behavior of the 60x bus interface in response to a snoop-write-with-flush-atomic bus operation.

Table E-26. Coherency Actions: Snoop-Write-with-Flush-Atomic Operations
Columns: Cache (WIM, MESI); Bus (WIM, TT[0-4]); Reservation; Snoop Response; Processor Response
(n/a) I xx1 00110 none (none) no-op yes (and reset) release reservation S [note 1] none mark cache block I yes (and reset) mark cache block I; release reservation E none paradox [note 2]: no one else should be writing if this cache is E; mark cache block I yes (and reset) paradox [note 2]: no one else should be writing if this cache is E; mark cache block I; release reservation M none ARTRY & SHD paradox [note 2]: no one else should be writing if this cache is M; try to write block back to main memory; if successful, mark cache block I yes (and reset) paradox [note 2]: no one else should be writing if this cache is M; try to write block back to main memory; if successful: mark cache block I, release reservation
Note 1: Because it does not implement shared state, these entries are not applicable to the 603.
Note 2: A coherency paradox to the processor may cause incoherent data to appear in the system. That is, there is a potential for data integrity errors in the system.

E.26 Snoop-TLB-Invalidate Operations

Table E-27 describes the behavior of the 60x bus interface in response to a snoop-TLB-invalidate bus operation.

E.27 Snoop-sync Operations

Table E-28 describes the behavior of the 60x bus interface in response to a snoop-sync bus operation.

E.28 Snoop-eieio Operations

Table E-29 describes the behavior of the 60x bus interface in response to a snoop-eieio bus operation.

Table E-27. Coherency Actions: Snoop-TLB-Invalidate Operations
Columns: Cache (WIM, MESI); Bus (WIM, TT[0-4]); Snoop Response; Processor Response
(n/a) (n/a) xx1 11000 (none) respond with (none) when all previous TLB invalidates have been performed
(none) but ARTRY is activated on the bus from another processor do not perform the TLB invalidate; this is to prevent a deadlock condition from occurring
ARTRY respond with retry until TLB has been invalidated. Previous TLB invalidate is still in progress.

Table E-28.
Coherency Actions: Snoop-sync Operations
Columns: Cache (WIM, MESI); Bus (WIM, TT[0-4]); Snoop Response; Processor Response
(n/a) (n/a) xx1 01000 (none) if there are no TLB invalidates pending that were initiated by this processor, no-op
ARTRY if there is a TLB invalidate pending that was initiated by this processor, or if there is a snoop push address tenure (write-with-kill) pending due to a previous snoop, respond with retry

Table E-29. Coherency Actions: Snoop-eieio Operations
Columns: Cache (WIM, MESI); Bus (WIM, TT[0-4]); Snoop Response; Processor Response
(n/a) (n/a) xx1 10000 (none) no-op. The 604 family never asserts ARTRY for an eieio snoop.
ARTRY

E.29 Snoop-tlbsync Operations

Table E-30 describes the behavior of the 60x bus interface in response to a snoop-tlbsync bus operation.

E.30 Snoop-icbi Operations

Table E-31 describes the behavior of the 60x bus interface in response to a snoop-icbi bus operation.

Table E-30. Coherency Actions: Snoop-tlbsync Operations
Columns: Cache (WIM, MESI); Bus (WIM, TT[0-4]); Snoop Response; Processor Response
(n/a) (n/a) xx1 01001 (none) if no TLB invalidates are pending and there are no marked transactions, no-op. Note that all queues in the processor with translated addresses are considered marked whenever a TLBI snoop operation completes. These transactions may hit in the cache internally and clear the mark. Other times, these transactions must complete a 60x bus address tenure before these marks can be cleared. tlbsync snoops are retried (ARTRY asserted) until all marks are cleared.
ARTRY if a TLB invalidate is pending or if any marked transactions are pending, respond with retry

Table E-31. Coherency Actions: Snoop-icbi Operations
Columns: Cache (WIM, MESI); Bus (WIM, TT[0-4]); Snoop Response; Processor Response
(n/a) I xx1 011001 (none) no-op
val invalidate entry in I-cache

E.31 Snoop-RWNITC Operations

Table E-32 describes the behavior of the 60x bus interface in response to a snoop-RWNITC bus operation.

Table E-32. Coherency Actions: Snoop-RWNITC Operations
Columns: Cache (WIM, MESI); Processor; Bus (WIM, TT[0-4]); Reservation; Snoop Response; Processor Response
(n/a) I 60x xx1 01011 none (none) no-op 60x yes SHD no-op 603 (n/a) (n/a) S [note 1] 60x (n/a) SHD no-op E 60x SHD no-op 603 (none) mark cache block E M 601 ARTRY & SHD try to write cache block back to main memory; if successful, mark cache block S 604 try to write cache block back to main memory; if successful, mark cache block E 603 ARTRY no-op
Note 1: Because it does not implement shared state, these entries are not applicable to the 603.

Glossary of Terms and Abbreviations

The glossary contains an alphabetical list of terms, phrases, and abbreviations used in this book. Some of the terms and definitions included in the glossary are reprinted from IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic, copyright ©1985 by the Institute of Electrical and Electronics Engineers, Inc. with the permission of the IEEE.

Architecture. A detailed specification of requirements for a processor or computer system. It does not specify details of how the processor or computer system must be implemented; instead it provides a template for a family of compatible implementations.

Asynchronous exception. Exceptions that are caused by events external to the processor's execution. In this document, the term "asynchronous exception" is used interchangeably with the word interrupt.

Atomic access. A bus access that attempts to be part of a read-write operation to the same address uninterrupted by any other access to that address (the term refers to the fact that the transactions are indivisible).
The 60x processors implement atomic accesses through the lwarx/stwcx. instruction pair.

BAT (block address translation) mechanism. A software-controlled array that stores the available block address translations on-chip.

Beat. A single state on the 60x bus interface that may extend across multiple bus cycles. A 60x transaction can be composed of multiple address or data beats.

Biased exponent. An exponent whose range of values is shifted by a constant (bias). Typically a bias is provided to allow a range of positive values to express a range that includes both positive and negative values.

Big-endian. A byte-ordering method in memory where the address n of a word corresponds to the most-significant byte. In an addressed memory word, the bytes are ordered (left to right) 0, 1, 2, 3, with 0 being the most-significant byte.

Block. An area of memory that ranges from 128 Kbyte to 256 Mbyte, whose size, translation, and protection attributes are controlled by the BAT mechanism.

Boundedly undefined. A characteristic of results of certain operations that are not rigidly prescribed by the PowerPC architecture. Boundedly-undefined results for a given operation may vary among implementations, and between execution attempts in the same implementation. Although the architecture does not prescribe the exact behavior for when results are allowed to be boundedly undefined, the results of executing instructions in contexts where results are allowed to be boundedly undefined are constrained to ones that could have been achieved by executing an arbitrary sequence of defined instructions, in valid form, starting in the state the machine was in before attempting to execute the given instruction.

Burst. A multiple-beat data transfer whose total size is typically equal to a cache block.

Bus clock. Clock that causes the bus state transitions.

Bus master.
The owner of the address or data bus; the device that initiates or requests the transaction.

Cache. High-speed memory component containing recently accessed data and/or instructions (subset of main memory).

Cache block. A small region of contiguous memory that is copied from memory into a cache. The size of a cache block may vary among processors; the maximum block size is one page. In PowerPC processors, cache coherency is maintained on a cache-block basis. Note that the term "cache block" is often used interchangeably with "cache line."

Cache coherency. An attribute wherein an accurate and common view of memory is provided to all devices that share the same memory system. Caches are coherent if a processor performing a read from its cache is supplied with data corresponding to the most recent value written to memory or to another processor's cache.

Cache flush. An operation that removes from a cache any data from a specified address range. This operation ensures that any modified data within the specified address range is written back to main memory. This operation is generated typically by a Data Cache Block Flush (dcbf) instruction.

Caching-inhibited. A memory update policy in which the cache is bypassed and the load or store is performed to or from main memory.

Cast-outs. Cache blocks that must be written to memory when a cache miss causes a cache block to be replaced.

Clear. To cause a bit or bit field to register a value of zero. See also Set.

Context synchronization. An operation that ensures that all instructions in execution complete past the point where they can produce an exception, that all instructions in execution complete in the context in which they began execution, and that all subsequent instructions are fetched and executed in the new context. Context synchronization may result from executing specific instructions (such as isync or rfi) or when certain events occur (such as an exception).

Copy-back.
An operation in which modified data in a cache block is copied back to memory.

Denormalized number. A nonzero floating-point number whose exponent has a reserved value, usually the format's minimum, and whose explicit or implicit leading significand bit is zero.

Direct-mapped cache. A cache in which each main memory address can appear in only one location within the cache. It operates more quickly when the memory request is a cache hit.

Direct-store. Interface available on PowerPC processors only to support direct-store devices from the POWER architecture. When the T bit of a segment descriptor is set, the descriptor defines the region of memory that is to be used as a direct-store segment. Note that this facility is being phased out of the architecture and will not likely be supported in future devices. Therefore, software should not depend on it and new software should not use it.

Effective address (EA). The 32- or 64-bit address specified for a load, store, or an instruction fetch. This address is then submitted to the MMU for translation to either a physical memory address or an I/O address.

Exception. A condition encountered by the processor that requires special, supervisor-level processing.

Exception handler. A software routine that executes when an exception is taken. Normally, the exception handler corrects the condition that caused the exception, or performs some other meaningful task (that may include aborting the program that caused the exception). The address for each exception handler is identified by an exception vector offset defined by the architecture and a prefix selected via the MSR.

Exclusive state. MESI state (E) in which only one caching device contains data that is also in system memory.

Execution synchronization.
A mechanism by which all instructions in execution are architecturally complete before beginning execution (appearing to begin execution) of the next instruction. Similar to context synchronization but doesn't force the contents of the instruction buffers to be deleted and refetched.

Exponent. In the binary representation of a floating-point number, the exponent is the component that normally signifies the integer power to which the value two is raised in determining the value of the represented number. See also Biased exponent.

Feed-forwarding. A feature that reduces the number of clock cycles that an execution unit must wait to use a register. When the source register of the current instruction is the same as the destination register of the previous instruction, the result of the previous instruction is routed to the current instruction at the same time that it is written to the register file. With feed-forwarding, the destination bus is gated to the waiting execution unit over the appropriate source bus, saving the cycles which would be used for the write and read.

Fetch. Retrieving instructions from either the cache or main memory and placing them into the instruction queue.

Floating-point register (FPR). Any of the 32 registers in the floating-point register file. These registers provide the source operands and destination results for all floating-point data manipulation instructions. Floating-point load instructions move data from memory to registers, and floating-point store instructions move data from registers to memory.

Flush. An operation that causes a modified cache block to be invalidated and the data to be written to memory.

Fraction. In the binary representation of a floating-point number, the field of the significand that lies to the right of its implied binary point.

General-purpose register (GPR). Any of the 32 registers in the general-purpose register file.
These registers provide the source operands and destination results for all integer data manipulation instructions. Load instructions move data from memory to registers, and store instructions move data from registers to memory.

Harvard architecture. An architectural model featuring separate caches for instruction and data.

IEEE 754. A standard written by the Institute of Electrical and Electronics Engineers that defines operations and representations of binary floating-point arithmetic.

Implementation. A particular processor that conforms to the PowerPC architecture, but may differ from other architecture-compliant implementations, for example in design, feature set, and implementation of optional features. The PowerPC architecture has many different implementations.

Implementation-dependent. An aspect of a feature in a processor's design that is defined by a processor's design specifications rather than by the PowerPC architecture.

Implementation-specific. An aspect of a feature in a processor's design that is not required by the PowerPC architecture, but for which the PowerPC architecture may provide concessions to ensure that processors that implement the feature do so consistently.

Imprecise exception. A type of synchronous exception that is allowed not to adhere to the precise exception model (see Precise exception). The PowerPC architecture allows only floating-point exceptions to be handled imprecisely.

Inexact. Loss of accuracy in an arithmetic operation when the rounded result differs from the infinitely precise value with unbounded range.

In-order. An aspect of an operation that adheres to a sequential model. An operation is said to be performed in-order if, at the time that it is performed, it is known to be required by the sequential execution model. See Out-of-order.

Instruction queue. A holding place for instructions fetched from the current instruction stream.
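The floating-point terms in this glossary (biased exponent, fraction, denormalized number) can be made concrete with a short sketch. The Python below is an illustration only, assuming the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits with a bias of 127, 23 fraction bits). The names `decompose_single` and `classify` are hypothetical helpers, not part of any PowerPC tool or library.

```python
import struct

BIAS = 127  # exponent bias of the IEEE 754 single-precision format


def decompose_single(value):
    """Split a number, rounded to single precision, into
    (sign, biased_exponent, fraction) bit fields."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF  # field to the right of the implied binary point
    return sign, biased_exponent, fraction


def classify(biased_exponent, fraction):
    """Name the kind of value encoded by the exponent and fraction fields."""
    if biased_exponent == 0:
        # Minimum (reserved) exponent: the implicit leading significand bit is zero.
        return "zero" if fraction == 0 else "denormalized"
    if biased_exponent == 0xFF:
        return "infinity" if fraction == 0 else "nan"
    return "normalized"
```

For a normalized value, the unbiased exponent is `biased_exponent - BIAS`; for instance, 0.75 decomposes into a biased exponent of 126 (unbiased exponent of -1) and a fraction with only its top bit set.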
Interrupt. An external signal that causes the processor to suspend current execution and take a predefined exception.

Invalid state. State of a cache entry that does not currently contain a valid copy of a cache block from memory.

Key bits. A set of key bits referred to as Ks and Kp in each segment register and each BAT register. The key bits determine whether supervisor or user programs can access a page within that segment or block.

Kill. An operation that causes a cache block to be invalidated.

Latency. The number of clock cycles necessary to perform an action, such as a memory access.

Least-significant bit (lsb). The bit of least value in an address, register, data element, or instruction encoding.

Least-significant byte (LSB). The byte of least value in an address, register, data element, or instruction encoding.

Little-endian. A byte-ordering method in memory where the address n of a word corresponds to the least-significant byte. In an addressed memory word, the bytes are ordered (left to right) 3, 2, 1, 0, with 3 being the most-significant byte. See Big-endian.

Livelock. A state in which processors interact in a way such that no processor makes progress.

Memory-mapped accesses. Accesses whose addresses use the segmented or block address translation mechanisms provided by the MMU and that occur externally with the bus protocol defined for memory.

Memory coherency. Refers to memory agreement between caches and system memory (for example, MESI cache coherency).

Memory consistency. Refers to levels of memory with respect to a single processor and system memory (for example, on-chip cache, secondary cache, and system memory).

Memory management unit. The functional unit that is capable of translating an effective (logical) address to a physical address, providing protection mechanisms, and defining caching methods.

MESI (modified/exclusive/shared/invalid).
Cache coherency protocol used to manage caches on different devices that share a memory system. Note that the PowerPC architecture does not specify the implementation of a MESI protocol to ensure cache coherency.

Modified state. When a cache block is in the modified state, it has been modified by the processor since it was copied from memory. See MESI.

Multiprocessing. The capability of software, especially operating systems, to support execution on more than one processor at the same time.

Most-significant bit (msb). The highest-order bit in an address, registers, data element, or instruction encoding.

Most-significant byte (MSB). The highest-order byte in an address, registers, data element, or instruction encoding.

OEA (operating environment architecture). The level of the architecture that describes PowerPC memory management model, supervisor-level registers, synchronization requirements, and the exception model. It also defines the time-base feature from a supervisor-level perspective. Implementations that conform to the PowerPC OEA also conform to the PowerPC UISA and VEA.

Optional. A feature, such as an instruction, a register, or an exception, that is defined by the PowerPC architecture but not required to be implemented.

Out-of-order. An aspect of an operation that allows it to be performed ahead of one that may have preceded it in the sequential model, for example, speculative operations. An operation is said to be performed out-of-order if, at the time that it is performed, it is not known to be required by the sequential execution model. See In-order.

Overflow. An error condition that occurs during arithmetic operations when the result cannot be stored accurately in the destination register(s). For example, if two 32-bit numbers are multiplied, the result may not be representable in 32 bits.

Packet. A term used with respect to direct-store operations.

Page.
A region in memory. The OEA defines a page as a 4-Kbyte area of memory, aligned on a 4-Kbyte boundary.

Page table entry (PTE). Data structures containing information used to translate effective address to physical address on a 4-Kbyte page basis. A PTE consists of 8 bytes of information in a 32-bit processor.

Park. The act of allowing a bus master to maintain mastership of the bus without having to arbitrate.

Pipelining. A technique that breaks operations, such as instruction processing or bus transactions, into smaller distinct stages or tenures (respectively) so that a subsequent operation can begin before the previous one has completed.

Precise exceptions. The pipeline can be stopped so the instructions that preceded the faulting instruction can complete, and subsequent instructions can be executed following the execution of the exception handler. The system is precise unless one of the imprecise modes for invoking the floating-point enabled exception is in effect.

Physical memory. The actual memory that can be accessed through the system's memory bus.

Quad word. A group of 16 contiguous locations starting at an address divisible by 16.

Quiesce. To come to rest. The processor is said to quiesce when an exception is taken or a sync instruction is executed. The instruction stream is stopped at the decode stage and executing instructions are allowed to complete to create a controlled context for instructions that may be affected by out-of-order, parallel execution. See Context synchronization.

Reservation. The processor establishes a reservation on a cache block of memory space when it executes an lwarx instruction to read a memory semaphore into a GPR.

Reserved field. In a register, a reserved field is one that is not assigned a function. A reserved field may be a single bit. The handling of reserved bits is implementation-dependent. Software is permitted to write any value to such a bit.
A subsequent reading of the bit returns 0 if the value last written to the bit was 0 and returns an undefined value (0 or 1) otherwise.

RISC (reduced instruction set computing). An architecture characterized by fixed-length instructions with nonoverlapping functionality and by a separate set of load and store instructions that perform memory accesses.

Segment. A 256-Mbyte area of virtual memory that is the most basic memory space defined by the PowerPC architecture. Each segment is configured through a unique segment descriptor.

Segment descriptors. Information used to generate the interim virtual address. The segment descriptors reside in 16 on-chip segment registers for 32-bit implementations.

Set (v). To write a nonzero value to a bit or bit field; the opposite of clear. The term "set" may also be used to generally describe the updating of a bit or bit field.

Set (n). A subdivision of a cache. Cacheable data can be stored in a given location in any one of the sets, typically corresponding to its lower-order address bits. Because several memory locations can map to the same location, cached data is typically placed in the set whose cache block corresponding to that address was used least recently. See Set-associativity.

Set-associativity. Aspect of cache organization in which the cache space is divided into sections, called sets. The cache controller associates a particular main memory address with the contents of a particular set, or region, within the cache.

Significand. The component of a binary floating-point number that consists of an explicit or implicit leading bit to the left of its implied binary point and a fraction field to the right.

Slave. The device addressed by a master device. The slave is identified in the address tenure and is responsible for supplying or latching the requested data for the master during the data tenure.

Snooping.
Monitoring addresses driven by a bus master to detect the need for coherency actions.

Snoop push. Write-backs due to a snoop hit. The block will transition to an invalid or exclusive state.

Split-transaction. A transaction with independent request and response tenures.

Split-transaction bus. A bus that allows address and data transactions from different processors to occur independently.

Strong ordering. A memory access model that requires exclusive access to an address before making an update, to prevent another device from using stale data.

Supervisor mode. The privileged operation state. In supervisor mode, software can access all control registers and can access the supervisor memory space, among other privileged operations.

Synchronization. A process to ensure that operations occur strictly in order. See Context synchronization and Execution synchronization.

Synchronous exception. An exception that is generated by the execution of a particular instruction or instruction sequence. There are two types of synchronous exceptions, precise and imprecise.

System memory. The physical memory available to a processor.

TLB (translation lookaside buffer). A cache that holds recently used page table entries.

Tenure. The period of bus mastership. There can be separate address bus tenures and data bus tenures. A tenure consists of three phases: arbitration, transfer, termination.

Throughput. The measure of the number of instructions that are processed per clock cycle.

Transaction. A complete exchange between two bus devices. A transaction is minimally comprised of an address tenure; one or more data tenures may be involved in the exchange. There are two kinds of transactions: address/data and address-only.

Transfer termination.
Signal that refers to both signals that acknowledge the transfer of individual beats (of both single-beat transfer and individual beats of a burst transfer) and to signals that mark the end of the tenure.

UISA (user instruction set architecture). The level of the architecture to which user-level software should conform. The UISA defines the base user-level instruction set, user-level registers, data types, floating-point memory conventions and exception model as seen by user programs, and the memory and programming models.

Unified cache. Combined data and instruction cache.

User mode. The unprivileged operating state of a processor used typically by application software. In user mode, software can only access certain control registers and can access only user memory space. No privileged operations can be performed. Also referred to as problem state.

VEA (virtual environment architecture). The level of the architecture that describes the memory model for an environment in which multiple devices can access memory, defines aspects of the cache model, defines cache control instructions, and defines the time-base facility from a user-level perspective. Implementations that conform to the PowerPC VEA also adhere to the UISA, but may not necessarily adhere to the OEA.

Virtual address. An intermediate address used in the translation of an effective address to a physical address.

Weak ordering. A memory access model that allows bus operations to be reordered dynamically, which improves overall performance and in particular reduces the effect of memory latency on instruction throughput.

Word. A 32-bit data element.

Write-back. A cache memory update policy in which processor write cycles are directly written only to the cache. External memory is updated only indirectly, for example, when a modified cache block is cast out to make room for newer data.

Write-through.
a cache memory update policy in which all processor write cycles are written to both the cache and memory. v w index index-1 index numerics 601, see powerpc 601 603, see powerpc 603 604, see powerpc 604 60x bus arbitration, 8-1 block diagram, 1-3 bus operations, 4-15 bus/memory coherency summary, a-1 definition, 1-1 features, 1-3 general description, xvi implementation differences, summary, 4-19 overview, 1-1 processor-initiated operations, 4-12 signals, overview, 1-4 snooping, 4-14 system design considerations, 8-1 upgrade suggestions, c-1 a aa ck (address acknowledge) signal, 2-17 , 8-4 abb (address bus busy) signals, 2-3 acronyms/abbreviations, list, xxi address bus address bus parity, 3-9 address transfer signals, 3-8 , 3-9 address transfer termination, 3-17 arbitration, 3-6 arbitration signals, 2-2 , 3-4 tenure, 3-6 address pipelining, 3-5 address transfer signals, 3-8 , 3-9 alignment aligned data transfers 32-bit bus, 3-12 64-bit bus, 3-11 effect in data transfers, 3-10 external control instructions, 3-17 misaligned data transfers 601, 3-13 603, 32-bit mode, 3-16 603/604, 3-14 a n (address bus) signals, 2-6 ape (address parity error) signal, 2-8 ap n (address bus parity) signals, 2-7 arbitration description, 8-1 signals, 3-4 ar tr y (address retry) signal, 2-18 b basic transfer protocol, 1-2 bg (bus grant) signal, 2-2 block diagram, 1-3 br (bus request) signal, 2-2 burst ordering, 3-10 bus arbitration signals, 3-4 bus operations additional bus configurations, 6-1 summary, 6-1 clean block, 4-15 coherency actions, e-1 description, 4-14 eieio, 4-17 flush block, 4-15 icbi, 4-18 implementation differences, 4-19 improved bus performance features, 8-5 kill block, 4-15 non-canceling bus operations, 8-8 processor summary, a-1 read, 4-16 read atomic, 4-16 rwitm (read with intent to modify), 4-16 rwnitc (read with no intent to cache), 4-18 sync, 4-17 , 8-4 sync vs tlbsync, system design, 8-4 tlb invalidate, 4-16 tlbie, 4-14 tlbsync, 4-17 , 8-4 write with flush, 4-15 
  write with flush atomic, 4-15
  write with kill, 4-16
  xferdata, 4-18
bus protocol, 3-2
bus transactions, see bus operations

c
cache
  cache coherency overview, 4-5
  cache control instructions, 4-12
  l2 considerations (604), d-1
  overview, implementations, 4-1
ci (cache inhibit) signal, 2-15
ckstp_in (checkstop input) signal, 2-28
ckstp_out (checkstop output) signal, 2-28
clocking, overview, b-1
coherency actions, e-1
conventions, general, xx
cse n (cache set element) signals, 2-17, 4-11

d
data bus
  alignment
    aligned data transfers
      32-bit bus, 3-12
      64-bit bus, 3-11
    burst ordering during data transfers, 3-10
    effect of alignment in data transfers, 3-10
    misaligned data transfers
      601, 3-13
      603, 32-bit mode, 3-16
      603/604, 3-14
  arbitration
    data bus, 3-19
    effect of artry assertion (604), 3-20
    signals, 3-4
  burst ordering, 3-10
  data bus tenure, 3-19
  data transfer termination, bus error, 3-26
  effect of artry assertion (604), 3-20
dbb (data bus busy) signal, 2-21, 3-21
dbdis (data bus disable) signal, 2-24
dbg (data bus grant) signal, 2-20
dbwo (data bus write only) signal, 2-21, 3-22, 8-1
dh n/dl n (data bus) signals, 2-22
direct-memory access, description, 4-19
direct-store interface
  direct-store operations, 1-2, 7-6
  load operations, 7-7
  memory-forced direct store interface (601), 7-9
  overview, 7-1
  store operations, 7-7
  timing diagrams, 7-8
  transaction protocol details, 7-2
dpe (data parity error) signal, 2-24
dp n (data bus parity) signals, 2-23
drtry (data retry) signal, 2-25

e
eciwx/ecowx, alignment, 3-17
exceptions
  checkstops, 5-7
  external interrupt, 5-14
  machine check, 5-7
  system management interrupt, 5-16
  system reset, 5-5

g
gbl (global) signal, 2-16

h
halted signal, 2-32
hid0 (checkstop sources and enables) register (601), 5-10
hp_snp_req (high-priority snoop request) signal (601 only), 2-17
hreset (hard reset) signal, 2-29, 5-2

i
ieee 1149.1 interface, 8-5
instructions
  cache control instructions, 4-12
  eciwx/ecowx, alignment, 3-17
  tlbie processing, 4-14
int (interrupt) signal, 2-27

l
l2_int (external cache intervention) signal, 2-30
lwarx/stwcx.
  address-only operation, 8-10
  considerations, 8-6
  implementation (603), 4-11

m
mcp (machine check interrupt) signal, 2-28
memory access protocol, 3-2
memory coherency
  actions, 60x-initiated operations, 4-12
  coherency actions, e-1
  description, 4-1, 4-19
  mesi protocol, 4-6
  processor summary, a-1
  protocol, 4-9
  timing, 4-9

p
powerpc 601
  cache organization, 4-2
  clocking, b-1
  description, xv
  external interrupt exception, 5-15
  hid0 register, 5-10
  machine check exception, 5-9
  memory-forced direct-store interface, 7-9
  misaligned data transfers, 3-13
  signals
    hp_snp_req, 2-17
    sreset, 5-6
  transfer code signal encoding, 2-12
  transfer encoding, 2-9
  upgrade to 60x, c-1
powerpc 603
  bus operations
    32-bit data bus mode, 6-4
    no-drtry mode, 6-1
    reduced-pinout mode, 6-6
  cache organization, 4-3
  checkstop state, 5-13
  clocking, b-2
  description, xv
  external interrupt exception, 5-16
  lwarx/stwcx. implementation, 4-11
  machine check exception, 5-12
  misaligned data transfers, 3-14
  misaligned data transfers, 32-bit mode, 3-16
  signals
    sreset, 5-7
  transfer code signal encoding, 2-12
  transfer encoding, 2-9
  upgrade to 604, c-1
  upgrade to 60x, c-1
powerpc 603e
  cache enhancements, 4-3
powerpc 604
  artry assertion on data transfers/arbitration, 3-20
  cache organization, 4-4
  clocking, b-2
  data streaming mode bus operations, 6-3
  description, xv
  l2 cache considerations, d-1
  machine check exception, 5-13
  misaligned data transfers, 3-14
  normal to doze transition (604e), 2-33
  signals
    sreset, 5-7
  transfer code signal encoding, 2-13
  transfer encoding, 2-9
  upgrade to 60x, c-3
powerpc 604e
  cache enhancements, 4-5
  no-drtry mode bus operation, 6-1

q
qack (quiescent acknowledge) signal, 2-32
qreq (quiescent request) signal, 2-32
quiesc_req (quiescent request) signal, 2-31

r
resume signal, 2-31
rsrv (reservation) signal, 2-29
run signal, 2-32

s
shd (shared) signal, 2-19
signals
  60x signals, overview, 1-4
  aack, 2-17, 8-4
  abb, 2-3
  address bus signals, 2-2
  address transfer attribute signals, 2-8, 3-9
  address transfer signals, 2-6
  address transfer start signals, 2-4
  address transfer termination signals, 2-17
  a n, 2-6
  ape, 2-8
  ap n, 2-7
  arbitration signals, 2-2, 3-4
  artry, 2-18
  bg, 2-2
  br, 2-2
  bus arbitration signals, 3-4
  ci, 2-15
  ckstp_in, 2-28
  ckstp_out, 2-28
  cse n, 2-17, 4-11
  data bus arbitration signals, 2-20
  data bus lane assignments, 2-23
  data transfer signals, 2-22, 3-22
  data transfer termination signals, 2-25, 3-23
  dbb, 2-21, 3-21
  dbdis, 2-24
  dbg, 2-20
  dbwo, 2-21, 3-22, 8-1
  dh n/dl n, 2-22
  dpe, 2-24
  dp n, 2-23
  drtry, 2-25
  gbl, 2-16
  halted, 2-32
  hp_snp_req (601 only), 2-17
  hreset, 2-29, 5-2
  int, 2-27
  l2_int, 2-30
  mcp, 2-28
  power management signals, 2-31
  processor state signals, 2-29
  qack, 2-32
  qreq, 2-32
  quick reference list, 1-5
  quiesc_req, 2-31
  resume, 2-31
  rsrv, 2-29
  run, 2-32
  shd, 2-19
  smi, 2-27
  sreset, 2-29, 5-2
  summary, signal differences, 2-34
  sys_quiesc, 2-31
  system status signals, 2-27, 5-1
  ta, 2-25
  tben, 2-30
  tbst, 2-10
  tc n, 2-11
  tea, 2-26
  tlbisync, 2-30
  ts, 2-4
  tsiz n, 2-10, 3-9
  tt n, 2-8, 3-9
  wt, 2-16
  xats, 2-5
smi (system management interrupt) signal, 2-27
snoop responses, descriptions, 4-14
split-bus transactions, 3-5
sreset (soft reset) signal, 2-29, 5-2
sys_quiesc (system quiesced) signal, 2-31

t
ta (transfer acknowledge) signal, 2-25
tben (time base enable) signal, 2-30
tbst (transfer burst) signals, 2-10
tc n (transfer code) signals, 2-11
tea (transfer error acknowledge) signal, 2-26
termination
  data transfer termination, bus error, 3-26
  normal single-beat read/write, 3-24
timing diagram examples, 3-28
tlbie bus operation, 4-14
tlbie processing, 4-14
tlbisync (tlb synchronization) signal, 2-30
transfer protocols, 1-2
transition state
  doze to nap transition, 2-33
  nap to doze transition, 2-34
  normal to doze transition, 2-33
ts (transfer start) signals, 2-4
tsiz n (transfer size) signals, 2-10, 3-9
tt n (transfer type) signals, 2-8, 3-9

w
wim bit settings, 4-19
wt (write-through) signals, 2-16

x
xats (extended address transfer start) signals, 2-5