AMD Alchemy™ Solutions Au1500™ Processor PCI Bus Performance
Application Note
Revision: 30275A
Issue Date: April 2003
© 2003 Advanced Micro Devices, Inc. All rights reserved.

The contents of this document are provided in connection with Advanced Micro Devices, Inc. ("AMD") products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD's Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right. AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.

Contacts
www.amd.com
pcs.support@amd.com

Trademarks
AMD, the AMD Arrow logo, Alchemy, and combinations thereof, and Au1500 are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

1.0 Introduction

This document describes the performance characteristics of the PCI 2.2 bus controller integrated into the Au1500™ processor.
This document assumes the reader is familiar with the PCI 2.2 specification as found in the PCI Local Bus Specification Rev. 2.2 (see 6.0 "References").

2.0 PCI Bus Controller Overview

The Au1500™ processor features an integrated PCI 2.2 bus controller for connecting to external peripherals. The PCI bus controller is designed to support a 32-bit wide interface at 33MHz or 66MHz. The PCI controller can initiate master cycles and can also serve as a target for device-initiated master cycles into the SDRAM memory of the Au1500 processor. The PCI bus controller supports a maximum of four loads and arbitration for five PCI devices including the Au1 core.

Two separate Au1500 application notes describe clocking schemes for the PCI bus controller and software techniques for utilizing the PCI bus (see 6.0 "References").

The general arrangement of the PCI bus controller is depicted below in Figure 1: "Au1500™ Processor's PCI Bus Controller".

Figure 1: Au1500™ Processor's PCI Bus Controller
[Block diagram: Au1 core, SBUS, SDRAM controller, SDRAM, PCI controller, and PCI device(s). PCI signals shown: AD[31:0], CBE[3:0]#, FRAME#, IRDY#, TRDY#, STOP#, PERR#, DEVSEL#, PAR, IDSEL, REQ[3:0], GNT[3:0], INT[D:A]#, CLK, RST#.]

PCI supports a variety of bus cycle types. This document focuses on the following six types:
• Au1 core initiated single-beat read from PCI
• Au1 core initiated single-beat write to PCI
• PCI device initiated single-beat read from Au1500 SDRAM
• PCI device initiated single-beat write to Au1500 SDRAM
• PCI device initiated burst read from Au1500 SDRAM
• PCI device initiated burst write to Au1500 SDRAM

The performance of the PCI bus is measured as throughput, the amount of data that can be transferred to and from the PCI bus in a given time period. These six PCI bus cycle types represent the significant majority of PCI bus cycles that occur in a running system.
Thus, the throughput estimates of each cycle type can be combined to provide an overall estimate of PCI bus throughput.

3.0 PCI Bus Cycles Performance

In the examples below, the Au1 core is assumed to be operating at 396MHz, the system bus (SBUS) at 198MHz (5.1ns per SBUS clock), the PCI bus at 66MHz (15.2ns per PCI clock), and the SDRAM interface at 99MHz (10.1ns per SDRAM clock). The operating frequencies of the Au1 core, system bus, and SDRAM are controlled by the sys_cpupll and sys_powerctrl registers.

Note: Further information on the SDRAM timings used in this document is available in the "Au1x00 SDRAM Performance" application note.

3.1 Au1 Core Initiated Single-Beat Read from PCI

The Au1 core initiates PCI single-beat reads with a load access, typically from PCI device registers or memory. Software frequently uses single-beat read accesses while managing the operation of a PCI device.

Au1 core accesses to PCI are non-cacheable and therefore initiate single-beat accesses.

Note: The Au1 core can initiate cacheable load accesses to PCI address space via the cacheable memory window controlled by pci_cmem. Cacheable load accesses to PCI address space initiate a burst read from PCI, not a single-beat read.

An Au1 core initiated access traverses both the system bus and the PCI bus. The timing diagram of a PCI single-beat read is provided in "Figure 3-5: Basic Read Operation", page 47, of the PCI Local Bus Specification Rev. 2.2. The activity of the buses during such an access is depicted below in Figure 2: "Au1 Core Initiated Single-Beat Read from PCI".

Figure 2: Au1 Core Initiated Single-Beat Read from PCI
[Timing diagram: SBUS, PCI, and SDRAM activity.]

The minimum time necessary for a PCI single-beat read is the sum of the active SBUS and PCI bus times: 5 SBUS clocks + 11 PCI clocks + 3 SBUS clocks (5 SBUS clocks for synchronization and arbitration, 6 PCI clocks for internal state machine synchronization, 1 PCI clock for arbitration, 1 PCI clock for address/command, 1 PCI clock for turnaround, 1 PCI clock for data, 1 PCI clock for state machine, and 3 SBUS clocks to return data to the Au1 core and to complete the access). This yields 5*5.1ns + 11*15.2ns + 3*5.1ns or 208.0ns for an Au1 core initiated single-beat read access.

At 208.0ns per single-beat read, the theoretical maximum number of single-beat reads possible is 4,807,692 per second, which yields a theoretical maximum throughput of 19.2MB/s (4,807,692 * 4 bytes).

In reality, the timing of a single-beat read usually exceeds the minimum time outlined above for the reasons provided in 4.2 "Detrimental Influences on PCI Bus Performance", in particular the assertion of DEVSEL# and TRDY#.

Furthermore, a PCI read must stall the Au1 core until data is returned. As illustrated above, the time can be considerable, depending upon the ability of the PCI device to return data in a timely fashion.

3.2 Au1 Core Initiated Single-Beat Write to PCI

The Au1 core initiates PCI single-beat writes with a store access, typically to PCI device registers or memory. Software frequently uses single-beat write accesses while managing the operation of a PCI device. For graphics devices, write accesses dominate all other accesses due to the drawing of the frame buffer contents.

Au1 core accesses to PCI are non-cacheable and therefore initiate single-beat accesses.

Note: The Au1 core can initiate cacheable store accesses to PCI address space via the cacheable memory window controlled by pci_cmem.
Cacheable store accesses to PCI address space do not initiate a burst write immediately; a burst write to PCI is initiated after the data cache casts out the corresponding cache line(s), or the corresponding cache line(s) are flushed.

Note: By programming the TLB with CCA=7 when mapping PCI memory spaces, the write buffer can gather Au1 core stores, which in turn leads to more efficient burst writes into PCI memory space.

An Au1 core initiated access traverses both the system bus and the PCI bus. The timing diagram of a PCI single-beat write is provided in "Figure 3-6: Basic Write Operation", page 48, of the PCI Local Bus Specification Rev. 2.2. The activity of the buses during such an access is depicted below in Figure 3: "Au1 Core Initiated Single-Beat Write to PCI".

Figure 3: Au1 Core Initiated Single-Beat Write to PCI
[Timing diagram: SBUS, PCI, and SDRAM activity.]

The minimum time necessary for a PCI single-beat write is the sum of the active SBUS and PCI bus times: 5 SBUS clocks + 9 PCI clocks (5 SBUS clocks for synchronization, arbitration and start of PCI write access, 6 PCI clocks for state machine synchronization, 1 PCI clock for arbitration, 1 PCI clock for address/command, and 1 PCI clock for data). This yields 5*5.1ns + 9*15.2ns or 162.3ns for an Au1 core initiated single-beat write access.

At 162.3ns per single-beat write, the theoretical maximum number of single-beat writes possible is 6,161,429 per second, which yields a theoretical maximum throughput of 24.6MB/s (6,161,429 * 4 bytes).

In reality, the timing of a single-beat write usually exceeds the minimum time outlined above for the reasons provided in 4.2 "Detrimental Influences on PCI Bus Performance", in particular the assertion of DEVSEL# and TRDY#. However, the PCI write FIFO can reduce the single-beat timing; see 4.1.3 "PCI Write FIFO".

3.3 PCI Device Initiated Single-Beat Read from Au1500™ Processor SDRAM

A PCI device initiates single-beat reads of Au1500 processor SDRAM during its operation, typically while processing a DMA ring buffer or similar data structure.

A PCI device access to Au1500 SDRAM traverses the PCI bus, the system bus, and the SDRAM interface. The timing diagram of a PCI single-beat read is provided in "Figure 3-5: Basic Read Operation", page 47, of the PCI Local Bus Specification Rev. 2.2. The activity of the buses during the access is depicted below in Figure 4: "PCI Device Initiated Single-Beat Read from Au1500™ Processor SDRAM".

Figure 4: PCI Device Initiated Single-Beat Read from Au1500™ Processor SDRAM
[Timing diagram: SBUS, PCI, and SDRAM activity.]

The minimum time necessary for a PCI device single-beat read is the sum of the active PCI bus, SBUS and SDRAM times: 3 PCI clocks + 5 SBUS clocks + 6 SDRAM clocks + 1 PCI clock (1 PCI clock for arbitration, 1 PCI clock for address/command, 1 PCI clock for turnaround, 5 SBUS clocks for synchronization and arbitration, 6 SDRAM clocks for a single-beat read, and 1 PCI clock for data to complete the access). This yields 4*15.2ns + 5*5.1ns + 6*10.1ns or 146.9ns for a PCI device initiated single-beat read access to Au1500 SDRAM.

At 146.9ns per single-beat read, the theoretical maximum number of single-beat reads possible is 6,807,351 per second, which yields a theoretical maximum throughput of 27.2MB/s (6,807,351 * 4 bytes).

In reality, the timing of the access usually exceeds the minimum time outlined above for the reasons provided in 4.2 "Detrimental Influences on PCI Bus Performance", in particular the arbitration for the system bus and PCI retries.

3.4 PCI Device Initiated Single-Beat Write to Au1500™ Processor SDRAM

A PCI device initiates single-beat writes of Au1500 processor SDRAM during its operation, typically while processing/updating a DMA ring buffer or similar data structure.
A PCI device access to Au1500 SDRAM traverses the PCI bus, the system bus, and the SDRAM interface. The timing diagram of a PCI single-beat write is provided in "Figure 3-6: Basic Write Operation", page 48, of the PCI Local Bus Specification Rev. 2.2. The activity of the buses during the access is depicted below in Figure 5: "PCI Device Initiated Single-Beat Write to Au1500™ Processor SDRAM".

Figure 5: PCI Device Initiated Single-Beat Write to Au1500™ Processor SDRAM
[Timing diagram: PCI, SDRAM, and SBUS activity.]

The minimum time necessary for a PCI device single-beat write is the sum of the active PCI bus, SBUS and SDRAM times: 6 PCI clocks + 5 SBUS clocks + 6 SDRAM clocks (3 PCI clocks for state machine synchronization, 1 PCI clock for arbitration, 1 PCI clock for address/command, 1 PCI clock for data, 5 SBUS clocks for synchronization and arbitration, and 6 SDRAM clocks for a single-beat write). This yields 6*15.2ns + 5*5.1ns + 6*10.1ns or 177.3ns for a PCI device initiated single-beat write access to Au1500 SDRAM.

At 177.3ns per single-beat write, the theoretical maximum number of single-beat writes possible is 5,640,157 per second, which yields a theoretical maximum throughput of 22.6MB/s (5,640,157 * 4 bytes).

In reality, the timing of the access usually exceeds the minimum time outlined above for the reasons provided in 4.2 "Detrimental Influences on PCI Bus Performance", in particular the arbitration for the system bus and PCI retries.

Once the PCI data has been moved onto the system bus (on its way to the SDRAM), the next PCI bus cycle can be initiated, but if the access is to the Au1500 processor SDRAM, the cycle stalls until the previous write completes. For example, PCI-to-PCI bus cycles can continue while data is transferred to Au1500 SDRAM.

3.5 PCI Device Initiated Burst Read from Au1500™ Processor SDRAM

A PCI device initiates burst reads of Au1500 processor SDRAM during its operation, typically while utilizing PCI bus-mastering DMA to transmit network packets or write disk blocks. The burst transfers permit efficient movement of data and thus better performing I/O.

A PCI device access to Au1500 SDRAM traverses the PCI bus, the system bus, and the SDRAM interface. The timing diagram of a PCI burst read is provided in "Figure 3-5: Basic Read Operation", page 47, of the PCI Local Bus Specification Rev. 2.2. The activity of the buses during the access is depicted below in Figure 6: "PCI Device Initiated Burst Read from Au1500™ Processor SDRAM".

Figure 6: PCI Device Initiated Burst Read from Au1500™ Processor SDRAM
[Timing diagram: SBUS, PCI, and SDRAM activity.]

The minimum time necessary for a PCI device burst read of eight words from Au1500 SDRAM is the sum of the active PCI bus, SBUS and SDRAM times: 6 PCI clocks + 5 SBUS clocks + 12 SDRAM clocks + 8 PCI clocks (3 PCI clocks for state machine synchronization, 1 PCI clock for arbitration, 1 PCI clock for address/command, 1 PCI clock for turnaround, 5 SBUS clocks for synchronization and arbitration, 12 SDRAM clocks for a burst read, and 8 PCI clocks for eight words of data). This yields 14*15.2ns + 5*5.1ns + 12*10.1ns or 359.5ns for a PCI device initiated burst read access to Au1500 processor SDRAM.

At 359.5ns per eight-word burst, the theoretical maximum number of burst reads possible is 2,781,641 per second, which yields a theoretical maximum throughput of 89.0MB/s (2,781,641 * 32 bytes).

In reality, the timing of the access usually exceeds the minimum time outlined above for the reasons provided in 4.2 "Detrimental Influences on PCI Bus Performance", in particular the arbitration for the system bus and PCI retries.

3.6 PCI Device Initiated Burst Write to Au1500™ Processor SDRAM

A PCI device initiates burst writes of Au1500 processor SDRAM during its operation, typically while utilizing PCI bus-mastering DMA to receive network packets or read disk blocks. The burst transfers permit efficient movement of data and thus better performing I/O.

A PCI device access to Au1500 SDRAM traverses the PCI bus, the system bus, and the SDRAM interface. The timing diagram of a PCI burst write is provided in "Figure 3-6: Basic Write Operation", page 48, of the PCI Local Bus Specification Rev. 2.2. The activity of the buses during the access is depicted below in Figure 7: "PCI Device Initiated Burst Write to Au1500™ Processor SDRAM".

Figure 7: PCI Device Initiated Burst Write to Au1500™ Processor SDRAM
[Timing diagram: SBUS, PCI, and SDRAM activity.]

The minimum time necessary for a PCI device burst write of eight words to Au1500 SDRAM is the sum of the active PCI bus, SBUS and SDRAM times: 13 PCI clocks + 5 SBUS clocks + 12 SDRAM clocks (3 PCI clocks for state machine synchronization, 1 PCI clock for arbitration, 1 PCI clock for address/command, 5 SBUS clocks for synchronization and arbitration, 12 SDRAM clocks for a burst write, and 8 PCI clocks for eight words of data). This yields 13*15.2ns + 5*5.1ns + 12*10.1ns or 344.3ns for a PCI device initiated burst write access to Au1500 SDRAM.

At 344.3ns per eight-word burst, the theoretical maximum number of burst writes possible is 2,904,443 per second, which yields a theoretical maximum throughput of 92.9MB/s (2,904,443 * 32 bytes).

In reality, the timing of the access usually exceeds the minimum time outlined above for the reasons provided in 4.2 "Detrimental Influences on PCI Bus Performance", in particular the arbitration for the system bus and PCI retries.
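The minimum-time arithmetic used for the six cycle types in sections 3.1 through 3.6 can be reproduced with a short script. The following is a minimal sketch in Python, assuming the rounded clock periods from section 3.0; the names (CYCLES, min_time_ns, throughput_mb_s) are illustrative and do not come from any AMD document.

```python
# Clock periods in nanoseconds, as rounded in section 3.0.
SBUS_NS = 5.1    # 198MHz system bus
PCI_NS = 15.2    # 66MHz PCI bus
SDRAM_NS = 10.1  # 99MHz SDRAM interface

# Hypothetical table: (SBUS clocks, PCI clocks, SDRAM clocks, bytes moved)
# taken from the clock counts quoted in sections 3.1 through 3.6.
CYCLES = {
    "au1_single_read":  (5 + 3, 11, 0, 4),
    "au1_single_write": (5, 9, 0, 4),
    "pci_single_read":  (5, 3 + 1, 6, 4),
    "pci_single_write": (5, 6, 6, 4),
    "pci_burst_read":   (5, 6 + 8, 12, 32),
    "pci_burst_write":  (5, 13, 12, 32),
}


def min_time_ns(sbus_clocks, pci_clocks, sdram_clocks):
    """Sum the active time on each bus, rounded to 0.1ns like the text."""
    total = (sbus_clocks * SBUS_NS
             + pci_clocks * PCI_NS
             + sdram_clocks * SDRAM_NS)
    return round(total, 1)


def throughput_mb_s(name):
    """Theoretical maximum throughput in MB/s (10^6 bytes per second)."""
    sbus, pci, sdram, nbytes = CYCLES[name]
    # bytes per nanosecond times 1000 gives MB/s
    return round(nbytes / min_time_ns(sbus, pci, sdram) * 1000, 1)


if __name__ == "__main__":
    for name, (sbus, pci, sdram, _) in CYCLES.items():
        print(name, min_time_ns(sbus, pci, sdram), "ns,",
              throughput_mb_s(name), "MB/s")
```

Changing PCI_NS to the period of a 33MHz bus is one way to gauge how strongly each cycle type depends on the PCI clock, under the assumption that the clock counts themselves stay the same.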
Once the PCI data has been moved onto the system bus (on its way to the SDRAM), the next PCI bus cycle can be initiated, but if the access is to the Au1500 processor SDRAM, the cycle stalls until the previous write completes. For example, PCI-to-PCI bus cycles can continue while data is transferred to Au1500 SDRAM.

4.0 PCI Bus Performance

The overall throughput of the PCI bus interface is dominated by the six PCI access cycles described previously. The maximum throughput of the PCI bus controller is approximated by this equation:

tp = (ausbrtp * ausbrr) + (ausbwtp * ausbwr)
   + (pdsbrtp * pdsbrr) + (pdsbwtp * pdsbwr)
   + (pdbrtp * pdbrr) + (pdbwtp * pdbwr)

where
• ausbrtp is the Au1 core initiated single-beat read maximum throughput
• ausbwtp is the Au1 core initiated single-beat write maximum throughput
• pdsbrtp is the PCI device initiated single-beat read from Au1500 processor SDRAM maximum throughput
• pdsbwtp is the PCI device initiated single-beat write to Au1500 processor SDRAM maximum throughput
• pdbrtp is the PCI device initiated burst read from Au1500 processor SDRAM maximum throughput
• pdbwtp is the PCI device initiated burst write to Au1500 processor SDRAM maximum throughput

All of these variables are the maximum throughput values identified in the discussion of each PCI access cycle type. The remaining variables are:
• ausbrr is the ratio of Au1 core initiated single-beat reads
• ausbwr is the ratio of Au1 core initiated single-beat writes
• pdsbrr is the ratio of PCI device initiated single-beat reads from Au1500 processor SDRAM
• pdsbwr is the ratio of PCI device initiated single-beat writes to Au1500 processor SDRAM
• pdbrr is the ratio of PCI device initiated burst reads from Au1500 processor SDRAM
• pdbwr is the ratio of PCI device initiated burst writes to Au1500 processor SDRAM

The sum of the ratios must equal 1.0 to represent 100% PCI bus utilization. The actual ratios for a given system depend upon the types of devices connected to the PCI bus.

This equation is only an approximation and tends to yield a realistic upper bound of PCI bus throughput. The dynamic nature of I/O and the types of devices connected to the PCI bus often result in less than optimal throughput on the PCI bus. A few examples are provided at the end of this discussion.

The variety of the PCI devices and the interaction with the overall system influence the PCI bus throughput.

4.1 Positive Influences on PCI Bus Performance

The following items act to improve the PCI bus throughput.
• Using the fast back-to-back capabilities of the PCI device and the Au1500 processor's PCI controller reduces PCI arbitration cycles and thus shortens the PCI access time.
• The Au1500 processor's PCI controller features a coherency setting (pci_config[nc]=0) whereby PCI requests of Au1500 SDRAM are snooped by the data cache. If the request hits in the data cache, the data cache fulfills the request immediately, thus avoiding the need to access external SDRAM.
• The Au1500 processor's PCI controller features a cacheable window into the PCI memory address space (the pci_cmem register). Accesses to this window initiate burst transfers rather than single-beat transfers.
• By programming the TLB with CCA=7 when mapping PCI memory spaces, the write buffer can gather Au1 core stores, which in turn leads to more efficient burst writes into PCI memory space.
• To improve performance, the Au1500 processor implements a write FIFO between the system bus and PCI. This FIFO effectively shortens the Au1 core write cycle to just the system bus time, if the FIFO has an available slot.
The pci_cmem feature, the use of CCA=7, and the write FIFO warrant additional discussion, as these performance features can significantly improve PCI bus throughput.

4.1.1 PCI Cacheable Memory Window

The Au1500™ processor's PCI controller features a cacheable window into PCI memory space via the pci_cmem register. The PCI cacheable window is used to map a pre-fetchable PCI memory space, enabling the Au1 core to cache the PCI memory window contents. This has two mutually beneficial effects: 1) the Au1 core caches the space for improved processing performance, and 2) the data cache initiates burst transfers to and from PCI for better PCI bus throughput.

The TLB mapping that covers pci_cmem must use CCA=4. This CCA encoding fetches word 0 first (as opposed to critical word first), to match the PCI specification. Also note that CCA=4 is a non-coherent configuration; therefore the data cache does not snoop PCI memory space accesses. For example, consider a PCI memory space that is both mapped via pci_cmem and the source and/or destination of a PCI target-to-target transfer. In this scenario the target-to-target transfer is contained solely within the PCI bus, so the Au1 core cache cannot snoop the transfer, and as a result either the PCI target or the Au1 data cache might contain stale data.

On an Au1 core read from this window, the data cache initiates a burst read transfer. The timing of the burst read access is depicted here in Figure 8: "Au1 Core Initiated Burst Read from PCI".
Figure 8: Au1 Core Initiated Burst Read from PCI
[Timing diagram: SBUS, PCI, and SDRAM activity.]

The minimum time necessary for a PCI burst read is the sum of the active SBUS and PCI bus times: 5 SBUS clocks + 18 PCI clocks + 10 SBUS clocks (5 SBUS clocks for synchronization and arbitration, 6 PCI clocks for internal state machine synchronization, 1 PCI clock for arbitration, 1 PCI clock for address/command, 1 PCI clock for turnaround, 8 PCI clocks for data, 1 PCI clock for state machine, and 10 SBUS clocks to return data to the Au1 core and to complete the access). This yields 350.1ns for an Au1 core initiated burst read access, or 91.4MB/s throughput, vastly improved compared to the single-beat read throughput of 19.2MB/s.

On a cast-out of a dirty cache line, the data cache initiates a burst write transfer. The timing of the burst write access is depicted here in Figure 9: "Au1 Core Initiated Burst Write to PCI".

Figure 9: Au1 Core Initiated Burst Write to PCI
[Timing diagram: SBUS, PCI, and SDRAM activity.]

The minimum time necessary for a PCI burst write is the sum of the active SBUS and PCI bus times: 12 SBUS clocks + 17 PCI clocks (5 SBUS clocks for synchronization, arbitration and burst write access, 6 PCI clocks for state machine synchronization, 1 PCI clock for arbitration, 1 PCI clock for address/command, and 8 PCI clocks for data). This yields 258.4ns for an Au1 core initiated burst write access, or 123.8MB/s throughput, greatly improved compared to the single-beat write throughput of 24.6MB/s. The PCI write buffer can improve performance as well; see the discussion in 4.1.3 "PCI Write FIFO".

If the software environment/application permits, the cacheable window allows the Au1 core to cache PCI memory for improved PCI bus throughput and overall system performance.

4.1.2 PCI CCA=7

The TLB mappings for PCI must use a non-cacheable setting (the exception being the pci_cmem window previously discussed).
Typically the TLB CCA value is 2 (non-cached, non-mergeable, non-gatherable), but by programming the TLB with a CCA value of 7, the write buffer can merge and gather Au1 core stores into more efficient burst writes into PCI memory space.

The throughput advantage of burst writes to PCI was discussed previously. The actual effect on overall system performance is positive but difficult to determine due to the dynamic nature of the run-time system. The PCI write buffer can further improve performance; see the discussion in 4.1.3 "PCI Write FIFO".

4.1.3 PCI Write FIFO

To improve performance, the Au1500 processor implements a write FIFO between the system bus and the PCI bus. This FIFO effectively shortens the Au1 core write cycles to just the system bus time, if the FIFO has an available slot to accept the write. That is, from the Au1 core perspective, the write completes as soon as the FIFO accepts it, rather than waiting until the write to the PCI target completes.

The FIFO can accept any combination of two single-beat write accesses or two burst write accesses. A third write access stalls the Au1 core (and the system bus) until a slot is available.

The PCI write FIFO improves overall system performance by buffering write accesses to PCI, thus enabling the Au1 core and system bus to continue with other activities while the writes to PCI complete.

4.2 Detrimental Influences on PCI Bus Performance

The following items may reduce the PCI bus throughput.
• The PCI bus runs asynchronously to the Au1 core and system bus. As a result, on each access several clock cycles are consumed synchronizing the different clock domains.
• If one 33MHz PCI device is connected to the PCI bus, then the entire PCI bus operates at 33MHz, even for devices that can operate at 66MHz.
  All the examples above were calculated for a 66MHz bus; reducing the bus to 33MHz has a detrimental impact on PCI bus throughput.
• The PCI clock may not be exactly 66MHz. When using an internally generated clock, a 64MHz (or 32MHz) PCI clock is common.
• The system bus is shared by a number of masters (Au1 core, PCI, USB, Ethernet, DMA). As such, PCI bus cycles that use the system bus to access SDRAM may experience increased latency until the PCI bus wins arbitration of the system bus. Furthermore, accesses to the static bus controller (e.g. Flash, PCMCIA, etc.) by the Au1 core or DMA occupy the system bus and can add tens or hundreds of nanoseconds of latency to arbitration.
• Au1 core initiated reads of PCI space prevent a PCI device that is attempting to access Au1500 SDRAM from winning arbitration on the system bus (because the Au1 core won arbitration for the system bus).
• The DEVSEL# timing for a given PCI device can increase the access time, which in turn decreases the PCI bus throughput. PCI devices assert DEVSEL# fast, medium or slow (see the PCI configuration space status register).
• Many PCI devices are unable to satisfy a read or write request immediately. The TRDY# signal is de-asserted by the device to insert wait states into the access.
• For PCI bus cycles that access Au1500 processor SDRAM, if the PCI controller is unable to win system bus arbitration, then a PCI retry is signalled. In this situation, the access time to Au1500 SDRAM can be extended up to 16 PCI clocks while the PCI controller attempts to win system bus arbitration.
• The discussions above assume 4 bytes of data for a single-beat access, and 32 bytes of data for a burst access. In reality, not all accesses transfer 4 or 32 bytes; an access can transfer less, which further decreases PCI bus throughput.
• The discussions above ignored other PCI cycle types (e.g. C/BE encoding).
  Initiating the other PCI cycle types further decreases the PCI bus bandwidth available to the six data movement PCI cycle types.

Of the above factors, the PCI bus clock, the PCI target TRDY# timing, and the number of active system bus masters are the most detrimental influences on PCI bus throughput.

4.3 PCI Bus Performance Examples

Here are two typical examples to illustrate PCI bus throughput. In a real design, the system should be profiled in order to determine more accurate ratios and thus a better estimate of the PCI bus throughput.

4.3.1 Network Device Example

A high performance PCI-based networking device uses PCI bus-mastering DMA to transfer packets to/from Au1500 SDRAM. The device also uses ring buffers in Au1500 processor SDRAM to provide queues of incoming and outgoing packets. In this environment, the ratios are similar to the following:
• ausbrr is 0.10 for managing device operation, servicing interrupts, etc.
• ausbwr is also 0.10 for managing device operation, servicing interrupts, etc.
• pdsbrr is 0.10 for reading ring buffer contents
• pdsbwr is 0.10 for updating ring buffer status
• pdbrr is 0.30 for transmitting outgoing packets
• pdbwr is 0.30 for receiving incoming packets

Substituting all the values, the throughput becomes:

tp = (19.2MB/s * 0.10) + (24.6MB/s * 0.10)
   + (27.2MB/s * 0.10) + (22.6MB/s * 0.10)
   + (89.0MB/s * 0.30) + (92.9MB/s * 0.30)
tp = 63.9MB/s

This example is also indicative of a high performance disk controller.

4.3.2 Graphics Device Example

A high performance PCI-based graphics device is programmed with drawing commands to perform the drawing locally (i.e. hardware acceleration), and drawing is also performed by the Au1 core directly writing into the frame buffer memory. In this environment, the ratios are similar to the following:
• ausbrr is 0.20 for moderate read-modify-write pixel operations (blits)
• ausbwr is 0.80 for drawing commands, frame buffer updates (blits)
• pdsbrr is 0.00
• pdsbwr is 0.00
• pdbrr is 0.00
• pdbwr is 0.00

Substituting all the values, the throughput becomes:

tp = (19.2MB/s * 0.20) + (24.6MB/s * 0.80)
   + (27.2MB/s * 0.00) + (22.6MB/s * 0.00)
   + (89.0MB/s * 0.00) + (92.9MB/s * 0.00)
tp = 23.5MB/s

The PCI bus throughput achievable with the Au1500 processor enables good motion video decode. For instance, full motion video decode typically requires 30 frames per second, which is achievable with the Au1500 processor:
• A video clip with a resolution of 800x600 at 16bpp requires 960,000 bytes per frame. This yields a frame rate of 24 frames per second (23.5MB/s / 960,000 bytes per frame).
• A video clip with a resolution of 640x480 at 16bpp requires 614,400 bytes per frame. This yields a frame rate of 38 frames per second (23.5MB/s / 614,400 bytes per frame).

This throughput translates into very good graphics experiences.

5.0 Conclusion

The PCI controller integrated into the Au1500™ processor is capable of handling common PCI-based peripherals such as networking, graphics and disk controllers. The actual PCI bus performance is dependent upon the devices connected to the PCI bus and can be estimated using the equation provided.

6.0 References

1. The Alchemy™ Au1500™ Internet Edge Processor Data Book, Alchemy Semiconductor, 2001.
2. PCI Local Bus Specification Rev. 2.2, PCI Special Interest Group, 1998.
3. PCI Clock Generation on the Alchemy™ Au1500™ Processor from AMD - Application Note, AMD, 2002.
4. PCI Bus Software Support for Alchemy™ Au1500™ Processor from AMD - Application Note, AMD, 2002.
5. AMD Alchemy™ Solutions Au1000™, Au1100™ and Au1500™ Processors SDRAM Performance - Application Note, AMD, 2003.
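The weighted-sum estimate from section 4.0 and the two worked examples in section 4.3 can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the dictionary keys, the function names (estimate_tp, frames_per_second), and the choice to reject ratios that do not sum to 1.0 are illustrative, not part of the application note. The per-cycle maximum throughputs are the values quoted in sections 3.1 through 3.6.

```python
# Maximum throughput per cycle type in MB/s, from sections 3.1 through 3.6.
# Keys are shorthand for the ausbrtp ... pdbwtp variables of section 4.0.
MAX_TP = {
    "ausbr": 19.2,  # Au1 core single-beat read
    "ausbw": 24.6,  # Au1 core single-beat write
    "pdsbr": 27.2,  # PCI device single-beat read
    "pdsbw": 22.6,  # PCI device single-beat write
    "pdbr": 89.0,   # PCI device burst read
    "pdbw": 92.9,   # PCI device burst write
}


def estimate_tp(ratios):
    """Weighted sum of per-cycle throughputs; ratios must total 1.0."""
    if abs(sum(ratios.values()) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1.0")
    return sum(MAX_TP[key] * ratio for key, ratio in ratios.items())


def frames_per_second(tp_mb_s, width, height, bytes_per_pixel=2):
    """Frame rate if all throughput is spent on frame data (16bpp default)."""
    return tp_mb_s * 1_000_000 / (width * height * bytes_per_pixel)


# Ratios from the examples in sections 4.3.1 and 4.3.2.
NETWORK = {"ausbr": 0.10, "ausbw": 0.10, "pdsbr": 0.10,
           "pdsbw": 0.10, "pdbr": 0.30, "pdbw": 0.30}
GRAPHICS = {"ausbr": 0.20, "ausbw": 0.80, "pdsbr": 0.00,
            "pdsbw": 0.00, "pdbr": 0.00, "pdbw": 0.00}

if __name__ == "__main__":
    print("network:", round(estimate_tp(NETWORK), 1), "MB/s")
    print("graphics:", round(estimate_tp(GRAPHICS), 1), "MB/s")
    print("800x600:", int(frames_per_second(23.5, 800, 600)), "fps")
    print("640x480:", int(frames_per_second(23.5, 640, 480)), "fps")
```

Replacing the ratios with values profiled on a real system, as section 4.3 recommends, gives a system-specific estimate from the same function.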