ATI's new Radeon X1900-series products are fast enough for current games and promise high performance in future titles too. But there will definitely be disputes between supporters of Nvidia hardware, which offers a more conservative architecture, and backers of ATI's more forward-looking approach, which is sometimes slower in today's games.
The Beyond3D web site has published an interview with ATI's Eric Demers and Richard Huddy, who explain why ATI believes that incorporating three times as many pixel shader processors (which consist of ALUs, arithmetic logic units) as texture units (sometimes referred to as TEX) is the right balance for the current and future needs of games.
"It's also a chicken and egg thing, in that ISVs [independent software vendors] will tell us what they are doing, but they will also be influenced to designing games with our new technology in mind. If we come out with 3:1 ALU:TEX ratio HW [hardware], then designers will tend to add more ALUs for next games, and so it's a mutually influenced evolution," says Eric Demers.
Asked about the ratio between texture units and raster (ROP) units, Mr. Demers said that although a 1:1 proportion needs to be maintained right now, that ratio will not be needed in the future.
"Right now, they seem to balance at 1:1 (TEX:ROP), but the trend is towards lowering ROPs, in general. The reality is that shading per pixel is increasing, which usually means many ALUs and many textures per pixel, as well as many cycles per pixel. Since we need only 1 ROP per cycle per pixel, effectively, the ROP throughput requirement is going down on new apps. An RV530 is a prime example - It doesn't have more ROP than the R515, but having triple the shading and double the Z, it's around 2x the speed of the R515 in a lot of cases," said Mr. Demers.
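The arithmetic behind that argument can be sketched with a toy model. The Python snippet below is only an illustration under stated assumptions: the function name `rop_utilization`, the unit counts and the cycles-per-pixel figures are hypothetical and are not taken from the interview or from ATI's specifications. It assumes every pixel keeps one ALU busy for a fixed number of cycles and then needs exactly one ROP cycle.

```python
def rop_utilization(alus: int, rops: int, alu_cycles_per_pixel: int) -> float:
    """Fraction of ROP capacity in use when shading is the only other limit.

    Toy model (an assumption, not ATI's): every pixel keeps one ALU busy for
    alu_cycles_per_pixel cycles, then needs one ROP for a single cycle.
    """
    shaded_pixels_per_clock = alus / alu_cycles_per_pixel
    # ROPs cannot be more than fully busy.
    return min(1.0, shaded_pixels_per_clock / rops)


# Hypothetical part with three ALUs per ROP (illustrative numbers only).
for cycles in (1, 3, 12, 48):
    print(cycles, rop_utilization(alus=48, rops=16, alu_cycles_per_pixel=cycles))
```

In this simplified picture the ROPs are saturated only while shaders are short. Once a pixel takes more ALU cycles than there are ALUs per ROP, part of the ROP capacity sits idle, which is the trend the quote describes.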
The interview also reveals some previously undisclosed details about the Radeon X1000 architecture in general.
"The new dispatcher was required to allow for a linearly scalable ALU architecture (say that 5 times fast!). The R3xx/R4xx sequencer was never designed with this in mind, so it had to be redesigned for that, at least. But it's more than that. With triple the ALU demand to texture resource, we need to be even more efficient on hiding the texture fetch latency (as well as flow control), and the high thread counts of the X1K architecture easily allow for that. We've found that the R580 efficiency is on par with the R520's, which indicates that our design and dispatcher are capable of pretty amazing efficiencies (and wasn't even taxed that hard on R520)," Eric Demers claims.
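One simple way to see why thread count matters for hiding texture fetch latency is the rough Python sketch below. It is not a description of ATI's dispatcher: the function name `threads_to_hide_latency`, the 100-cycle latency, the math-per-fetch figures and the ALU count are all hypothetical. It assumes that each thread does a fixed amount of ALU work, then stalls on a texture fetch, and that switching to another ready thread costs nothing.

```python
def threads_to_hide_latency(fetch_latency: int, alu_cycles_per_fetch: int) -> int:
    """Threads one ALU needs in flight so it never idles on a texture fetch.

    Toy model (an assumption): a thread runs alu_cycles_per_fetch cycles of
    math, then stalls for fetch_latency cycles; switching threads is free.
    """
    # While one thread stalls, the others must supply fetch_latency cycles of work.
    covering_threads = -(-fetch_latency // alu_cycles_per_fetch)  # ceiling division
    return 1 + covering_threads


# Hypothetical numbers: a 100-cycle fetch and varying amounts of math per fetch.
for work in (4, 10, 25):
    per_alu = threads_to_hide_latency(fetch_latency=100, alu_cycles_per_fetch=work)
    print(work, per_alu, per_alu * 48)  # last column: total for 48 hypothetical ALUs
```

Under these assumptions the total number of threads that must be kept in flight grows in proportion to the number of ALUs, so a design with three times the ALUs needs a dispatcher that can track correspondingly more threads.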
- Beyond3D: R580 Architecture Interview.