Ram timings: ATi vs nV. Inquiry into why ATi's ram clocks poorly...


Sentential
07-30-05, 11:11 PM
Felinusz and I have been discussing this for quite some time, and we have yet to figure out why, ever since the 9800 days, ATi's RAM has performed so poorly, often not even running at stock speeds.

It's been an unofficial belief among all of us here that ATi uses tighter timings than nVidia does. Tonight we are going to try to put this to rest.

Felinusz is manually checking his timings against 6600GTs and 6800GTs (nV cards with GC20) in hopes of finding the answer to this riddle.

For the time being, would anyone like to comment?

d94
07-30-05, 11:12 PM
ill be here waitin for the results.. :)

RedDragonXXX
07-30-05, 11:15 PM
Be very careful not to damage your vid card!

Son1990
07-30-05, 11:19 PM
Ditto, waiting for results

felinusz
07-31-05, 12:30 AM
To start, we have a problem. ATiTool, Rabit, and Nibitor (the three BIOS editing programs I could find that allow manipulation of the card's GDDR memory timings; ATiTool and Rabit are ATi-based, Nibitor is nVidia-based) display the timings differently, with little clue as to which value correlates to which.

I've uploaded pics of each program and their display of the card's GDDR timings - the ATi/nVidia sets use different "values" (names) for each timing.

So, we've got our timings, but can't make a direct comparison without being able to "translate" the values.

You can get Nibitor here: http://www.mvktech.net/index.php?option=com_remository&Itemid=26&func=fileinfo&parent=folder&filecatid=1137
You can get Rabit here: http://www.mvktech.net/index.php?option=com_remository&Itemid=26&func=fileinfo&parent=folder&filecatid=801

Sentential
07-31-05, 12:53 AM
Samsung Default Timings

/********************************************************************************/
/* Define Statement */
/********************************************************************************/

/********************************************************************************/
/* -12 Specification */
/********************************************************************************/
`ifdef S12 // -12 spec
`define tGR 2 // Gapless 2 tCK
`define tRTW 14 // Read to Write at same bank - CL=9tCK, tCDLR=5tCK

`define tRC 35 // Row cycle time(min) - operation (tCK)
`define tRFC 45 // Row cycle time(min) - Auto Refresh (tCK)
`define tRASmin 25 // Row active minimum time (tCK)
`define tRASmax 100000 // Row active maximum time (tCK)
`define tRCDRD 12 // Ras to cas delay(min) for Read (tCK)
`define tRCDWR 7 // Ras to cas delay(min) for Write (tCK)
`define tRP 10 // Row precharge time(min) (tCK)
`define tRRD 9 // Row to row delay(min) (tCK)
`define tWR 7 // Last data in Row precharge (7 tCK)
`define tCDLR 5 // Last data in to Read delay (6 tCK)
`define tCDLW 0 // Last data in to Write delay (0 tCK)
`define tCCD 2 // Col. address to col. address delay (3 tCK)
`define tCKmin 1.2 // Clock minimum cycle time (ns) - CL=9
`define tCKmax 6 // Clock maximum cycle time (ns) - CL=9

`define tCC5 2.5 // Clock minimum cycle time at cas latency=5 (ns)
`define tCC6 2.22 // Clock minimum cycle time at cas latency=6 (ns)
`define tCC7 1.8 // Clock minimum cycle time at cas latency=7 (ns)
`define tCC8 1.6 // Clock minimum cycle time at cas latency=8 (ns)
`define tCC9 1.4 // Clock minimum cycle time at cas latency=9 (ns)
`define tCHmin 0.45 // Clock high pulse width (min:0.45tCK, max:0.55tCK)
`define tCHmax 0.55 // Clock high pulse width (min:0.45tCK, max:0.55tCK)
`define tCLmin 0.45 // Clock low pulse width (min:0.45tCK, max:0.55tCK)
`define tCLmax 0.55 // Clock low pulse width (min:0.45tCK, max:0.55tCK)
`define tDQSCK 0.23 // DQS out edge to clock edge (min:-0.26, max:0.26)
`define tDQSQ 0.14 // Data strobe edge to output data edge (min:-0.16, max:+0.16)
//`define tSLZ 0.75 // DQS low-Z to valid DQS delay @ Read Preamble (min:tCK-0.75, max:tCK+0.75)
//`define tSHZ 0.75 // Valid DQS to DQS Hi-Z delay @ Postamble (min:tCK/2+0.75, max:tCK/2+0.75)
//`define tHZQ 0.75 // Data out active to High-Z (min:tCK/2-0.75, max:tCK/2+0.75)
//`define tDQCK 0.16 // out data edge to clock edge (min:-0.16, max:0.16)
`define tDQSS1min 0.85 // Write command to first DQS latching transition - WL=1
`define tDQSS1max 1.15 // Write command to first DQS latching transition - WL=1
`define tDQSS2min 1.85 // Write command to first DQS latching transition - WL=2
`define tDQSS2max 2.15 // Write command to first DQS latching transition - WL=2
`define tDQSS3min 2.85 // Write command to first DQS latching transition - WL=3
`define tDQSS3max 3.15 // Write command to first DQS latching transition - WL=3
`define tDQSS4min 3.85 // Write command to first DQS latching transition - WL=4
`define tDQSS4max 4.15 // Write command to first DQS latching transition - WL=4
`define tDQSS5min 4.85 // Write command to first DQS latching transition - WL=5
`define tDQSS5max 5.15 // Write command to first DQS latching transition - WL=5

//`define tSDQS 0 // DQS-in setup time (ns)
//`define tWPREH 0.25 // DQS-in hold time (0.25 tCK)
//`define tSIHmin 0.45 // DQS-in high level width (0.45tCK)
//`define tSIHmax 0.55 // DQS-in high level width (0.55tCK)
//`define tSILmin 0.45 // DQS-in high level width (0.45tCK)
//`define tSILmax 0.55 // DQS-in high level width (0.55tCK)
//`define tSICmin 0.9 // DQS-in cycle time (0.9 tCK)
//`define tSICmax 1.1 // DQS-in cycle time (1.1 tCK)
`define tDSC 1 // DQS-in cycle time (1 tCK)

`define tIS 0.3 // Input setup time (ns)
`define tIH 0.3 // Input hold time (ns)
`define tMRD 7 // Mode register set cycle time (6tCK)
`define tDS 0.16 // Data in & DM set-up time (ns)
`define tDH 0.16 // Data in & DM hold time (ns)

//`define tDIPW 1.75 // Data in & DM input pulse width (ns)
//`define tDV 0.375 // Output DQS valid window (tCK)

//`define tPDEX 7.5 // Power Down exit Time (ns)
//`define tXSA 20000 // Exit self refresh to bank active command (tCK)
`define tREF 7.8 // Refresh interval time (us)
`define tWPST 0.4 // DQS write postamble time (tCK)
`define tDAL 24 // Auto precharge write recovery + precharge time (ns)
`endif


/********************************************************************************/
/* -14 Specification */
/********************************************************************************/
`ifdef S14 // -14 spec
`define tGR 2 // Gapless 2 tCK
`define tRTW 14 // Read to Write at same bank - CL=9tCK, tCDLR=5tCK

`define tRC 31 // Row cycle time(min) - operation (tCK)
`define tRFC 39 // Row cycle time(min) - Auto Refresh (tCK)
`define tRASmin 22 // Row active minimum time (tCK)
`define tRASmax 100000 // Row active maximum time (tCK)
`define tRCDRD 10 // Ras to cas delay(min) for Read (tCK)
`define tRCDWR 6 // Ras to cas delay(min) for Write (tCK)
`define tRP 9 // Row precharge time(min) (tCK)
`define tRRD 8 // Row to row delay(min) (tCK)
`define tWR 6 // Last data in Row precharge (7 tCK)
`define tCDLR 5 // Last data in to Read delay (6 tCK)
`define tCDLW 0 // Last data in to Write delay (0 tCK)
`define tCCD 2 // Col. address to col. address delay (3 tCK)
`define tCKmin 1.4 // Clock minimum cycle time (ns) - CL=9
`define tCKmax 6 // Clock maximum cycle time (ns) - CL=9

`define tCC5 2.5 // Clock minimum cycle time at cas latency=5 (ns)
`define tCC6 2.22 // Clock minimum cycle time at cas latency=6 (ns)
`define tCC7 1.8 // Clock minimum cycle time at cas latency=7 (ns)
`define tCC8 1.6 // Clock minimum cycle time at cas latency=8 (ns)
`define tCC9 1.4 // Clock minimum cycle time at cas latency=9 (ns)

`define tCHmin 0.45 // Clock high pulse width (min:0.45tCK, max:0.55tCK)
`define tCHmax 0.55 // Clock high pulse width (min:0.45tCK, max:0.55tCK)
`define tCLmin 0.45 // Clock low pulse width (min:0.45tCK, max:0.55tCK)
`define tCLmax 0.55 // Clock low pulse width (min:0.45tCK, max:0.55tCK)
`define tDQSCK 0.26 // DQS out edge to clock edge (min:-0.26, max:0.26)
`define tDQSQ 0.16 // Data strobe edge to output data edge (min:-0.16, max:+0.16)

//`define tSLZ 0.75 // DQS low-Z to valid DQS delay @ Read Preamble (min:tCK-0.75, max:tCK+0.75)
//`define tSHZ 0.75 // Valid DQS to DQS Hi-Z delay @ Postamble (min:tCK/2+0.75, max:tCK/2+0.75)
//`define tHZQ 0.75 // Data out active to High-Z (min:tCK/2-0.75, max:tCK/2+0.75)
//`define tDQCK 0.16 // out data edge to clock edge (min:-0.16, max:0.16)
`define tDQSS1min 0.85 // Write command to first DQS latching transition - WL=1
`define tDQSS1max 1.15 // Write command to first DQS latching transition - WL=1
`define tDQSS2min 1.85 // Write command to first DQS latching transition - WL=2
`define tDQSS2max 2.15 // Write command to first DQS latching transition - WL=2
`define tDQSS3min 2.85 // Write command to first DQS latching transition - WL=3
`define tDQSS3max 3.15 // Write command to first DQS latching transition - WL=3
`define tDQSS4min 3.85 // Write command to first DQS latching transition - WL=4
`define tDQSS4max 4.15 // Write command to first DQS latching transition - WL=4
`define tDQSS5min 4.85 // Write command to first DQS latching transition - WL=5
`define tDQSS5max 5.15 // Write command to first DQS latching transition - WL=5

//`define tSDQS 0 // DQS-in setup time (ns)
//`define tWPREH 0.25 // DQS-in hold time (0.25 tCK)
//`define tSIHmin 0.45 // DQS-in high level width (0.45tCK)
//`define tSIHmax 0.55 // DQS-in high level width (0.55tCK)
//`define tSILmin 0.45 // DQS-in high level width (0.45tCK)
//`define tSILmax 0.55 // DQS-in high level width (0.55tCK)
//`define tSICmin 0.9 // DQS-in cycle time (0.9 tCK)
//`define tSICmax 1.1 // DQS-in cycle time (1.1 tCK)
`define tDSC 1 // DQS-in cycle time (1 tCK)

`define tIS 0.35 // Input setup time (ns)
`define tIH 0.35 // Input hold time (ns)
`define tMRD 6 // Mode register set cycle time (6tCK)
`define tDS 0.18 // Data in & DM set-up time (ns)
`define tDH 0.18 // Data in & DM hold time (ns)

//`define tDIPW 1.75 // Data in & DM input pulse width (ns)
//`define tDV 0.375 // Output DQS valid window (tCK)

//`define tPDEX 8.75 // Power Down exit Time (ns)
//`define tXSA 20000 // Exit self refresh to bank active command (tCK)
`define tREF 7.8 // Refresh interval time (us)
`define tWPST 0.4 // DQS write postamble time (tCK)
`define tDAL 280 // Auto precharge write recovery + precharge time (ns)
`endif

/********************************************************************************/
/* -16 Specification */
/********************************************************************************/
`ifdef S16 // -16 spec
`define tGR 2 // Gapless 2 tCK
`define tRTW 12 // Read to Write at same bank - CL=8tCK, tCDLR=4tCK

`define tRC 27 // Row cycle time(min) - operation (tCK)
`define tRFC 34 // Row cycle time(min) - Auto Refresh (tCK)
`define tRASmin 19 // Row active minimum time (tCK)
`define tRASmax 100000 // Row active maximum time (tCK)
`define tRCDRD 9 // Ras to cas delay(min) for Read (tCK)
`define tRCDWR 5 // Ras to cas delay(min) for Write (tCK)
`define tRP 8 // Row precharge time(min) (tCK)
`define tRRD 7 // Row to row delay(min) (tCK)
`define tWR 5 // Last data in Row precharge (7 tCK)
`define tCDLR 4 // Last data in to Read delay (6 tCK)
`define tCDLW 0 // Last data in to Write delay (0 tCK)
`define tCCD 2 // Col. address to col. address delay (3 tCK)
`define tCKmin 1.6 // Clock minimum cycle time (ns) - CL=8
`define tCKmax 6 // Clock maximum cycle time (ns) - CL=8

`define tCC5 2.5 // Clock minimum cycle time at cas latency=5 (ns)
`define tCC6 2.22 // Clock minimum cycle time at cas latency=6 (ns)
`define tCC7 1.8 // Clock minimum cycle time at cas latency=7 (ns)
`define tCC8 1.6 // Clock minimum cycle time at cas latency=8 (ns)
`define tCC9 1.4 // Clock minimum cycle time at cas latency=9 (ns)

`define tCHmin 0.45 // Clock high pulse width (min:0.45tCK, max:0.55tCK)
`define tCHmax 0.55 // Clock high pulse width (min:0.45tCK, max:0.55tCK)
`define tCLmin 0.45 // Clock low pulse width (min:0.45tCK, max:0.55tCK)
`define tCLmax 0.55 // Clock low pulse width (min:0.45tCK, max:0.55tCK)
`define tDQSCK 0.26 // DQS out edge to clock edge (min:-0.26, max:0.26)
`define tDQSQ 0.18 // Data strobe edge to output data edge (min:-0.16, max:+0.16)

//`define tSLZ 0.75 // DQS low-Z to valid DQS delay @ Read Preamble (min:tCK-0.75, max:tCK+0.75)
//`define tSHZ 0.75 // Valid DQS to DQS Hi-Z delay @ Postamble (min:tCK/2+0.75, max:tCK/2+0.75)
//`define tHZQ 0.75 // Data out active to High-Z (min:tCK/2-0.75, max:tCK/2+0.75)
//`define tDQCK 0.16 // out data edge to clock edge (min:-0.16, max:0.16)
`define tDQSS1min 0.85 // Write command to first DQS latching transition - WL=1
`define tDQSS1max 1.15 // Write command to first DQS latching transition - WL=1
`define tDQSS2min 1.85 // Write command to first DQS latching transition - WL=2
`define tDQSS2max 2.15 // Write command to first DQS latching transition - WL=2
`define tDQSS3min 2.85 // Write command to first DQS latching transition - WL=3
`define tDQSS3max 3.15 // Write command to first DQS latching transition - WL=3
`define tDQSS4min 3.85 // Write command to first DQS latching transition - WL=4
`define tDQSS4max 4.15 // Write command to first DQS latching transition - WL=4
`define tDQSS5min 4.85 // Write command to first DQS latching transition - WL=5
`define tDQSS5max 5.15 // Write command to first DQS latching transition - WL=5

//`define tSDQS 0 // DQS-in setup time (ns)
//`define tWPREH 0.25 // DQS-in hold time (0.25 tCK)
//`define tSIHmin 0.45 // DQS-in high level width (0.45tCK)
//`define tSIHmax 0.55 // DQS-in high level width (0.55tCK)
//`define tSILmin 0.45 // DQS-in high level width (0.45tCK)
//`define tSILmax 0.55 // DQS-in high level width (0.55tCK)
//`define tSICmin 0.9 // DQS-in cycle time (0.9 tCK)
//`define tSICmax 1.1 // DQS-in cycle time (1.1 tCK)
`define tDSC 1 // DQS-in cycle time (1 tCK)

`define tIS 0.4 // Input setup time (ns)
`define tIH 0.4 // Input hold time (ns)
`define tMRD 5 // Mode register set cycle time (6tCK)
`define tDS 0.2 // Data in & DM set-up time (ns)
`define tDH 0.2 // Data in & DM hold time (ns)

//`define tDIPW 1.75 // Data in & DM input pulse width (ns)
//`define tDV 0.375 // Output DQS valid window (tCK)

//`define tPDEX 10 // Power Down exit Time (ns)
//`define tXSA 20000 // Exit self refresh to bank active command (tCK)
`define tREF 7.8 // Refresh interval time (us)
`define tWPST 0.4 // DQS write postamble time (tCK)
`define tDAL 272 // Auto precharge write recovery + precharge time (ns)
`endif


/********************************************************************************/
/* -20 Specification */
/********************************************************************************/
`ifdef S20 // -20 spec

`define tGR 2 // Gapless 2 tCK
`define tRTW 10 // Read to Write at same bank - CL=7tCK, tCDLR=3tCK

`define tRC 21 // Row cycle time(min) - operation (tCK)
`define tRFC 27 // Row cycle time(min) - Auto Refresh (tCK)
`define tRASmin 15 // Row active minimum time (tCK)
`define tRASmax 100000 // Row active maximum time (tCK)
`define tRCDRD 7 // Ras to cas delay(min) for Read (tCK)
`define tRCDWR 4 // Ras to cas delay(min) for Write (tCK)
`define tRP 6 // Row precharge time(min) (tCK)
`define tRRD 5 // Row to row delay(min) (tCK)
`define tCDLR 3 // Last data in to Read delay (6 tCK)
`define tCDLW 0 // Last data in to Write delay (0 tCK)
`define tCCD 2 // Col. address to col. address delay (2 tCK)
`define tCKmin 2 // Clock minimum cycle time (ns) - CL=9
`define tCKmax 6 // Clock maximum cycle time (ns) - CL=9

`define tCC5 2.5 // Clock minimum cycle time at cas latency=5 (ns)
`define tCC6 2.22 // Clock minimum cycle time at cas latency=6 (ns)
`define tCC7 1.8 // Clock minimum cycle time at cas latency=7 (ns)
`define tCC8 1.6 // Clock minimum cycle time at cas latency=8 (ns)
`define tCC9 1.4 // Clock minimum cycle time at cas latency=9 (ns)

`define tCHmin 0.45 // Clock high pulse width (min:0.45tCK, max:0.55tCK)
`define tCHmax 0.55 // Clock high pulse width (min:0.45tCK, max:0.55tCK)
`define tCLmin 0.45 // Clock low pulse width (min:0.45tCK, max:0.55tCK)
`define tCLmax 0.55 // Clock low pulse width (min:0.45tCK, max:0.55tCK)
`define tDQSCK 0.26 // DQS out edge to clock edge (min:-0.26, max:0.26)
`define tDQSQ 0.16 // Data strobe edge to output data edge (min:-0.16, max:+0.16)

//`define tSLZ 0.75 // DQS low-Z to valid DQS delay @ Read Preamble (min:tCK-0.75, max:tCK+0.75)
//`define tSHZ 0.75 // Valid DQS to DQS Hi-Z delay @ Postamble (min:tCK/2+0.75, max:tCK/2+0.75)
//`define tHZQ 0.75 // Data out active to High-Z (min:tCK/2-0.75, max:tCK/2+0.75)
//`define tDQCK 0.16 // out data edge to clock edge (min:-0.16, max:0.16)
`define tDQSS1min 0.85 // Write command to first DQS latching transition - WL=1
`define tDQSS1max 1.15 // Write command to first DQS latching transition - WL=1
`define tDQSS2min 1.85 // Write command to first DQS latching transition - WL=2
`define tDQSS2max 2.15 // Write command to first DQS latching transition - WL=2
`define tDQSS3min 2.85 // Write command to first DQS latching transition - WL=3
`define tDQSS3max 3.15 // Write command to first DQS latching transition - WL=3
`define tDQSS4min 3.85 // Write command to first DQS latching transition - WL=4
`define tDQSS4max 4.15 // Write command to first DQS latching transition - WL=4
`define tDQSS5min 4.85 // Write command to first DQS latching transition - WL=5
`define tDQSS5max 5.15 // Write command to first DQS latching transition - WL=5

//`define tSDQS 0 // DQS-in setup time (ns)
//`define tWPREH 0.25 // DQS-in hold time (0.25 tCK)
//`define tSIHmin 0.45 // DQS-in high level width (0.45tCK)
//`define tSIHmax 0.55 // DQS-in high level width (0.55tCK)
//`define tSILmin 0.45 // DQS-in high level width (0.45tCK)
//`define tSILmax 0.55 // DQS-in high level width (0.55tCK)
//`define tSICmin 0.9 // DQS-in cycle time (0.9 tCK)
//`define tSICmax 1.1 // DQS-in cycle time (1.1 tCK)
`define tDSC 1 // DQS-in cycle time (1 tCK)

`define tIS 0.5 // Input setup time (ns)
`define tIH 0.5 // Input hold time (ns)
`define tMRD 5 // Mode register set cycle time (6tCK)
`define tDS 0.25 // Data in & DM set-up time (ns)
`define tDH 0.25 // Data in & DM hold time (ns)

//`define tDIPW 1.75 // Data in & DM input pulse width (ns)
//`define tDV 0.375 // Output DQS valid window (tCK)

//`define tPDEX 12.5 // Power Down exit Time (ns)
//`define tXSA 20000 // Exit self refresh to bank active command (tCK)
`define tREF 7.8 // Refresh interval time (us)
`define tWPST 0.4 // DQS write postamble time (tCK)
`define tDAL 260 // Auto precharge write recovery + precharge time (ns)
`endif
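
One way to read the spec blocks above is to convert the cycle counts into absolute time. The sketch below is only an illustration and not part of Samsung's file: it copies a handful of values from the four `ifdef blocks and multiplies each cycle count by that grade's tCKmin. The dictionary layout and the function names are assumptions made for the example.

```python
# Cycle counts and minimum clock periods copied from the `define blocks above.
# The structure and names here are hypothetical, chosen only for illustration.
SPEC = {
    "-12": {"tCKmin_ns": 1.2, "tRC": 35, "tRFC": 45, "tRASmin": 25, "tRP": 10},
    "-14": {"tCKmin_ns": 1.4, "tRC": 31, "tRFC": 39, "tRASmin": 22, "tRP": 9},
    "-16": {"tCKmin_ns": 1.6, "tRC": 27, "tRFC": 34, "tRASmin": 19, "tRP": 8},
    "-20": {"tCKmin_ns": 2.0, "tRC": 21, "tRFC": 27, "tRASmin": 15, "tRP": 6},
}


def cycles_to_ns(cycles, tck_ns):
    """Convert a timing given in clock cycles to nanoseconds."""
    return cycles * tck_ns


def grade_in_ns(grade):
    """Return the listed timings of one speed grade in ns at its tCKmin."""
    entry = SPEC[grade]
    tck = entry["tCKmin_ns"]
    return {
        name: round(cycles_to_ns(value, tck), 2)
        for name, value in entry.items()
        if name != "tCKmin_ns"
    }


for grade in SPEC:
    print(grade, grade_in_ns(grade))
```

With these numbers tRC comes out between 42 and 43.4 ns and tRFC between 54 and 54.6 ns for every grade, which suggests the cycle counts scale with the clock while the underlying time requirement stays roughly the same.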

felinusz
07-31-05, 12:58 AM
Here are a few BIOSes. We've got my X700 Pro, an X700 XT, and a VIVO X800 Pro for ATi's GC20, and we've got a 6800GT, and a 6600GT as a sample of nVidia's GC20 timings.

felinusz
07-31-05, 01:05 AM
Of interest are the ATi GC20 cards, which all use the same timings regardless of reference memory speed, which varies quite a lot.

The X800 Pro is referenced for 452 MHz, the X700 XT is referenced for 526 MHz, and the X700 Pro is referenced for 432 MHz. That tells me that the memory is being binned.
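
For a rough sense of how those reference clocks sit against the part's rating, here is a minimal sketch. It assumes the GC20 suffix corresponds to the 2.0 ns tCKmin of the -20 spec posted earlier (i.e. 500 MHz); that mapping and the function names are assumptions, not something confirmed in this thread.

```python
# Reference memory clocks quoted in the post above (MHz).
REFERENCE_MHZ = {"X800 Pro": 452, "X700 XT": 526, "X700 Pro": 432}

# Assumption: GC20 means the 2.0 ns grade, i.e. tCKmin from the -20 block.
GC20_TCK_NS = 2.0


def rated_mhz(tck_ns):
    """Clock frequency in MHz that corresponds to a clock period in ns."""
    return 1000.0 / tck_ns


def margin_percent(clock_mhz, tck_ns=GC20_TCK_NS):
    """How far a clock sits above (+) or below (-) the rated clock, in percent."""
    return (clock_mhz / rated_mhz(tck_ns) - 1.0) * 100.0


for card, clock in REFERENCE_MHZ.items():
    print(f"{card}: {clock} MHz, {margin_percent(clock):+.1f}% vs rated")
```

Under that assumption the X800 Pro and X700 Pro references sit about 9.6% and 13.6% below the rating, while the X700 XT reference sits about 5.2% above it.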

felinusz
07-31-05, 01:22 AM
And here are the timings across the GC20 ATi cards - all the same.

Sentential
07-31-05, 01:23 AM
Here's what I've pulled from Samsung's tech sheet.

Samsung TRCDRD = 7
ATi TRCDRD = 7

Samsung TRCDWR = 4
ATi = 4

Samsung tRP = 6
ATi = 5

Samsung tRAS = 15
ATi = 14

Samsung TRRD = 5
ATi = 5

Samsung TWR = 10
ATi = 7

Samsung TR2W = ?
ATi = CL + 3

Samsung TW2R = 4
ATi = 3

Samsung TR2R = 5?
ATi = 2

Samsung WR latency = ?
ATi = 1.5

Samsung Cas latency = 8
ATi = 7

Samsung CMD latency = ?
ATi = 0

Samsung STR latency = ?
ATi =

Samsung WR latency = ?
ATi = 1.5

Samsung TRFC = 27
ATi = 27
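
The list above can also be compared mechanically. A minimal sketch, using only the pairs where both sides have a definite number (entries marked "?" or left blank are skipped); the key names and helper functions are hypothetical.

```python
# Values copied from the list above; entries with "?" or blanks are left out.
SAMSUNG = {"tRCDRD": 7, "tRCDWR": 4, "tRP": 6, "tRAS": 15, "tRRD": 5,
           "tWR": 10, "tW2R": 4, "CAS": 8, "tRFC": 27}
ATI = {"tRCDRD": 7, "tRCDWR": 4, "tRP": 5, "tRAS": 14, "tRRD": 5,
       "tWR": 7, "tW2R": 3, "CAS": 7, "tRFC": 27}


def timing_deltas(vendor, card):
    """Card value minus vendor value for every timing present in both sets."""
    return {name: card[name] - vendor[name] for name in vendor if name in card}


def tighter_than_vendor(vendor, card):
    """Names of timings where the card runs fewer cycles than the vendor sheet."""
    return [name for name, delta in timing_deltas(vendor, card).items() if delta < 0]


print("tighter on ATi:", tighter_than_vendor(SAMSUNG, ATI))
print("deltas:", timing_deltas(SAMSUNG, ATI))
```

On these numbers tRP, tRAS, tWR, tW2R and CAS are tighter in the ATi BIOS, and the rest match.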

Sentential
07-31-05, 02:06 AM
Well Ian and I have created a ******* GC14 BIOS for the x700PRO. He's flashing it now. God willing this should be interesting stuff.

RedDragonXXX
07-31-05, 02:22 AM
I need to start working on my 6800U!

maxxoverclocker
07-31-05, 03:47 AM
AHHH SO MUCH DATA!!! lol thx for testing all this guys. im just wondering, what do you mean that ati clocks poorly? i mean maybe ive never gotten a bad card before but my 9800pro stock was 325 and i got it to 384mhz, and on my x800xl stock was 490 and im running 561 on essentially stock cooling

edit: also i read somewhere that the memory on ati cards is pretty much starving for voltage - is there a way to increase the voltage through bios? i know nvidia can but can ati? thx

ScottinIndy
07-31-05, 04:39 AM
Well Ian and I have created a ******* GC14 BIOS for the x700PRO. He's flashing it now. God willing this should be interesting stuff.

Excellent thread! :thup:

felinusz
07-31-05, 05:07 AM
I tried out a BIOS adjusted with laxer timings, to see if looser settings would allow the memory to scale further. They did not. However, due to the limitations of Rabit, I couldn't loosen CAS past reference.

I also tried a BIOS adjusted with slightly tighter timings - all the main values (except CAS) brought down a notch. These settings run exactly as stable as ATi's reference - theoretically the performance should be better, with tighter timings. If I have time tomorrow I'll try tightening things a bit more, and take a look at exactly what kind of gain is to be had through timing adjustment.


It's interesting that ATi uses timings tighter than Samsung's recommendation, but we have yet to see if alteration can achieve anything significant.

Sentential
07-31-05, 01:22 PM
It's interesting that ATi uses timings tighter than Samsung's recommendation, but we have yet to see if alteration can achieve anything significant.
Very true, however this could prove beneficial now that we know what to change and understand how Samsung's DDR3 behaves.

Flip-Mode
07-31-05, 02:43 PM
Any chance you can do a BIOS volt mod? I did it for my old 6800NU and it worked quite well. What about ATI?

felinusz
07-31-05, 03:01 PM
No, ATi does not have any form of soft voltage control - the voltages cannot be manipulated through the video card's BIOS. Physical modification (pencil, resistor) is the only way to adjust voltages with an ATi video card.

ati
07-31-05, 03:24 PM
This is interesting stuff...*subscribed*

Sentential
07-31-05, 04:09 PM
No, ATi does not have any form of soft voltage control - the voltages cannot be manipulated through the video card's BIOS. Physical modification (pencil, resistor) is the only way to adjust voltages with an ATi video card.
Wherez doz benches!

Enablingwolf
07-31-05, 04:23 PM
Subscribed, and thank you for a highly interesting thread fellas. Hope the mystery is solved. Look forward to positive results. :thup:

corruption
07-31-05, 07:01 PM
If you guys need/want it, I can send you a copy of my x800xl (agp) bios. If you prefer, I can post screenshots of RAM timings. :beer:

felinusz
07-31-05, 07:21 PM
That would be interesting - I'm pretty sure the timings would be the same as the other GC20 BIOSes (X800 XL reference is also GC20).

I ran some quick benchmarks with the tightened set of timings. The difference in 3DMark2005 was quite marginal (practically within margin of error) - about 10 marks at the exact same clock frequencies. Tightening CAS past the reference of 7 resulted in full-screen artifacting.

maxxoverclocker
07-31-05, 07:32 PM
one thing i noticed on my radeon 9800 pro was that if you adjusted the 'TRFC' from the stock 20 (i think) down to 15 it would get me about 200 more 3dmarks and antialiasing abilities went up like crazy (14 would go CRAZY as far as artifacting)

i attached the bios rom files for all the cards i've had in case you would like to test them (x800xl, 9800pro, 9800xt)

Sentential
08-01-05, 09:14 PM
If you guys need/want it, I can send you a copy of my x800xl (agp) bios. If you prefer, I can post screenshots of RAM timings. :beer:
Sure go for it

speed bump
08-02-05, 11:06 AM
Stupid random question. What all has GC20 memory on it?

It appears the x800xl, x800pro, and the top two x700 models have this memory. However, I think craig588 said the x800pro had tighter timings than the x800xt, and I know the x850s have looser timings than the x800s and also happen to clock like crazy, so which memory do they use?

Great results though.

mikeguava
08-03-05, 05:36 AM
Stupid random question. What all has GC20 memory on it?

It appears the x800xl, x800pro, and the top two x700 models have this memory. However, I think craig588 said the x800pro had tighter timings than the x800xt, and I know the x850s have looser timings than the x800s and also happen to clock like crazy, so which memory do they use?

Great results though.


The X850 is slightly looser in MEMTRP (6 vs 5), and its MEM_REFRESH_RATE is 0x20 vs 0x37 on the X800.

Well, I believe that actually depends on the BIOS of the X850 - there might be some BIOSes out there that have identical settings... turns out the X800 is barely tighter.

mikeguava
08-03-05, 05:43 AM
I tried out a BIOS adjusted with laxer timings, to see if looser settings would allow the memory to scale further. They did not. However, due to the limitations of Rabit, I couldn't loosen CAS past reference.

I also tried a BIOS adjusted with slightly tighter timings - all the main values (except CAS) brought down a notch. These settings run exactly as stable as ATi's reference - theoretically the performance should be better, with tighter timings. If I have time tomorrow I'll try tightening things a bit more, and take a look at exactly what kind of gain is to be had through timing adjustment.


It's interesting that ATi uses timings tighter than Samsung's recommendation, but we have yet to see if alteration can achieve anything significant.

You will absolutely see a difference - CAS seems to be set in stone. From my own series of tests they can be very significant... around 35mhz worth in mem frequency if timings are tightened across the board...but you have to push the entire card to the max to where you really start seeing the differences I imagine

Quick question - what made you decide to change the timings in the BIOS? Any particular reason? I read that in some cases overclocks are more stable in NVidia's case when done in BIOS - but I don't think that this is the case with ATI.

"Settings running exactly as stable" - I think in this case you should first push the memory to the max before applying tighter settings, to be sure about this :) Tighter settings will artifact much earlier, even with CAS remaining the same.

Another interesting thing that I personally experienced: the difference between cards of the same type can be significant in how far they allow tighter settings. Something to keep an open mind to.

GC20 binning - not too sure about that one. Even though the GC20 on the X800Pro is rated for only 451mhz, I have not yet personally seen any X800PRO that will do below 550mhz on the mem with stock voltages. I think the mhz is more card positioning decided by the marketing department of ATI. Binning certainly happens in a lot of cases - but I believe the core is the main deciding factor.

mikeguava
08-03-05, 05:52 AM
Felinusz and I have been discussing this for quite some time, and we have yet to figure out why, ever since the 9800 days, ATi's RAM has performed so poorly, often not even running at stock speeds.

It's been an unofficial belief among all of us here that ATi uses tighter timings than nVidia does. Tonight we are going to try to put this to rest.

Felinusz is manually checking his timings against 6600GTs and 6800GTs (nV cards with GC20) in hopes of finding the answer to this riddle.

For the time being, would anyone like to comment?

How are you planning to compare NVidia's Mem to ATI's memory? There is just a similar Q about this in the ATI forum - wouldn't know how to compare it without bringing the core into the equation.
Certainly voltages etc. should be set to equal levels but not sure if there is a bench that will stress only memory - but I remember seeing some videomemory tests from Futuremark...

What does ATI's memory "clocking badly" refer to? A stock X850XT PE obviously has higher mem clock speeds than the corresponding 6800 Ultra, and the highest mem OCs on the ATI cards have been slightly higher than on any 6800 that I could find off-hand on the ORB.

Great effort guys for looking into this - can't wait to read some updates!

caincha
08-03-05, 02:03 PM
one thing i noticed on my radeon 9800 pro was that if you adjusted the 'TRFC' from the stock 20 (i think) down to 15 it would get me about 200 more 3dmarks and antialiasing abilities went up like crazy (14 would go CRAZY as far as artifacting)

i attached the bios rom files for all the cards i've had in case you would like to test them (x800xl, 9800pro, 9800xt)
Subscribed!
I'm at work right now but will try this later at home.
Thanks :D

felinusz
08-03-05, 05:18 PM
mikeguava

You will absolutely see a difference - CAS seems to be set in stone. From my own series of tests they can be very significant... around 35mhz worth in mem frequency if timings are tightened across the board...but you have to push the entire card to the max to where you really start seeing the differences I imagine

Quick question - what made you decide to change the timings in the BIOS? Any particular reason? I read that in some cases overclocks are more stable in NVidia's case when done in BIOS - but I don't think that this is the case with ATI.

"Settings running exactly as stable" - I think in this case you should first push the memory to the max before applying tighter settings, to be sure about this. Tighter settings will artifact much earlier, even with CAS remaining the same.

Another interesting thing that I personally experienced: the difference between cards of the same type can be significant in how far they allow tighter settings. Something to keep an open mind to.

GC20 binning - not too sure about that one. Even though the GC20 on the X800Pro is rated for only 451mhz, I have not yet personally seen any X800PRO that will do below 550mhz on the mem with stock voltages. I think the mhz is more card positioning decided by the marketing department of ATI. Binning certainly happens in a lot of cases - but I believe the core is the main deciding factor.

Interesting thoughts - the core, and video card PCB *should* definitely have an impact on results as well, especially with GDDR3 results, which seem to heavily depend on memory/core buffer voltages (and probably the memory/core interface as well).

My card *is* pretty much maxed out at ~555 MHz, unfortunately. A large overvolt across VDDR and VDDQ netted almost nothing in the way of MHz gains for my RAM - which leaves gains through timings.

I find it quite peculiar that some of this memory gains with voltage, while other examples do not.

caincha
08-03-05, 07:51 PM
one thing i noticed on my radeon 9800 pro was that if you adjusted the 'TRFC' from the stock 20 (i think) down to 15 it would get me about 200 more 3dmarks and antialiasing abilities went up like crazy (14 would go CRAZY as far as artifacting)

i attached the bios rom files for all the cards i've had in case you would like to test them (x800xl, 9800pro, 9800xt)
Nope!
Both 9800pro and 9800XT gave my card major artifacts...
So, back to my "regular" XT bios

mikeguava
08-03-05, 08:51 PM
Interesting thoughts - the core, and video card PCB *should* definitely have an impact on results as well, especially with GDDR3 results, which seem to heavily depend on memory/core buffer voltages (and probably the memory/core interface as well).

My card *is* pretty much maxed out at ~555 MHz, unfortunately. A large overvolt across VDDR and VDDQ netted almost nothing in the way of MHz gains for my RAM - which leaves gains through timings.

I find it quite peculiar that some of this memory gains with voltage, while other examples do not.

Tight and loose settings:

http://service.futuremark.com/compare?3dm05=1064397

http://service.futuremark.com/compare?3dm05=1053304

With this card that also has the GC20 I have to clock down on the mem to run on tighter settings (if I remember correctly, from a max of 675mhz down to around 650), while on my X850XT I don't have to... luck of the draw.

Tighter settings are good for 60-300 points depending on how far you go and which card you use.

I honestly haven't spent any quality time on the X700Pro yet - hopefully going to next week but not sure what is holding back the memory so much if you are fully vmodded on the VDD and VDDQ - how far did you go voltage wise? 555 is certainly not the end of the road for GC20 - must be something holding it back on the card.

EDIT just realized that the GC20 on the X700 is only 128bits - so your performance should be minty...the 2 615mhz scores on the ORB must be under phasechange?

Yuriman
08-03-05, 11:28 PM
I just skimmed this thread, but I am very interested in this. My X700pro's ram (128mb) does nearly 600mhz, but I have had bad luck with overclocking video cards so I leave it at stock :p. Since it has so much headroom, I should be able to do quite a bit with the timings, eh? I'll play around with them a bit tomorrow.

Btw here is my bios, for those interested. http://home.earthlink.net/~eckyx3/X700pro%20128mb.bin

EDIT: I can get my TRFC down to 14, from a stock of 27. I'll test out some 3dmark with and without AA.
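
For a sense of scale, a timing given in clock cycles can be converted to nanoseconds. This is only a minimal sketch: it assumes the TRFC values shown by the BIOS editor are counted in memory clock cycles, and it uses 500 MHz (the nominal clock of a 2.0ns part) purely as an illustrative clock. The function name is made up for this example.

```python
def cycles_to_ns(cycles: int, clock_mhz: float) -> float:
    """Convert a timing expressed in memory clock cycles to nanoseconds."""
    # One clock cycle lasts 1000 / clock_mhz nanoseconds.
    return cycles * 1000.0 / clock_mhz


# Illustrative clock only (assumption); substitute the card's real memory clock.
CLOCK_MHZ = 500.0

for trfc in (27, 14):
    ns = cycles_to_ns(trfc, CLOCK_MHZ)
    print(f"TRFC {trfc} cycles at {CLOCK_MHZ:.0f} MHz = {ns:.1f} ns")
```

Under those assumptions, going from 27 to 14 cycles roughly halves the time spent on that timing, which says nothing by itself about whether the chips can tolerate it.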

felinusz
08-04-05, 12:03 AM
Mikeguava, I've gone as high as 2.35V for VDDQ/VDD, only saw 5 MHz in gains :-/.

The two 615 MHz memory results on the ORB were done with X700s with GC16 - some X700 Pro cards come with GC16 (only 128 meg cards?) and scale much higher memory-wise.


Yuriman, I'm going to check out your 128 card's BIOS right now :)

EDIT - it has the same timings as mine.
Just flashed it, after flashing it Windows redetected my card, and I had to reinstall device drivers... works and overclocks the same as my stock 256 BIOS however, and the card's still detected as being a 256.

Yuriman
08-04-05, 01:02 AM
Sometimes my card is detected as a 256mb card as well. When I first installed it, with Cat 5.whatever, it was detected correctly, but Ati apparently unfixed it. Anywho I ran some tests, and in 3dmark03, with no AA, I get exactly the same with TRFC at 27 and 14. I was rather surprised... the score was exactly the same. With 4x AA, I gained 26 points by going to 14. That equates to about a 0.67% increase. I'm going to bed, tomorrow I will mess with more timings.
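
The percentage above is just gain divided by baseline. A minimal sketch of that arithmetic follows; the baseline score was not posted, so the value printed is only what the two quoted numbers (26 points, about 0.67%) imply, not a measured result. Both function names are hypothetical.

```python
def percent_gain(baseline: float, gain: float) -> float:
    """Return the gain as a percentage of the baseline score."""
    return 100.0 * gain / baseline


def implied_baseline(gain: float, percent: float) -> float:
    """Back out the baseline score from a gain and its quoted percentage."""
    return gain * 100.0 / percent


# Numbers quoted in the post: 26 points, about 0.67%.
baseline = implied_baseline(26, 0.67)
print(f"Implied 4x AA baseline: about {baseline:.0f} points")
print(f"Check: {percent_gain(baseline, 26):.2f}% gain")
```

A gain this small is worth re-running a few times before trusting it, since it may sit inside run-to-run variation.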

mikeguava
08-04-05, 03:48 AM
I actually did realize the 128mb vs 256mb difference after I posted - I guess this could be another example where slapping on extra memory causes performance losses and reduced overclocking potential, just like in lots of other cases. I have a feeling it is not a GC16 vs GC20 issue.

I guess one mystery is partly lifting.

You really went all the way with your voltages - too bad, I was hoping that there'd be some more room. Did you try both more and less VDDQ vs VDD? Which one works better can vary either way randomly.

I am expecting an X700Pro with 128MB next week - I'd be glad to look into any Qs you might have for that one.

felinusz
08-04-05, 11:56 AM
Yeah, I tried running both VDDQ and VDD hotter than the other to varying degrees (0.02V-0.06V), and also tried running them the same. The chips have really big copper heatsinks on them, with lots of airflow.

I'm pretty sure the GC16 has some impact, as most of the cards with GC16 scale way higher.

mikeguava
08-04-05, 01:35 PM
Yeah, I tried running both VDDQ and VDD hotter than the other to varying degrees (0.02V-0.06V), and also tried running them the same. The chips have really big copper heatsinks on them, with lots of airflow.

I'm pretty sure the GC16 has some impact, as most of the cards with GC16 scale way higher.

wasn't too sure whether there really were any GC16 out there on X700 - but you're right - I just saw a cpl of blips here and there too.

Well if there are 1.6ns out that also means that those cards mem is 256bit vs the 128bit of your GC20.

Like I said before - I missed that the X700 Pro is running 128bit GDDR2 memory and not the GDDR3 I thought it was -
You got a 10% overclock out of it which seems to be the max it can go.

At least we can conclude for now that there isn't a mystery - at least in the case of the X700PRO - that ATI is not inefficiently using memory - they are just using inefficient memory :santa:

felinusz
08-04-05, 02:27 PM
mikeguava

Well if there are 1.6ns out that also means that those cards mem is 256bit vs the 128bit of your GC20.

Like I said before - I missed that the X700 Pro is running 128bit GDDR2 memory and not the GDDR3 I thought it was -
You got a 10% overclock out of it which seems to be the max it can go.

At least we can conclude for now that there isn't a mystery - at least in the case of the X700PRO - that ATI is not inefficiently using memory - they are just using inefficient memory

The card uses a 128-bit GDDR3 memory interface and uses 2.0ns GC20 memory made by Samsung - same RAM as used with nVidia's lower-end GDDR3 cards, and the same RAM ATi uses with the lower-end X800 series cards. The memory interface is different, the RAM is the same - it shouldn't be limiting clock frequency. I don't know if GCxx should really be declared inefficient, just yet...

The RV410 core was designed with a 128-bit memory interface; the type of memory on the card has no effect on that, the cards with GC16 still operate with a 128-bit GDDR3 memory interface.
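
To put numbers on the speed grades and bus widths being discussed, here is a minimal sketch. It assumes the ns rating is the rated clock period (so nominal clock = 1000 / ns) and that data moves twice per clock, as with DDR-type memory. The function names are made up for illustration, and the figures are theoretical peaks rather than measured results.

```python
def rated_clock_mhz(cycle_time_ns: float) -> float:
    """Nominal memory clock implied by a cycle-time rating in nanoseconds."""
    return 1000.0 / cycle_time_ns


def peak_bandwidth_gb_s(bus_bits: int, clock_mhz: float) -> float:
    """Theoretical peak bandwidth, assuming two transfers per clock (DDR)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * 2 / 1e9


for name, ns in (("GC20", 2.0), ("GC16", 1.6)):
    clock = rated_clock_mhz(ns)
    for bus_bits in (128, 256):
        bw = peak_bandwidth_gb_s(bus_bits, clock)
        print(f"{name} ({ns}ns) ~{clock:.0f} MHz on {bus_bits}-bit: {bw:.1f} GB/s")
```

The bus width changes the bandwidth at a given clock, not the clock the chips are rated for, which is the distinction being made above.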

felinusz
08-04-05, 02:43 PM
For everyone's reference, here are Samsung's detailed specifications for GCxx GDDR3 memory - http://www.samsung.com/Products/Semiconductor/GraphicsMemory/GDDR3SDRAM/256Mbit/K4J55323QF/K4J55323QF.htm

On that page, you can download a detailed specification sheet (PDF).

I was reading through it, and am wondering what the following from page 12 implies. Does it imply that a looser CAS timing is required for higher frequency operation? Loosening CAS to 8/9 with ATiTool does not, unfortunately, net my memory any noticeable frequency gains.

K4J55323QF-GC (GCxx) Reference (http://www.samsung.com/Products/Semiconductor/GraphicsMemory/GDDR3SDRAM/256Mbit/K4J55323QF/K4J55323QF.htm)

Below table indicates the operating frequencies at which each CAS latency setting can be used. Reserved states should not be used as unknown operation or incompatibility with future versions may result.
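
One possible reading of that table: if the chip needs a roughly fixed amount of real time to return data, the same delay spans more clock cycles as the clock rises, so the minimum usable CAS setting goes up with frequency. The sketch below only illustrates that arithmetic. The 14 ns access time is a made-up placeholder, not a value from the Samsung datasheet, and the function name is hypothetical.

```python
import math


def min_cas_cycles(access_time_ns: float, clock_mhz: float) -> int:
    """Smallest whole number of clock cycles that covers a fixed access time."""
    cycle_ns = 1000.0 / clock_mhz
    return math.ceil(access_time_ns / cycle_ns)


# Placeholder access time (assumption); the real figure must come from the datasheet.
ACCESS_TIME_NS = 14.0

for clock_mhz in (400, 500, 600):
    cas = min_cas_cycles(ACCESS_TIME_NS, clock_mhz)
    print(f"{clock_mhz} MHz: minimum CAS of {cas} cycles")
```

If this reading is right, loosening CAS only helps when CAS is the timing that actually limits the clock, which would be consistent with seeing no gain from CAS 8/9 alone.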

mikeguava
08-04-05, 02:53 PM
lol - I thought it was the k4N on the card because of the 128bit - guess I'll have to wait for mine to arrive which also has the GC20 :-( before I run too many assumptions...

mikeguava
08-11-05, 06:15 PM
well I played with both the 256MB and 128MB - both 2.0ns. While the 256 wouldn't oc for **** - the 128mb easily went up to 606MHZ on the mem. Vmodding got me hardly anything, really, only fewer artifacts.

Changing memory timings gained 100 points in 3DMark05.

So one of my assumptions appears to be correct - doubling the memory is the culprit in performance. Maybe the topic should be changed to why X700Pro 256MB clocks so poorly?

I just modded an X850XT which does 710mhz on air in 3DMark05 - I have not seen an Nvidia card with 1.6ns do that yet.

DaWiper
08-11-05, 06:38 PM
Here's an explanation on scart:
http://en.wikipedia.org/wiki/SCART

EDIT: Huh? How did this end up here? Sorry about that.

felinusz
08-12-05, 11:48 AM
mikeguava

well I played with both the 256MB and 128MB - both 2.0ns. While the 256 wouldn't oc for **** - the 128mb easily went up to 606MHZ on the mem. Vmodding got me hardly anything, really, only fewer artifacts.

Changing memory timings gained 100 points in 3DMark05.

So one of my assumptions appears to be correct - doubling the memory is the culprit in performance. Maybe the topic should be changed to why X700Pro 256MB clocks so poorly?

I just modded an X850XT which does 710mhz on air in 3DMark05 - I have not seen an Nvidia card with 1.6ns do that yet.


Interesting, looks like the 256 Meg cards have issues with high memory overclocks independently of the memory IC type. Perhaps the core is not well designed to interface with that much memory? I'm going to try and find a used 128 Meg X700 Pro to mess around with... my curiosity has been piqued.

A 100 point gain in '05 is pretty significant - specifically which timings did you adjust? My adjustments along Samsung's references did not get me much of an increase - barely outside margin of error.