I’m an avid SETI@home cruncher and have been for years. My bragging note is that I recently passed 3,000,000 credits on SETI@home, and part of that is thanks to the CUDA-enabled SETI@home client they released, which enables GPU crunching of SETI@home work units!
I’m currently running the CUDA client on two machines whose video cards support the CUDA API: my home GeForce 8800GTX and my work Quadro FX 4600. From what I’ve read in specs and reviews, they’re basically the same card. So although this comparison isn’t TECHNICALLY apples to apples, there might be a connection.
After a month or so of running the CUDA client on my home GeForce 8800GTX, I noticed my Vista machine had started to become unstable, blue screening (BSOD) or rebooting randomly. It struck me as odd because I had not changed anything and, for the most part, the system had just been sitting idle crunching work units.
After some basic troubleshooting I was able to determine that the instability was due to my video card overheating! This struck me as strange, because I know nVidia uses variable-speed fans on the 8800GTX, so you’d think that if the card was reaching an unsafe operating temperature, the fan would kick on, right? Wrong.
I ended up downloading RivaTuner, which allows you to access low-level functions (such as overclocking and FAN CONTROL). Using RivaTuner, I manually set my GPU fan to 100%. This keeps the average temp of my 8800GTX around 67-70C.
Which brings us to today, when my work machine started showing the same symptoms. Within 5 minutes of booting (with SETI@home running), it gave me a BSOD (with dxgkrnl.sys as the culprit). The system then refused to boot for a few minutes (returning BIOS beep codes) until the Quadro FX 4600 had cooled down.
This time I disabled SETI@home as soon as I logged in, and the system appeared stable. I installed SpeedFan to double check I was seeing the same issue. Lo and behold, I started SETI@home and my GPU temp climbed to almost 80 degrees Celsius before my system blue screened and rebooted.
Repeating the same steps as with my home system, I used RivaTuner to force the Quadro FX 4600 to always keep the fan speed at 100%, and this seems to fix the issue. My GPU temp currently sits at 67 degrees Celsius and my system appears to be stable (for the time being).
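For anyone who would rather script this check than watch SpeedFan, here is a minimal Python sketch of a temperature watcher. It assumes the driver ships the `nvidia-smi` command-line tool and that it can report temperature for your card (on older boards such as the 8800GTX it may not). The 75 C warning threshold and the function names are my own placeholders, not anything taken from RivaTuner or SpeedFan.

```python
import subprocess
import time

WARN_TEMP_C = 75  # assumed threshold; my cards crashed at close to 80 C


def parse_reading(line):
    """Parse one 'temperature, fan' CSV line; fan may be non-numeric."""
    fields = [f.strip() for f in line.split(",")]
    temp_c = int(fields[0])
    fan_pct = None
    if len(fields) > 1 and fields[1].isdigit():
        fan_pct = int(fields[1])
    return temp_c, fan_pct


def is_too_hot(temp_c, warn_temp_c=WARN_TEMP_C):
    """Return True once the GPU reaches the warning threshold."""
    return temp_c >= warn_temp_c


def read_gpu():
    """Ask nvidia-smi for the first GPU's temperature and fan speed."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=temperature.gpu,fan.speed",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_reading(out.splitlines()[0])


def watch(interval_s=30):
    """Print a reading every interval_s seconds until interrupted."""
    while True:
        temp_c, fan_pct = read_gpu()
        fan_text = "n/a" if fan_pct is None else f"{fan_pct}%"
        status = "TOO HOT" if is_too_hot(temp_c) else "ok"
        print(f"GPU {temp_c} C, fan {fan_text} -> {status}")
        time.sleep(interval_s)
```

Calling `watch()` starts the polling loop. It only reports; it does not change the fan speed, so something like RivaTuner is still needed for that.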
I’m curious if anyone else out there has had the same heat-related issues when running CUDA applications on nVidia video cards. It seems that their automatic fan controls have logic issues when not running a 3D game (because when running one of those, the fan will hit leaf blower speeds).
It would seem to me that although the video cards had no issue keeping up with cooling over a short period of time, over an extended period with the GPU being used 24 hours a day, the cooling solution appears to fall behind. Is this caused by a physical change due to the constant heat/stress? Who knows!
Cheers!

#1 by Kris on January 9, 2010 - 2:28 PM
Also something to note: what ‘used’ to be considered ‘hot’ isn’t necessarily anymore. My Nvidia 8800 GT GPUs go into ‘overheating shutdown’ at 105 degrees Centigrade, and the fans are not even fully on until about 85 degrees.
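To put those numbers in context, here is a purely illustrative Python sketch of a linear fan curve. Only the 85 and 105 degree figures come from the 8800 GT observation above. The 50 degree starting point, the 30% minimum duty, and the function names are assumptions for illustration, and this is not a description of how Nvidia's fan control actually works.

```python
FULL_FAN_TEMP_C = 85   # from the comment: fans fully on at about 85 C
SHUTDOWN_TEMP_C = 105  # from the comment: overheating shutdown point
IDLE_TEMP_C = 50       # assumption: temperature where the ramp starts
MIN_FAN_PCT = 30       # assumption: fan duty at or below idle temperature


def fan_percent(temp_c):
    """Linearly ramp fan duty from MIN_FAN_PCT up to 100% at FULL_FAN_TEMP_C."""
    if temp_c <= IDLE_TEMP_C:
        return MIN_FAN_PCT
    if temp_c >= FULL_FAN_TEMP_C:
        return 100
    frac = (temp_c - IDLE_TEMP_C) / (FULL_FAN_TEMP_C - IDLE_TEMP_C)
    return round(MIN_FAN_PCT + frac * (100 - MIN_FAN_PCT))


def must_shut_down(temp_c):
    """Return True at or above the overheating shutdown temperature."""
    return temp_c >= SHUTDOWN_TEMP_C
```

Under these assumed numbers, a card sitting at 67 C would only be asked for about 64% fan speed, so a curve of this shape would stay well below full speed at the temperatures described in the post.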