PSA: NVIDIA RTX 3080 / 3090 (Stability/Issues Discussion)
Driver Update:
NVIDIA has released yet another hotfix driver, 461.40, addressing the issues in the original 461.33 hotfix, including the SteamVR issues. In the original hotfix, which NVIDIA quickly pulled, many of the issues NVIDIA claimed to have fixed weren’t actually fixed at all.
https://www.nvidia.com/content/DriverDo ... h-whql.exe
[X4: Foundations][Vulkan]: The game may crash on GeForce RTX 30 series GPUs.
[X4: Foundations][Vulkan]: HUD in the game is broken.
[Resident Evil 2 Remake/Devil May Cry V] Games which used the RE2 engine may crash in DirectX 11 mode.
[DaVinci Resolve]: Error 707, application crash, or application instability may occur.
[Adobe Premiere Pro]: The application may freeze when using Mercury Playback Engine GPU Acceleration (CUDA).
[Zoom][NVENC]: Webcam video image colors on the receiving end of Zoom may appear incorrect.
[Detroit: Become Human]: The game randomly crashes.
[Steam VR game]: Stuttering and lagging occur upon launching a game (without running any GPU hardware monitoring tool in the background).
[Assassin's Creed Valhalla]: The game may randomly crash after extended gameplay.
NVIDIA Broadcast Camera filter may hang.
[Zoom]: Chrome browser flickers with Zoom app.
[G-SYNC][Surround][RTX 30 series] PC may restart when enabling NVIDIA Surround with GSYNC enabled on RTX 30 series GPUs.
**The acknowledged issues of screen flickering and tearing in Chrome/Edge/MS Office etc. are still not fixed.**
Hopefully the new 461.40 hotfix driver actually corrects some of the previously addressed issues.
I’ll be testing it extensively on my 3090 over the next few days.
****UPDATE #3****
It looks like NVIDIA pulled the 461.33 hotfix driver, likely due to more bugs, and because many of the "addressed issues" weren't actually fixed... The GeForce and Studio drivers prior to 457.xx were actually less buggy and more stable, so for the time being, current owners of RTX 3080 and RTX 3090 cards are probably best off rolling back to an older driver version (late September to mid-October) using DDU.
****UPDATE #2****
NVIDIA has finally acknowledged the VIDEO_TDR_FAILURE problem that occurs when Ampere GPUs are in 2D mode (web browsers, Discord, etc.). The suggested workaround until a driver fix comes out is to disable “Hardware Acceleration” in your browsers and in any other affected applications.
Enough users (myself included) submitted feedback about the issue. This TDR issue has been causing black screens and BSoDs on Ampere GPUs for months, and only now is NVIDIA finally acknowledging the problem with their crap drivers.
The official NVIDIA issue number for the 2D TDR failure:
[Ampere] Chrome/Edge may experience random TDR while browsing [3195894]
I really hope they fix this in the next driver update. Based on NVIDIA's issue description, this TDR issue only affects Ampere-based GPUs. Whether it affects the 3080 and 3090 more than other Ampere cards is anyone’s guess, but from what I’ve seen reported, more 3080s and 3090s are affected.
This issue should be fixed in a future driver update for users experiencing it.
**UPDATE:**
From the testing I’ve done over the past couple of days, the latest driver seems to have more or less fixed the issue.
I will report back if the problem returns in the long term.
I hate to be the bearer of bad news but...
I just bought an EVGA RTX 3090 FTW3 Ultra and borrowed a friend’s MSI RTX 3080 Suprim. The test system is an i9-10900K with 32GB DDR4 RAM on a Z390 Aorus Pro.
Both cards have stability problems: the screen blacks out for 10-15 seconds, and sometimes you get your desktop back with the “NVIDIA Driver has recovered” message. Other times you get a Blue Screen of Death with the error “VIDEO_TDR_FAILURE 0x00000116 (**0x116**)” and the system reboots, or it simply soft reboots without showing a blue screen. And this happens on the silliest things I’ve ever seen: browsing the internet and watching videos on YouTube, Netflix, and other 2D applications (including game trailers in the Epic Games Store and Steam)...
It seems that the 3080 and 3090 GPUs dropping both core and memory clocks down to 0 MHz in 2D applications is what causes this. A temporary fix is to disable hardware acceleration in the browser and certain other 2D video applications...
In 3D workloads (including GPU crypto mining), where the GPU runs at 3D clocks (anything over 800 MHz core), the GPUs are dead stable. 2D core clocks are usually in the 210-300 MHz range, while the memory hovers at similar clocks, below 500 MHz.
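If you want to check whether your black screens line up with these clock drops, here's a rough Python sketch that polls `nvidia-smi` (installed with the driver) once a second and labels each reading. It assumes `nvidia-smi` is on your PATH and uses its `--query-gpu` fields `clocks.gr`, `clocks.mem` and `pstate`. The 800 MHz cutoff and the "0 MHz = suspect" rule are just the numbers from this post, not NVIDIA specs, and `parse_sample`, `classify_core_clock` and `watch` are hypothetical helper names. I haven't verified that `nvidia-smi` catches the exact moment of a drop, so treat it as a logging aid only.

```python
import subprocess
import time

# Cutoff taken from this post's observations (an assumption, not an NVIDIA spec):
# anything over 800 MHz core counts as 3D clocks, ~210-300 MHz is normal 2D,
# and a core clock of 0 MHz is the drop that seems to precede the black screens.
THREE_D_CORE_MHZ = 800
QUERY_FIELDS = "clocks.gr,clocks.mem,pstate"


def parse_sample(line):
    """Split one 'csv,noheader,nounits' line into (core MHz, memory MHz, P-state)."""
    core, mem, pstate = (field.strip() for field in line.split(","))
    return int(core), int(mem), pstate


def classify_core_clock(core_mhz):
    """Label a core clock reading using the thresholds above."""
    if core_mhz == 0:
        return "suspect (0 MHz)"
    if core_mhz > THREE_D_CORE_MHZ:
        return "3D clocks"
    return "2D clocks"


def watch(interval_s=1.0):
    """Poll nvidia-smi until interrupted, printing one labelled line per GPU."""
    while True:
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in out.splitlines():
            if not line.strip():
                continue  # skip blank lines in the output
            core, mem, pstate = parse_sample(line)
            label = classify_core_clock(core)
            print(f"{time.strftime('%H:%M:%S')} core={core} MHz "
                  f"mem={mem} MHz {pstate} -> {label}")
        time.sleep(interval_s)
```

Call `watch()` from a terminal, leave it running while you browse or watch videos, and stop it with Ctrl+C.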
This seems to be quite a common issue with 3080 and 3090 cards from numerous manufacturers. I tried the previous NVIDIA drivers 460.79 and 460.89, as well as the newly released 461.09. I also flashed the latest BIOS on both graphics cards, removed the drivers with DDU and reinstalled them, and even did fresh Windows 10 installs, and the issue still persists.
**According to the latest driver's (version 461.09) release notes, NVIDIA addressed a BSoD on resume from sleep with three 4K monitors that triggers the same VIDEO_TDR_FAILURE (0x116).** Check page 12 of the release notes PDF here:
https://us.download.nvidia.com/Windows/ ... -notes.pdf
Microsoft’s explanation for VIDEO_TDR_FAILURE (0x116)
https://docs.microsoft.com/en-us/window ... dr-failure
Microsoft says the 0x116 VIDEO_TDR_FAILURE is caused by the graphics card not being able to update the screen within a given time (the **TDR delay set in the registry**). This makes a lot of sense, as the black screening in 2D applications happens mostly when the GPU core clock drops to 0 MHz instead of the normal 210-300ish MHz 2D core clock. Microsoft also suggests it may be a hardware issue, but that might not be the case with these graphics cards, since NVIDIA themselves addressed a TDR failure issue in the 461.09 driver update (albeit only for resume from sleep). Who knows how many other bugs the current drivers have; recent drivers have been a crap shoot since September. There are reports that the late-September and early-October NVIDIA drivers have fewer 0x116 black screen crashes, but they have other issues, such as CTD in some 3D applications.
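For reference, the timeout Microsoft is talking about is, as far as I know, the `TdrDelay` REG_DWORD (in seconds) under `HKLM\System\CurrentControlSet\Control\GraphicsDrivers`, which defaults to 2 seconds when it isn't set. Microsoft says the TDR keys are meant for testing and debugging only, so this Python sketch just reads the value your system is using and changes nothing. I haven't tried changing it as a fix. `read_tdr_delay` and `effective_tdr_delay` are hypothetical helper names.

```python
import sys

# Key and value name as documented by Microsoft for TDR; when TdrDelay is
# absent, Windows falls back to its default timeout of 2 seconds.
GRAPHICS_DRIVERS_KEY = r"System\CurrentControlSet\Control\GraphicsDrivers"
DEFAULT_TDR_DELAY_S = 2


def effective_tdr_delay(raw_value):
    """Return the timeout in seconds, given the raw registry value or None if unset."""
    return DEFAULT_TDR_DELAY_S if raw_value is None else int(raw_value)


def read_tdr_delay():
    """Read TdrDelay from HKLM (read-only); None if unset or not running on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # only available on Windows
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, GRAPHICS_DRIVERS_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "TdrDelay")
            return value
    except FileNotFoundError:  # key or value does not exist
        return None
```

On a stock install, `effective_tdr_delay(read_tdr_delay())` should come back as 2.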
And yes, I used 3 separate PCIe power cables to the cards, and my power supply is a Seasonic Prime 1300W Gold (purchased less than 2 months ago), a pretty good unit with low voltage ripple.
I’ve also tested my trusty GTX 1080 Ti, RTX 2070 Super, and even my old GTX 970 with the same drivers and the same OS install on the same system, and none of them exhibit this behaviour. From searching online, this TDR black screen (which either recovers or ends in a BSoD) is quite common (check Reddit, the EVGA forums, and even NVIDIA’s own forums), and it seems to affect Founders Edition, ASUS, MSI, EVGA, Gainward, Gigabyte, and other AIB vendors’ cards.
From what I’ve researched, this could be a voltage load-balancing issue when the GPU is in the P0 power state at idle. Some have reported that changing PCIe power management and the NVIDIA power preference to maximum performance reduces the driver crashes/black screens/BSoDs, which also prevents the GPU from dropping its core clock to 0 MHz. I have tried this on both cards, and it does improve stability by a huge margin, but at the cost of raising GPU idle power to around 100-130W, and idle temps along with it. By comparison, the FTW3 RTX 3090 normally idles at around 40-50W, and with my case/fan config it hovers around 37-40 degrees Celsius with room-temperature air.
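To put that tradeoff in numbers, here's a quick Python calculation using only the idle figures above. The number of idle hours per day is a made-up input you'd plug in yourself, and `extra_idle_watts`/`extra_kwh` are hypothetical helper names.

```python
# Idle power ranges quoted in this post for the FTW3 RTX 3090, in watts.
NORMAL_IDLE_W = (40, 50)       # default power management
MAX_PERF_IDLE_W = (100, 130)   # "maximum performance" workaround


def extra_idle_watts():
    """Smallest and largest extra idle draw implied by the two quoted ranges."""
    low = MAX_PERF_IDLE_W[0] - NORMAL_IDLE_W[1]   # 100 - 50
    high = MAX_PERF_IDLE_W[1] - NORMAL_IDLE_W[0]  # 130 - 40
    return low, high


def extra_kwh(hours_idle):
    """Extra energy in kWh for a given number of idle hours (your own estimate)."""
    low, high = extra_idle_watts()
    return low * hours_idle / 1000, high * hours_idle / 1000
```

So the workaround costs roughly 50-90W extra at idle; for example, 8 idle hours a day works out to about 0.4-0.72 kWh extra per day.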
There is also the other “Black Screen of Death” that affects these 3080 and 3090 cards, which I personally have not experienced after days of 10+ hour stress testing/benchmarking loops of 3DMark Time Spy Extreme and Port Royal. That Black Screen of Death is the one where the PCIe power connector LED lights up and the computer no longer recognizes the graphics card, effectively “bricking” it. Some users over on the EVGA forums reported seeing electrical arcs on the PCB followed by a “pop” sound that permanently broke the cards, which leads me to believe that cards with traditional (cylindrical) caps are more prone to this than those with the polymer (square/rectangular) caps used on some cards such as the FE, TUF/STRIX, etc.
I still don’t know whether a software bug is causing the GPU to drop clocks, or whether it's a hardware power-balancing or power-delivery issue. NVIDIA’s Ampere architecture in general is pretty bizarre, with crazy power spikes for no apparent reason: same clocks, but random power spikes. Many people are reporting the same thing and were told it's normal.
At this point, considering the prices of RTX 30-series graphics cards, I’d wait until NVIDIA and the board partners iron out the bugs and issues... Don’t buy one even if you find a 3080 or whatever in stock. The instability and random BSoDs just aren’t worth it, unless you have a backup machine for your mission-critical work-from-home stuff and use the RTX 30-series machine only for heavy (3D) workloads and such. With the current idle black screens, which may result in BSoDs, it is not worth your time and money to have this in your main machine right now.
For now, I am stuck with the card, as Canada Computers will not take returns on graphics cards (possibly due to the high demand). Canada Computers told me that all GPUs are final sale unless defective, in which case they will only exchange for the same graphics card, or a card from the same series by a different maker.
If you do wish to pay these crazy prices for a 3080/3090 and become a free beta tester for NVIDIA, please buy from a store with a no-questions-asked return policy, such as Amazon, as you may experience constant stability issues at this time (based on a sample size of 2, ofc).
There may be other stability/crashing issues as well; I’m still testing. Hopefully these issues get fixed in the near future, like the CTD issues were in the early days.
Whether you have issues with your 30-series GPU or it's working perfectly, please report below; I would love to hear from actual users.
It’s funny that none of the reviews I read before buying my 30-series GPU even mentioned these issues; maybe the reviewers got cherry-picked or stability-verified cards, idk. Not sure what the deal is there. For my intended purposes of architectural visualization and VR product demos, though, this card is amazing, a huge uplift in performance over my 2070 Super. But the instability really kills it for me at the moment. If I could return the card, I would. For the premium we are paying right now, I don’t think the 3080, and especially the 3090, is even worth it.
Examples:
EVGA models:
https://forums.evga.com/RTX-30803090-Bl ... n-m3137072.
https://forums.evga.com/m/tm.aspx?m=3159640&p=1
Founders Edition:
https://www.nvidia.com/en-us/geforce/fo ... -rtx-3090/
https://www.reddit.com/r/techsupport/co ... n_desktop/
Gigabyte/ASUS (though these users are seeing crashes only in certain 3D titles, probably a load-balancing/driver issue):
https://www.nvidia.com/en-us/geforce/fo ... -game-aud/
https://www.nvidia.com/en-us/geforce/fo ... n-fan-100/
Basically, the list goes on...
There do seem to be 2 different types of black screen crash that don’t brick cards. Cards that crash in the browser and/or on the desktop are experiencing the 2D mode crashes. Cards that crash in certain video game titles seem to do so when the game is a light workload (low GPU usage because the game isn't demanding and/or VSync is on, with a 3D clock of around 800-1250 MHz). Games that fully load the GPU, like Cyberpunk, and crypto mining, which has a high GPU load, don’t seem to cause issues for the most part.
Of course, as mentioned in the OP, the black screen that ends with a lit LED on the PCIe power connector and a card that the motherboard doesn't recognize is a whole different issue altogether, and it usually can only be resolved with a replacement card.
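Putting these patterns together, a rough triage might look like this Python sketch. The `Symptoms` fields and the `triage` function are hypothetical, the checks only restate what people in this thread have observed (not an official diagnosis), and treating either a missing card or a lit PCIe power LED as the hardware case is my own simplification.

```python
from dataclasses import dataclass


@dataclass
class Symptoms:
    # Hypothetical fields summarizing what a user observed.
    card_detected_after_reboot: bool
    pcie_power_led_lit: bool
    crashed_in_2d: bool          # browser, desktop, video playback
    gpu_load_low_in_game: bool   # light game and/or VSync, ~800-1250 MHz core


def triage(s: Symptoms) -> str:
    """Map symptoms to the three failure types described in this thread."""
    # Simplification: either sign is treated as the "bricked card" failure.
    if not s.card_detected_after_reboot or s.pcie_power_led_lit:
        return "hardware: likely needs an RMA replacement"
    if s.crashed_in_2d:
        return "2D-mode TDR crash: try disabling hardware acceleration"
    if s.gpu_load_low_in_game:
        return "light-3D-load crash: likely driver/load-balancing related"
    return "unclassified: not one of the patterns in this thread"
```

The hardware check comes first on purpose: if the card has dropped off the bus, no driver or browser setting will help.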
This is why I suspect it may have to do with PCB load balancing, and that the driver still isn’t good enough at compensating for the variations in PCB design across different vendors and models.
These are just my speculations. So far, no manufacturer has come out and addressed the issue. NVIDIA, on the other hand, has listed a few “Fixed Black Screen” entries for the 3080/3090 in previous driver changelogs, but the issue still persists.
According to people who have RMA’d their EVGA cards, RMAs for EVGA 3080/3090 XC3, FTW3 Ultra/Hybrid, and KingPin cards seem to be very quick and no-questions-asked. Not sure if this is just good service on EVGA’s part or an internal acknowledgement of the issue; there is still no public statement on the matter from any manufacturer, though.
Basically, the TL;DR is that if your 30-series card works fine as-is in 2D and 3D mode, then I’m happy for you. As I mentioned before, I don’t believe the 2D black screen crashes/soft reboots/BSoDs are caused by a hardware issue, but rather by software (the driver or the GPU BIOS/NVFlash). (**Unless you get a black screen with 100% fan on one of the card’s fans, and after rebooting the LEDs by the PCIe power connectors are lit and the motherboard/BIOS doesn’t recognize the GPU; then it is a HW issue and needs an RMA replacement.**) Just make sure you don’t update the drivers if your current drivers are stable for you. Updating the drivers can sometimes introduce issues, and NVIDIA’s recent track record with driver updates is fixing an old bug and introducing a couple of new ones along with it.
Last edited by Sorrosh on Jan 26th, 2021 6:14 pm, edited 13 times in total.