What causes driver corruption?

Darkroman

After a while, or just randomly, driver issues might pop up that'll cause your computer to BSOD. What causes this? Is it just the constant overwriting of old drivers, so that at some point the install gets corrupted? Wouldn't/shouldn't an overwrite of the corrupted install fix that? Also, why does using DDU to essentially nuke your graphics driver have a better success rate than just uninstalling the driver and reinstalling?
As a computer techie, this is one of those questions I've been pondering for a while. I have used DDU (when appropriate) just based on the symptoms found and, short of a hardware failure, it seems to fix graphics issues. I don't know WHY though, as in the nitty gritty of it all. I can only speculate, but I wanted a more in-depth explanation as to how this happens.
Definite causes I already know of are malware, the computer shutting down during the install/update, and possibly a bad sector on the drive where the driver files were written before the sector itself went bad.
 
Windows updates, and overclocking that destabilizes the OS.
 
Multiple Blue Screens of Death will corrupt a system.
 
Poor memory or IMC tolerance to heat or frequency.
 
Improper shut down of computer is a common cause for file corruption - and drivers are files too. So always "gracefully" shut down Windows, then power off the computer. Or, just let the computer go to sleep.

Of course, an ungraceful shutdown is unavoidable if you suddenly lose power. If your grid is unstable, consider getting a good UPS with AVR. In fact, IMO, every computer should be on a "good" UPS with AVR.

That said, a faulty power supply can cause sudden shutdowns, reboots or lockups too - and those can corrupt files.

Any number of other faulty components that result in system shutdowns, reboots, or lockups can do this too.
 
If you are very, VERY unlucky a cosmic ray can cause a bit flip and corrupt a file, as it passes through your computer o_O
 
It is great that error correction is part of DDR5
 
Run sfc /scannow from an elevated (administrator) cmd.exe prompt and see if you have corrupted system files. You can also have hardware stability issues or faulty storage. A hard disk with a bad/damaged read head is a great source of random corruption that can lead to BSODs. RAM overclocking is the fastest way to corrupt everything.
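For anyone wondering what "checking for corrupted files" boils down to, here is a rough Python sketch of the general idea: compare a file's hash against a known-good value. This is only an illustration and not a description of how sfc works internally. The file name `fake_driver.sys` and the helpers `sha256_of`, `flip_bit` and `demo` are made up for the example.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flip_bit(path, byte_offset, bit):
    """Simulate corruption by flipping a single bit of the file in place."""
    with open(path, "r+b") as f:
        f.seek(byte_offset)
        value = f.read(1)[0]
        f.seek(byte_offset)
        f.write(bytes([value ^ (1 << bit)]))


def demo():
    """Hash a stand-in 'driver' file, corrupt one bit, and hash it again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fake_driver.sys")  # hypothetical file
        with open(path, "wb") as f:
            f.write(b"pretend driver contents" * 100)
        known_good = sha256_of(path)
        flip_bit(path, byte_offset=42, bit=3)
        return known_good, sha256_of(path)
```

Calling `demo()` returns two digests that differ, even though only one bit out of a couple of thousand bytes changed. That is why a single flipped bit from bad RAM or a bad sector is enough for an integrity check to flag a file.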
 
It was added to make poorer quality memory (and more profit), not to protect your data.
:( That, of course, is nonsense.

Error correction has been around for decades. It is there to protect the data/avoid data corruption (what other purpose would it serve?) in specific scenarios. It was commonly used in servers and "mission critical" systems.

If the goal was to make poorer quality RAM, you would not see the least expensive DDR5 on Newegg still come with a lifetime warranty.

If this is a function of anything, it is higher densities and faster speeds allowing for even less room for error. In other words, a good thing.
 
Sigh. On-die ECC in consumer DDR5 is not equivalent to ECC in servers and "mission critical" systems. The lifetime warranty is irrelevant to the selection of memory chips during manufacture.

"On-die error correction code (ECC) and error check and scrub (ECS), which were first to be adopted in DDR5, also allow for more reliable technology node scaling by correcting single bit errors internally. Therefore, it is expected to contribute to further cost reduction in the future."
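To make "correcting single bit errors internally" concrete, here is a toy Hamming(7,4) code in Python. It is a textbook single-error-correcting code chosen for illustration, not the actual code used inside DDR5 chips, and the function names are made up for this sketch.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    # Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Fix at most one flipped bit; return (data bits, error position or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome = 1-based index of the bad bit
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position
```

Flip any one of the seven bits of an encoded word and `hamming74_decode` still returns the original four data bits. Flip two bits and it silently "corrects" to the wrong data. A single-error-correcting code only covers errors inside the block it protects, so anything that goes wrong between the chip and the CPU is outside its reach.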

 
Sigh. On-die ECC in consumer DDR5 is not equivalent to ECC
Double sigh.

I never said it was equivalent.

I said it was nonsense to suggest error correction was integrated into DDR5 so, as you contend, the makers could produce "poorer quality" RAM.

And of course, error correction is there to protect the data. I ask again - what other function would it have? Just because it does not function exactly the same way as error correction used in server RAM, that does not mean it is not there to correct errors and help prevent data corruption.

The lifetime warranty is irrelevant to the selection of memory chips during manufacture.
Of course it is. Perhaps not directly since the chip maker often is not the RAM stick maker. But ultimately, of course it is relevant. The stick makers are not going to buy chips from chip makers if their chips have a high failure rate. And they surely are not going to buy chips from those makers if they have to keep honoring free replacements to the end users.

Cost reduction does NOT imply "poorer quality".
 
Cost reduction does NOT imply "poorer quality".
This! So much this!

Stop with the conspiracy theories that lower cost = lower quality.
 
If you are very, VERY unlucky a cosmic ray can cause a bit flip and corrupt a file as it passes through your computer o_O
Oh yeah, aircraft computers see a surprising number of bit flips from cosmic rays, because there is less atmospheric protection at high altitude.

The designers try to protect against and mitigate these effects as much as possible.
 
Memory that is so unreliable it needs on-die ECC is poor quality. You can say "it is progress man, we need it for density and moar speed"; I can say "poor quality". Whatever.

If they were really interested in protecting your data, they'd implement proper ECC support in the CPU, DIMMs and motherboard. But they didn't.

Suggesting that the ECC in consumer DDR5 is there to protect our data, while perhaps technically accurate (in terms of what it does), is misleading as to the reason it was implemented and the consequence for consumers. I would not replace DDR4 with DDR5 on the assumption that on-die ECC now protects me like a server (if my data were that important to me). This is what we're being led to believe (for marketing purposes) is the case.

What it actually means is (during manufacture and selection) more memory errors will be considered acceptable than they were before, but that doesn't sound so appealing to buy, does it?

Just cos it is broken doesn't mean it will fail prematurely. For example: all flash memory is broken (yeah, yeah, 'broken' is a relative term); some is just a little less broken, and SSDs still come with decent warranties regardless of their status out of the box.

"Cost reduction does NOT imply 'poor quality'." - Sure, sure, keep the faith.
 
Improper shut down of computer is a common cause for file corruption - and drivers are files too.
Actually I think a lot of filesystems will not overwrite the existing file, but will write the data to a free block(s) and then write it to the journal, leaving the space for the "old file" open for reclamation. From that perspective, computers are actually pretty resilient to unexpected shutdowns because an incomplete write is just treated as free space after reboot and the existing file is still there.
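The same "never overwrite the old copy until the new one is complete" idea can be sketched at the application level in Python. To be clear, this is an analogue of the behaviour described, not what a journaling or copy-on-write filesystem does internally, and not every filesystem works that way. `atomic_write` is a made-up helper name.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace path with data so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new contents to a temporary file in the same directory,
    # so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the data to disk
        # Swap the new file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, drop the partial temp file and keep the old file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the power dies partway through the write, the worst case is a stray temp file; the original is untouched. Installers that write over files in place do not get that guarantee, which is one way an interrupted driver update can leave a half-written file behind.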

If non-ECC memory is so bad, maybe you could enlighten us as to how often bits flip on a machine running standard JEDEC speeds and not with an overclocked (specifically, out of spec) CPU. I suspect it's not very often and you're probably not doing something "mission critical" in the truest sense of the term.

The reality is that overclocking and running mismatched DIMMs are the biggest sources of data corruption, at least from my perspective. Fight me. :P
 
Wow, so cynical... I'm sure you work for a DRAM manufacturer or JEDEC and know all the reasons why ECC was added internally for DDR5. AFAIK, with larger and larger amounts of RAM being installed in PCs, the chances of a flipped bit in data at rest have increased substantially. This new feature is meant to prevent that. Yeah, it isn't full-blown end-to-end ECC like we see in workstations and servers, but do most people really need that anyway? Most people who have bad RAM don't know it in the first place; they think the software is junk or that it must be a virus.
 
You don't have to believe me man, just read the technical (not consumer-facing) documentation from people like Hynix or Micron, or look at the more informed/technical articles on the net. I'm not the only one who has addressed the misconception about the nature of DDR5's ECC *. Is my interpretation of the facts cynical AF? Probably, but like I said, this stuff is in the public domain. If you ask someone from Micron directly, obviously they're not going to describe it to you the way I just did. They'll tell you about the shrinking nodes, the density and the speed, just like Bill did in his reply.

If non-ECC memory is so bad, maybe you could enlighten us as to how often bits flip on a machine running standard JEDEC speeds and not with an overclocked (specifically, out of spec) CPU. I suspect it's not very often and you're probably not doing something "mission critical" in the truest sense of the term.

The reality is that overclocking and running mismatched DIMMs are the biggest sources of data corruption, at least from my perspective. Fight me. :P

It wasn't my point to say whether non-ECC memory is bad or not. I was referring to the memory chips themselves and the differences between DDR4 and DDR5 and how they're made. Believe it or not, I actually have a fairly high trust in the non-ECC DDR4 I own. I can only answer your question for my own machines: I've never had a corrected error recorded in the logs on this system (run for several years 24/7), or on the other ECC system I run that was built a few years prior.

* The salient part is after 6:11 (this video is by Ian Cutress, senior editor of AnandTech for 11 years; if you don't believe him either, idk what to tell you).


For those who dislike watching videos:

What does on-die ECC support mean? Think of it this way: on-die ECC allows memory manufacturers to go denser on the process to get higher-density memory, so more of it comes out of the factory and the cost goes down. What it does not do is protect your data; it enables more scaling down to denser process nodes.

So the takeaway here is that on-die ECC is not an ECC for you or me; it is simply a device that helps make the memory cheaper and better yielding. If you do need a proper end-to-end ECC solution, where your data is fully protected from CPU to memory or memory to accelerator, then you still need to invest in an ECC-based platform. It's going to be fun to see how these memory manufacturers deal with customers who say "you said I had ECC" when in actual fact they really don't.

On-die ECC is a way of managing those bit flips so more cells at the production stage pass the validation method. It's simply to say: normally some cells won't work because you get defects in the manufacturing process, but with on-die ECC you can make sure that more of those cells reach the required JEDEC specification, and you can sell that memory.
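A back-of-the-envelope way to see the yield argument, as a small Python sketch. The assumptions are loud ones: cells fail independently with some probability p, and each ECC word is 128 data bits plus 8 check bits with single-error correction. Both the layout and any value of p you plug in are illustrative, not manufacturer data, and `word_pass_probability` is a made-up name.

```python
def word_pass_probability(cell_fail_prob, data_bits=128, check_bits=8):
    """Return (P(word usable without ECC), P(word usable with SEC ECC)).

    Assumes independent cell failures, which real defects may not follow.
    """
    p = cell_fail_prob
    # Without ECC, every data cell has to work.
    no_ecc = (1 - p) ** data_bits
    # With single-error correction, zero or one bad cell among
    # all data + check cells is tolerated.
    n = data_bits + check_bits
    with_ecc = (1 - p) ** n + n * p * (1 - p) ** (n - 1)
    return no_ecc, with_ecc
```

For small p the second number is noticeably closer to 1 than the first, which is the "more cells pass validation" point from the video in arithmetic form. It says nothing about whether the extra margin is spent on cheaper dies, denser nodes, or both.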
 
Yeah, it isn't full-blown end to end ECC like we see in workstations and servers but do most people really need that anyway? Most people who have bad RAM don't know it in the first place, they think the software is junk or that it must be a virus.
My PC (see specs) originally had 16GB DDR3 as a 4x4 arrangement. For years it was intermittently unstable, i.e. BSODs and other niggly errors. Really intermittent, so much so that troubleshooting just couldn't pinpoint it, including removing the memory modules. It often looked like software was doing it, but I didn't think so.

A couple of years ago, the instability started to get noticeably worse where it would BSOD every few minutes which allowed me to finally track it down to one of the memory modules with 100% certainty. I did notice that with 4 modules installed, they were all running hot, with the one closest to the CPU very hot, even though the PC isn't overclocked. I suspect that this heat was enough to make the module unstable and eventually to damage it, or maybe it was slightly faulty from new, I'll never know. I don't know if that faulty module was actually in that slot though.

I then replaced all the modules with new 2x8 Corsair modules of the same type and model range and put them in slots 1 & 3 so that the closest slot isn't occupied, and the PC has run fine ever since - a blessed relief after all that time not running quite right. The two modules run a lot cooler, too. I've actually got another used 2x8 kit, bought from a friend as part of a spare mobo / CPU / RAM / cooler bundle. I tried them and, while it was great seeing 32GB in this old Sandy Bridge rig, the modules ran hot again, so I've removed them. I don't need 32GB and it's just not worth the headache. I'd have to add an extra fan there to fix the heat problem and that's not good for a PC built to be very quiet.
 
Strange, my DDR3 sticks are not even warm to the touch.
 
Strange, my DDR3 sticks are not even warm to the touch.
I think the ventilation around that area is poor.
 