# World Community Grid/F@H: Crunching for the Cure... or are we?



## hat (May 21, 2010)

A lot of us here run either World Community Grid, F@H, or both, myself included, which is a great thing, and I admire our efforts. Similarly, a lot of us overclock our rigs: processors, graphics cards, memory, all of it. Personally, I find myself tweaking whatever can be tweaked to get every ounce of performance out of it... not because I have to, but because I can, and because the faster my components are, the more work I do.

Now, this too is a great thing, but I have seen the topic of 'old school' and 'new school' overclocking argued countless times. Myself, coming from the old school, am sure to say that the new school method is wrong, because the new school method seems to be strictly trial and error: set something and roll with it. If something errors, change it. Now, that's fine if that's how you roll, but consider this: if you run your system this way, not knowing whether it's truly stable, how can you be sure you're not _sending in bad results_ to the WCG/F@H servers? Sure, they send the same work unit out to multiple machines and compare the results for differences, but there are two problems with this. The first is that it's possible for something to slip through, as with any system. The second is that if the work units you send in are getting thrown out in the end, you would be doing more useful work running at stock than overclocked.

Anyway, my point being made, I encourage each and every one of you, if you haven't already, to thoroughly test your overclocks. Run LinX overnight, and if it errors, do something to correct it: back down the clocks, change voltages, whatever. Same with your GPU... run the OCCT GPU test for a while; set it to run before you go take a shower and check it when you get back. That usually takes me about 20 minutes once everything's said and done, which should be enough time to expose any errors. If it errors, back down your clocks.

Links to some stability tests:

- Tests provided by Stanford: http://folding.stanford.edu/English/DownloadUtils
- OCCT: http://www.ocbase.com/perestroika_en/index.php?Download
- LinX: http://www.xtremesystems.org/forums/showthread.php?t=201670

Monitor temps with RealTemp and GPU-Z:

- http://www.techpowerup.com/realtemp/
- http://www.techpowerup.com/gpuz/
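
For what it's worth, the kind of silent fault these testers hunt for can be sketched in a few lines. This is purely illustrative, and Python here only stands in for what LinX/Linpack-style testers actually do (comparing the results of a heavy computation across repeated runs): on stable hardware the same deterministic workload produces identical output every time, so any mismatch means silent corruption.

```python
import hashlib

def checksum_pass(n=200_000):
    # Deterministic floating-point workload; produces the same
    # answer every run on healthy hardware.
    acc = 0.0
    for i in range(1, n):
        acc += (i * 1.000001) ** 0.5
    return hashlib.md5(repr(acc).encode()).hexdigest()

reference = checksum_pass()
for run in range(5):
    # A mismatch here would indicate silent computation errors,
    # exactly the kind of corruption that could poison a work unit.
    assert checksum_pass() == reference, "instability detected!"
print("5 passes, all identical")
```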


----------



## dark2099 (May 21, 2010)

Might want to change the title; I thought your thread was going to be about whether folding & crunching are worthwhile.


----------



## hat (May 21, 2010)

That's kind of my point. It's _not_ worthwhile if the work is being done by not-so-stable rigs.


----------



## mjkmike (May 21, 2010)

I think you bring a very good point to the surface.
I stress test every clock, and at times even the ones that pass a 24-hour test still fail on a few WCG units, or my rig overheats. Some like to avoid work units that give errors, but to me that just tells me my rig isn't 100%.
I am doing this now. The 1055T and the i7 930 haven't done a full 24h test at their new clocks; the lower clocks have, but the new ones haven't yet.


----------



## [Ion] (May 21, 2010)

My rule for OCs is that they must pass 8 hours of LinX, plus 8 hours each of Prime95 Small FFTs, Large FFTs, and Blend. If an OC passes all that, it's fully ready to go for F@H/WCG.


----------



## mjkmike (May 21, 2010)

Just got home and the i7 failed Prime95 Blend; clocking her down to 4.0.
Thanks for the reminder, hat. Also, I really need to find her a new home away from the main rig that vents straight at her.


----------



## FordGT90Concept (May 21, 2010)

I've always firmly believed overclocking and science don't mix.  Overclocking increases the likelihood of errors, and good science doesn't tolerate errors.  A 25% performance gain from overclocking is easily lost when 4 hours of CPU time are wasted on a completed task that ends up invalidated.
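
To put rough numbers on that trade-off (the figures below are made-up illustrations, not measured data): an overclock only pays off if its speedup outweighs the fraction of results that end up thrown out.

```python
def effective_throughput(oc_speedup, invalid_rate):
    """Useful work per unit time relative to stock, after
    discarding invalidated results."""
    return (1.0 + oc_speedup) * (1.0 - invalid_rate)

# A 25% overclock that invalidates 1 result in 5 nets you nothing:
print(effective_throughput(0.25, 0.20))  # 1.0 -- same as stock
# The same overclock with no invalid results is a real 25% gain:
print(effective_throughput(0.25, 0.0))   # 1.25
```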


----------



## mjkmike (May 21, 2010)

There has to be a fine line: just how much is too much? This is TechPowerUp; we will push.


----------



## Bo$$ (May 21, 2010)

how do they know if a work unit is invalid?


----------



## hat (May 21, 2010)

Bo$$ said:


> how do they know if a work unit is invalid?



They send the same work unit out to a bunch of machines and compare results.
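
Conceptually it's a quorum vote. Here's a minimal sketch; the quorum size and exact-match comparison are simplifying assumptions on my part, since real projects use fuzzier comparisons for floating-point results and redundancy levels that vary per project:

```python
from collections import Counter

def validate(results, quorum=2):
    """Accept the majority result if at least `quorum` copies agree;
    otherwise return None, meaning the unit must be sent out again."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None

print(validate([42.0, 42.0, 41.9]))  # 42.0 -- two machines agree
print(validate([42.0, 41.9]))        # None -- no quorum, resend
```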


----------



## [Ion] (May 21, 2010)

hat said:


> They send the same work unit out to a bunch of machines and compare results.



2 or 3, typically, although I think it's more for HCMD2, and was more for RICE as well.


----------



## thebluebumblebee (May 21, 2010)

hat, how about adding some links to the above?  Also, F@H has two tests available to test your system: a memory test for Nvidia cards and a CPU stress test, if anyone's looking for another way to test. Link


----------



## hat (May 21, 2010)

When memtestg80 asks for your memory clock... do you put in the speed before or after DDR?


----------



## thebluebumblebee (May 21, 2010)

hat said:


> When memtestg80 asks for your memory clock... do you put in the speed before or after DDR?



Just a guess, but I'd say before. http://fah-web.stanford.edu/MemtestG80/memtestG80-readme.txt


----------



## hat (May 21, 2010)

Gah, now I can't figure out how to make it use more than 128 MB of VRAM...

OCCT has a similar test that works fine, though.


----------



## thebluebumblebee (May 21, 2010)

For example, to run MemtestG80 over 256 megabytes of RAM, with 100 test
iterations, execute the following command:

    MemtestG80 256 100


----------



## hat (May 22, 2010)

If I do that, the program just crashes when it starts the test...


----------



## Chicken Patty (May 24, 2010)

I usually test my overclocks with WCG.  I've never had any rig return errors; if it crashes, I adjust it and that's it.  If you spend 8 hours or more stress testing, you're not returning any results.  If your rig crashed once or twice in 8 hours, at least you got something done.  But of course that's my two cents.  In no way do I want to start an argument; I truly respect hat's opinion.  However, I would just consider myself new school, I guess.


----------



## garyinhere (Jun 9, 2010)

I'm the same way... no need to prime for 24 hrs if it is crunching without errors or crashing. I just watch temps!


----------



## theonedub (Jun 9, 2010)

I have never Prime'd for 24hrs, however I will not run WCG without some sort of stability testing done first.

All I do is OCCT the CPU for 90-120 *minutes* and if it passes I let it rip on WCG 24/7 and watch my results. I have yet to return an invalid WU due to OC


----------



## hat (Jun 9, 2010)

Chicken Patty said:


> I usually test my overclocks with WCG.  I've never had any rig return errors; if it crashes, I adjust it and that's it.  If you spend 8 hours or more stress testing, you're not returning any results.  If your rig crashed once or twice in 8 hours, at least you got something done.  But of course that's my two cents.  In no way do I want to start an argument; I truly respect hat's opinion.  However, I would just consider myself new school, I guess.





garyinhere said:


> I'm the same way... no need to prime for 24 hrs if it is crunching without errors or crashing. I just watch temps!



You are doing it wrong. I very strongly argue against this. I don't care what you do with your own data, but if you're handling work units on a possibly unstable computer, you're possibly playing with lives because you were too lazy to run a good test for a while. One day max of time spent testing and not computing work units is nothing compared to a lifetime of corrupt results generated from a bad overclock.


----------



## garyinhere (Jun 9, 2010)

hat said:


> You are doing it wrong. I very strongly argue against this. I don't care what you do with your own data, but if you're handling work units on a possibly unstable computer, you're possibly playing with lives because you were too lazy to run a good test for a while. One day max of time spent testing and not computing work units is nothing compared to a lifetime of corrupt results generated from a bad overclock.



Did you really just call me lazy... do you know the amount of hours I put into OC'ing my computer? Do you know the hours of research I have done? I respect your opinion, but it's yours, and there's no need to call me or anyone else names... especially when you don't know what you're talking about!


----------



## Athlon2K15 (Jun 9, 2010)

You're all folding and crunching to help North Korea build nuclear weapons, you just don't know it yet.


----------



## rick22 (Jun 9, 2010)

Rock on, teabag... new info every day... I love my life...


----------



## erocker (Jun 9, 2010)

hat said:


> You are doing it wrong. I very strongly argue against this. I don't care what you do with your own data, but if you're handling work units on a possibly unstable computer, you're possibly playing with lives because you were too lazy to run a good test for a while. One day max of time spent testing and not computing work units is nothing compared to a lifetime of corrupt results generated from a bad overclock.



The fact that you are taking someone's good intentions and equating it to "Playing with people's lives" is shameful.  You have no right to call someone else lazy, you are not them. Lighten up.



AthlonX2 said:


> You're all folding and crunching to help North Korea build nuclear weapons, you just don't know it yet.



Not really appropriate. :shadedshu


----------



## garyinhere (Jun 9, 2010)

AthlonX2 said:


> You're all folding and crunching to help North Korea build nuclear weapons, you just don't know it yet.



Do you have something to back this claim up?


----------



## Athlon2K15 (Jun 9, 2010)

how else do you think they figured it out?


----------



## garyinhere (Jun 9, 2010)

AthlonX2 said:


> how else do you think they figured it out?



If you have something to say on the topic of the thread we'd love to hear it... otherwise...


----------



## theonedub (Jun 9, 2010)

erocker said:


> The fact that you are taking someone's good intentions and equating it to "Playing with people's lives" is shameful.  You have no right to call someone else lazy, you are not them. Lighten up.



Geez take it easy E. All he is saying is that this is important work that, in the end, we hope will save lives. There is no point in wasting CPU time or sending in bad results if they can be prevented by stability testing. 

Yes its true that participating in Distributed Computing is a volunteer act that people do with good intentions, and that's great. However, just because your intentions are good or because you are donating the resources doesn't mean you should cut corners or not hold yourself to a minimum/required standard. Is it okay to donate broken toys to a toy drive, not wash your hands when volunteering at a soup kitchen, or to donate just the muffin tops? No, and I think sending WUs on an untested OC is a near equivalent.

I will agree that it's unfair to call someone lazy for not stability testing, though; more often than not, getting WCG and F@H running right is a lot more work-intensive than opening OCCT or LinX and pressing go while you watch TV, go out, or sleep.

In the end, though, if you want to ensure you are giving *worthwhile* results to the project, get your stability testing in.


----------



## erocker (Jun 9, 2010)

I guess you missed my point, it doesn't matter. Regardless, continue doing what you are doing for the good of others.


----------



## Chicken Patty (Jun 9, 2010)

hat said:


> You are doing it wrong. I very strongly argue against this. I don't care what you do with your own data, but if you're handling work units on a possibly unstable computer, you're possibly playing with lives because you were too lazy to run a good test for a while. One day max of time spent testing and not computing work units is nothing compared to a lifetime of corrupt results generated from a bad overclock.



Once again, everybody is entitled to their own opinion.  No need to take it so seriously.  That's what "error" units are for: they don't count, and you don't get credited for them.  I have never primed or stress tested either of these rigs.  My i7 was crunching at 4.3 GHz earlier for about 1.5 hours in the afternoon after a bench session; I lowered it back to 3.8 GHz due to temps, I was just too lazy to restart earlier.  Anyhow, your opinion is your opinion.  Here's proof that, new school or old school, you can crunch without stability testing.

Zero errors!  My i7 has a new Windows install, so that's why there are two devices for it.


----------



## hat (Jun 9, 2010)

Sorry if you think I'm coming across as harsh, and this post will probably come across as harsh as well, but if you are going to read this, I plead with you to read all the way to the end; don't stop midway through carrying the thought of "wow, hat's really an asshole" with you. I truly believe that to run a project like Folding@Home or WCG on an overclocked computer without sufficient stability testing is asinine. Sure, it may say zero errors, but what if there is an error somewhere in one of those work units, and it happens to slip through the cracks? Think of a complex math equation with many steps. What happens if you change a sign somewhere, or slip up on your arithmetic? Sure, all the other steps might be right, but there was a "stability error", per se, and the whole effort is wasted when you get the wrong answer. What if that is what we are doing... what if our possibly unstable computers are making a miscalculation somewhere, and it slips through the cracks?

As I said before, we could be holding future lives in our hands. When you overclock and get subtle stability errors over time because you never tested, and you find your computer behaving abnormally, possibly unable to boot into Windows because a critical system file got corrupted, there is no real harm done. Sure, it sucks reinstalling Windows and all those programs and getting everything set up the way you had it, but at the end of the day it's no big deal. However, when an overclocked effort to cure cancer or another disease goes awry, like an installation of Windows slowly knocked off its feet by a slightly unstable system over time, the effects could be disastrous.

I am not equating anyone's effort to "playing with lives", or at least I am not trying to, even though it may seem that way. We all run distributed computing projects to help others. Many of us have spent our money upgrading the computers that run these projects to get more work done, and similarly, we overclock knowing that the higher speed will get more work done, and that's great. One of the main reasons I overclock is to get more work done. I'm just saying that if proper tests aren't done to verify the stability of the computers doing this magnificent work, it could all be for naught, or even have adverse effects.

Again, please don't take me the wrong way. I've been here since 2006; if I were a troll, an asshole, or otherwise, I'm sure someone would have noticed by now. I intend no harm to my readers, emotional or otherwise. I think we're a great bunch of people, and we have a very tight-knit community for a tech forum as large as ours. I am friends with many of you, and some of you have helped me with many things. I recall getting a 17" LCD monitor off one of you for free, and I don't think you even asked me to pay shipping (if you're reading this, I haven't forgotten your name; I remember exactly who you are, but I remember you not wanting me to give your name out by publicly thanking you). I just believe very strongly that *everyone* running a distributed computing project, such as F@H or WCG, the two projects many of us have become so fond of, should test their overclocks. If you are still reading at this point, and you are one of those running F@H or WCG without having properly stability tested your computer, I strongly encourage you to do so.


----------



## Chicken Patty (Jun 9, 2010)

hat said:


> Sorry if you think I'm coming across as harsh, and this post will probably come across as harsh as well, but if you are going to read this, I plead with you to read all the way to the end; don't stop midway through carrying the thought of "wow, hat's really an asshole" with you. I truly believe that to run a project like Folding@Home or WCG on an overclocked computer without sufficient stability testing is asinine. Sure, it may say zero errors, but what if there is an error somewhere in one of those work units, and it happens to slip through the cracks? Think of a complex math equation with many steps. What happens if you change a sign somewhere, or slip up on your arithmetic? Sure, all the other steps might be right, but there was a "stability error", per se, and the whole effort is wasted when you get the wrong answer. What if that is what we are doing... what if our possibly unstable computers are making a miscalculation somewhere, and it slips through the cracks?
> 
> As I said before, we could be holding future lives in our hands. When you overclock and get subtle stability errors over time because you never tested, and you find your computer behaving abnormally, possibly unable to boot into Windows because a critical system file got corrupted, there is no real harm done. Sure, it sucks reinstalling Windows and all those programs and getting everything set up the way you had it, but at the end of the day it's no big deal. However, when an overclocked effort to cure cancer or another disease goes awry, like an installation of Windows slowly knocked off its feet by a slightly unstable system over time, the effects could be disastrous.
> 
> ...



This post was much better than your last couple of posts, bro; not harsh at all.  That you encourage us to test is totally fine.  However, just because you passed OCCT or Prime doesn't mean your computer is stable: you might pass 8 hours, then let it go 8.5 hours and crash.  It might take longer, but the errors can still arise, so passing doesn't guarantee anything.  Even at stock clocks, an error can happen for no apparent reason and squeeze through the cracks.  It's just something you can't control.  I really appreciate your making this thread in the first place, and your efforts toward helping the team and anybody who runs a distributed computing project as a whole, but you can't come in here expecting to change everyone's opinion, which is what it seemed like a few posts back.  I don't think there is much else to say, as we have both voiced our opinions and discussed them plenty already.  I just don't want this back and forth to continue; like I said, it's your opinion and mine, and they won't change.  Hopefully somebody else chimes in with some feedback of their own.


----------



## hat (Jun 9, 2010)

Yes, even stock CPUs can be unstable, as can overclocks that pass, say, 24 hours of LinX, but the likelihood of that happening, stacked up against a "set it and forget it" untested OC, is slim to nil.

That aspect having been addressed, I agree: I think we've both made our points, and there's not much left to discuss.


----------



## garyinhere (Jun 9, 2010)

Plus, these are doctors and graduate students we are dealing with... they have to factor in an error percentage anyway! It's just good science to do so!!!


----------

