Guide to running multiple GPU work units

Well, I had my first issue where the GPU WUs just get "stuck" and don't do anything. I lost about an hour while I had two just sit there and spin...I aborted them. Any idea how to avoid this in the future?

Not really any way to avoid it. :o If the server spams your computer with WU that aren't in the app_info.. all those discarded "no reply" will eventually stop WU going to your machine. Same for version changes in any app/project. And then there's all the inexplicable ones.. :confused:

It's the price we pay for using this workaround method at the moment.
 
Fair enough. At least it's the first time in about two days--I'll just make sure to keep a close eye on it :)
 
I just ran out of work, getting errors in the Event log, off to find out why :banghead:
 
Looks like one of the WCG techs is already working on it. They're pretty awesome like that.
https://secure.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=401222

That wasn't the exact issue I had---at least it doesn't look like it.

It's a shame too--with how long they had been running, each one would have pulled down nearly 400 pts! :eek:


EDIT: Never mind, I just checked my WUs and I have nearly 15 pages of No Replies. I'll definitely be keeping an eye on this one.
 
Anyone else getting a mix of the 7.05 and 6.56 HCC units?

I deleted the file, restarted BOINC, and saw both. Then I closed it down, put the file back, and it seemed to work again. But I think I'm still getting the errors, so I'll likely run out again. I'm heading out for the night, so it'll be whatever it is when I get back :mad: :mad:
 
Currently I'm getting only 7.05's (keeps fingers crossed)
 
You're the lucky one ;), I hope it keeps working for you! I was on my PC with everything fine, then boom, pure nonsense. I'm out now and will check it when I get home. Eh
 
I'm having no issues--nothing but 7.05s right now and my system is still pulling down a bunch of new WUs. Although the timing is worse, and there are longer idle periods on the card than there had been....ohwell.
 
Good news, it was all sorted out when I got home just now: the WUs are cranking out like they should :) I didn't think it would be working since I had seen the bad errors in the Event log right before leaving the house, but it's all good. :toast: I'm not sure what started the whole debacle, but I'm glad it's done and hope it stays put! :rockout:
 
Well.. if you want to do some reading.. this thread covers everything: http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,34252_offset,40

It has been resolved as per the WCG tech Kevin Reed:
So here is the problem:

17/11/2012 02:58:23 | World Community Grid | [error] App version returned from anonymous platform project; ignoring
17/11/2012 02:58:23 | World Community Grid | [error] No app version found for app hcc1 platform windows_intelx86 ver 656 class ati_hcc1; discarding X0960073631347200608011011_4
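If you want to check your own Event log for the same thing, here is a rough Python sketch that pulls the discarded task names out of lines like the two above. The regex is based only on the message format quoted here, and `DISCARD_RE` and `find_discarded` are made-up names for illustration, not part of BOINC.

```python
import re

# Pattern built from the "discarding" message quoted above (an assumption:
# other client versions may word it differently).
DISCARD_RE = re.compile(
    r"No app version found for app (?P<app>\S+) platform (?P<platform>\S+) "
    r"ver (?P<ver>\d+) class (?P<plan_class>[^\s;]+); discarding (?P<task>\S+)"
)

def find_discarded(log_text):
    """Return one dict per discarded task found in the event log text."""
    found = []
    for line in log_text.splitlines():
        match = DISCARD_RE.search(line)
        if match:
            found.append(match.groupdict())
    return found

SAMPLE = """\
17/11/2012 02:58:23 | World Community Grid | [error] App version returned from anonymous platform project; ignoring
17/11/2012 02:58:23 | World Community Grid | [error] No app version found for app hcc1 platform windows_intelx86 ver 656 class ati_hcc1; discarding X0960073631347200608011011_4
"""

for hit in find_discarded(SAMPLE):
    print(hit["task"], "needs version", hit["ver"], "class", hit["plan_class"])
```

If the same task name keeps showing up across several scheduler requests, that is the resend loop described in the rest of the post.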


A job gets assigned that uses the old app version.
The host is anonymous platform, so it ignores the app version sent.
The host can't find app version that matches the platform, version num and plan_class so it discards the job

The problem is that discarding the job does not report to the server that the client isn't running it. Thus the job is still assigned to you.

Next request to the server, you get sent the job again. This continues.

Even worse, each time the job is sent to you the deadline for the job is re-evaluated and possibly slightly increased. Thus it can potentially never pass its deadline.
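To make the loop described above easier to picture, here is a toy Python model of it. This is only a sketch of the behaviour as explained in the post, not the real scheduler logic: the `Job` class, the one-hour request interval, and the 24-hour deadline allowance are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    version: int
    deadline: float  # hours from the start of the simulation

def simulate(job, local_versions, requests, interval_h=1.0, allowance_h=24.0):
    """Toy model: count how often one job is resent and track its deadline."""
    resends = 0
    now = 0.0
    for _ in range(requests):
        now += interval_h
        if now > job.deadline:
            break  # the job would finally time out on the server
        # The server still has the job assigned to this host, so it sends it
        # again and re-evaluates the deadline from the current time.
        resends += 1
        job.deadline = max(job.deadline, now + allowance_h)
        if job.version in local_versions:
            return resends, job.deadline, True  # client can run it
        # Otherwise the client discards it and reports nothing back.
    return resends, job.deadline, False

# Host with an app_info.xml that only knows version 705.
stuck = simulate(Job("example_job", 656, 24.0), {705}, requests=100)
fixed = simulate(Job("example_job", 705, 24.0), {705}, requests=100)
print("656 job:", stuck)
print("705 job:", fixed)
```

In this model the 656 job is resent on every request and its deadline keeps moving ahead of the clock, so it never expires, while the 705 job is accepted on the first send.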

When I started digging into this problem today, there were a lot of computers who were repeatedly being resent the same jobs.

This issue occurs when all three of app_info.xml, homogenous_app_version and resend_results are in use.


Resolving this issue for the long run is going to be somewhat tricky. As a result, what I am doing now is changing the app_version on the workunits to all be at the 705 level. The new binaries are backward compatible so this shouldn't be an issue. This should return life to normal for now.

For now.. all HCC WU are sent with version 705 to avoid the problem found with app_info users.
 
Everything still going great on my cards--no more "hanging" WUs (although some do seem to be going a bit slow--something I need to keep an eye on).
 
OK, so if I want to run two GPU WU's on an Nvidia card with a dual-thread CPU, can I do the same app_info.xml file with .5 count on the GPU?
 
I'm now receiving "Project is temporarily shut down for maintenance," which started showing up around 4am (I didn't see it until now). No more work is coming in, and I have about 153 completed tasks that are ready to report. A little frustrating. :banghead:
 
Same here. Project is temporarily shut down for maintenance at 7:40 GMT.
It is very strange that there is no notification on the WCG forum.
 
Thanks for confirming, glad to know it's not just me. I'll check back later. Hopefully it'll kick back in, report what's done, and resume work :toast:
Edit: Project just kicked back in, things are ramping up again :D
 
Wow, at this rate you'll do in a few months what took me 3 years! :laugh:

But yes, it works no different. Just need to get the file with the NVIDIA/CUDA code instead and you're ready to go. :toast:
 
Ahhh, thanks for posting this! I got up this morning and I was distressed to see my output was so low and that the avg clock on the HD7770 had dropped from 960MHz to ~810.

But it's working again now :)
 
Got a great score on a HIS HD7870 the other day. Problem is, it will not be here until Wednesday. :-(

At least I sure hope it arrives then. LOL

Can't wait to see what kind of numbers that 7870 puts up.
 
I'm running mine with single GPU wu's (no app_info file) and getting over 15k ppd from just the gpu wu's... :toast:

Running 3-4 wu's at once with the app_info tweak and you should be in the 50-60k+ ppd range with the rig :pimp:
 
OK, so I tried to mess around with the app_info.xml file for my P4 and GTX280. This is what I tried to run:

Code:
<app_info>
  <app>
    <name>hcc1</name>
    <user_friendly_name>Help Conquer Cancer</user_friendly_name>
  </app>
  <file_info>
    <name>wcg_hcc1_img_7.05_windows_intelx86__nvidia_hcc1</name>
    <executable/>
  </file_info>
  <file_info>
    <name>hcckernel.cl.7.05</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>hcc1</app_name>
    <version_num>705</version_num>
    <platform>windows_intelx86</platform>
    <plan_class>nvidia_hcc1</plan_class>
    <avg_ncpus>1.0</avg_ncpus>
    <max_ncpus>1.0</max_ncpus>
    <coproc>
      <type>nvidia</type>
      <count>.5</count>
    </coproc>
    <file_ref>
      <file_name>wcg_hcc1_img_7.05_windows_intelx86__nvidia_hcc1</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>hcckernel.cl.7.05</file_name>
      <open_name>hcckernel.cl</open_name>
    </file_ref>
  </app_version>
</app_info>

All I got were "GPU missing" errors in the event log. :( Any thoughts on what I might be missing?

EDIT: Never mind - I looked at the first post again and saw that the co-processor type should be "Cuda" and not "nvidia." I'm going to give it another go now...
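For anyone who wants to double-check their file before restarting BOINC, here is a small Python sketch that reads an app_info.xml and prints the coproc type and how many tasks the count works out to per GPU. It assumes the element names shown in the file above; `gpu_settings` is a made-up helper, and the "Cuda" spelling in the sample is copied from the post, so check the first post of the guide for the exact value.

```python
import xml.etree.ElementTree as ET

def gpu_settings(app_info_xml):
    """Return (plan_class, coproc type, count, tasks per GPU) per app_version."""
    root = ET.fromstring(app_info_xml)
    rows = []
    for app_version in root.iter("app_version"):
        coproc = app_version.find("coproc")
        if coproc is None:
            continue  # CPU-only app version, nothing to check
        ctype = (coproc.findtext("type") or "").strip()
        count = float(coproc.findtext("count") or "1")
        # A count of 0.5 means each task claims half a GPU, so two fit at once.
        tasks_per_gpu = int(1 / count + 1e-9)
        rows.append((app_version.findtext("plan_class"), ctype, count, tasks_per_gpu))
    return rows

# Trimmed-down sample with the type value as spelled in the post (assumption).
SAMPLE = """
<app_info>
  <app_version>
    <app_name>hcc1</app_name>
    <plan_class>nvidia_hcc1</plan_class>
    <coproc>
      <type>Cuda</type>
      <count>.5</count>
    </coproc>
  </app_version>
</app_info>
"""

for plan_class, ctype, count, tasks in gpu_settings(SAMPLE):
    print(plan_class, "type:", ctype, "count:", count, "tasks per GPU:", tasks)
```

The script only reports what is in the file; it does not know which type strings a given BOINC client accepts.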
 
Well spotted. Hopefully that sorts it out.
 
Yep, it sure did. I was running one CPU WU at 6 hours and one GPU WU at 20 minutes. Now I'm running two GPU WUs that take about 28 minutes each, so kicking out one every 14 minutes on that rig. Not bad, but I have some plans for that rig tomorrow. ;)
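The arithmetic behind those numbers, as a quick Python sketch. The runtimes are the ones quoted in the post; the function names are just for illustration, and it ignores idle gaps between tasks.

```python
def minutes_per_result(runtime_min, concurrent):
    """Average minutes between finished WUs when running several at once."""
    return runtime_min / concurrent

def results_per_day(runtime_min, concurrent):
    """GPU WUs finished in 24 hours, assuming the card never sits idle."""
    return 24 * 60 * concurrent / runtime_min

one_at_a_time = results_per_day(20, 1)   # one GPU WU at 20 minutes
two_at_a_time = results_per_day(28, 2)   # two GPU WUs at about 28 minutes each

print("minutes per result with two WUs:", minutes_per_result(28, 2))
print("GPU results per day:", round(one_at_a_time, 1), "vs", round(two_at_a_time, 1))
```

So each WU takes longer, but running two at once still works out to roughly 103 GPU results per day instead of 72, at the cost of the CPU WU that used to run alongside.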
 
Come on t, give the rest of us a chance :roll:
Great work man! :toast:
 