Guide to running multiple GPU work units

KieX · Nov 16, 2012

[Ion] said:
Well, I had my first issue where the GPU WUs just get "stuck" and don't do anything. I lost about an hour while I had two just sit there and spin...I aborted them. Any idea how to avoid this in the future?

Not really any way to avoid it.

If the server keeps sending your computer WUs that aren't listed in the app_info.xml, all those discarded "No Reply" results will eventually stop WUs from going to your machine. The same goes for version changes in any app or project. And then there are all the inexplicable ones.. :confused:

It's the price we pay for using this workaround method at the moment.

[Ion] · Nov 16, 2012

KieX said:
Not really any way to avoid it. If the server keeps sending your computer WUs that aren't listed in the app_info.xml, all those discarded "No Reply" results will eventually stop WUs from going to your machine. The same goes for version changes in any app or project. And then there are all the inexplicable ones..

It's the price we pay for using this workaround method at the moment.

Fair enough. At least it's the first time in about two days--I'll just make sure to keep a close eye on it

KieX · Nov 17, 2012

[Ion] said:
Fair enough. At least it's the first time in about two days--I'll just make sure to keep a close eye on it

Looks like one of the WCG techs is already working on it. They're pretty awesome like that.
https://secure.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=401222

manofthem · Nov 17, 2012

I just ran out of work, getting errors in the Event log, off to find out why :banghead:

[Ion] · Nov 17, 2012

KieX said:
Looks like one of the WCG techs is already working on it. They're pretty awesome like that.
https://secure.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=401222

That wasn't the exact issue I had---at least it doesn't look like it.

It's a shame too--with how long they had been running, each one would have pulled down nearly 400 pts! :eek:

EDIT: Never mind, I just checked my WUs and I have nearly 15 pages of No Replies. I'll definitely be keeping an eye on this one.

manofthem · Nov 17, 2012

Anyone else getting a mix of 7.05 and 6.56 HCC units?

I deleted the file, restarted BOINC, and saw both. Then I closed it down, put the file back, and it seemed to work again. But I think I'm still getting the errors, so I'll likely run out again. I'm heading out for the night, so it'll be whatever it is when I get back :mad:

t_ski · Nov 17, 2012

Currently I'm getting only 7.05's (keeps fingers crossed)

manofthem · Nov 17, 2012

t_ski said:
Currently I'm getting only 7.05's (keeps fingers crossed)

You're the lucky one, I hope it keeps working for you! I was on my PC with everything fine, then boom, pure nonsense. I'm out now and will check it when I get home. Eh

[Ion] · Nov 17, 2012

I'm having no issues--nothing but 7.05s right now and my system is still pulling down a bunch of new WUs. Although the timing is worse, and there are longer idle periods on the card than there had been... oh well.

manofthem · Nov 17, 2012

Good news, it was all sorted out when I got home just now: the WUs are cranking out like they should.

I didn't think it would be working, since I had seen the bad errors in the Event log right before leaving the house, but it's all good. :toast:

I'm not sure what started the whole debacle, but I'm glad it's done and hope it stays put! :rockout:

KieX · Nov 17, 2012

Well.. if you want to do some reading.. this thread covers everything: http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,34252_offset,40

It has been resolved as per the WCG tech Kevin Reed:

So here is the problem:

17/11/2012 02:58:23 | World Community Grid | [error] App version returned from anonymous platform project; ignoring
17/11/2012 02:58:23 | World Community Grid | [error] No app version found for app hcc1 platform windows_intelx86 ver 656 class ati_hcc1; discarding X0960073631347200608011011_4

A job gets assigned that uses the old app version.
The host is anonymous platform, so it ignores the app version sent.
The host can't find app version that matches the platform, version num and plan_class so it discards the job

The problem is that discarding the job does not report to the server that the client isn't running it. Thus the job is still assigned to you.

Next request to the server, you get sent the job again. This continues.

Even worse, each time the job is sent to you the deadline for the job is re-evaluated and possibly slightly increased. Thus it can potentially never pass its deadline.

When I started digging into this problem today, there were a lot of computers who were repeatedly being resent the same jobs.

This issue occurs when all three of the following are in use: app_info.xml, homogenous_app_version and resend_results.

Resolving this issue for the long run is going to be somewhat tricky. As a result, what I am doing now is changing the app_version on the workunits to all be at the 705 level. The new binaries are backward compatible so this shouldn't be an issue. This should return life to normal for now.

For now.. all HCC WU are sent with version 705 to avoid the problem found with app_info users.
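If you want to check your own Event log for this pattern, here is a rough Python sketch. It pulls the app, platform, version and plan class out of a "No app version found" line and compares them with what the local app_info.xml provides, which is the matching step Kevin describes. The names AppVersionKey, parse_discard and local_versions are made up for illustration, the regex is modelled only on the single log line quoted above, and the local entry (705, ati_hcc1) is an assumption about what an ATI app_info.xml would declare.

```python
import re
from typing import NamedTuple, Optional, Tuple


class AppVersionKey(NamedTuple):
    """The fields a job must match before the client will run it."""
    app: str
    platform: str
    version: int
    plan_class: str


# Modelled on the one "No app version found" line quoted in this thread;
# other client versions may word the message differently.
DISCARD_RE = re.compile(
    r"No app version found for app (?P<app>\S+) platform (?P<platform>\S+) "
    r"ver (?P<version>\d+) class (?P<plan_class>[^;\s]+); discarding (?P<task>\S+)"
)


def parse_discard(line: str) -> Optional[Tuple[AppVersionKey, str]]:
    """Return (requested app version, task name), or None if the line differs."""
    m = DISCARD_RE.search(line)
    if m is None:
        return None
    key = AppVersionKey(m["app"], m["platform"], int(m["version"]), m["plan_class"])
    return key, m["task"]


line = ("17/11/2012 02:58:23 | World Community Grid | [error] No app version found "
        "for app hcc1 platform windows_intelx86 ver 656 class ati_hcc1; "
        "discarding X0960073631347200608011011_4")

# Assumed content of a local app_info.xml: only the 7.05 ATI build is declared.
local_versions = {AppVersionKey("hcc1", "windows_intelx86", 705, "ati_hcc1")}

key, task = parse_discard(line)
print(task, "runnable locally:", key in local_versions)
```

The job asks for version 656 while only 705 is declared locally, so the lookup fails and the task is discarded, which is where the resend loop starts.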

manofthem · Nov 17, 2012

KieX said:
Well.. if you want to do some reading.. this thread covers everything: http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,34252_offset,40

It has been resolved as per the WCG tech Kevin Reed:

For now.. all HCC WU are sent with version 705 to avoid the problem found with app_info users.

Great info, thanks! I'm glad it worked out, as I would have hated the downtime and the frustration of trying to figure something out.

[Ion] · Nov 17, 2012

Everything still going great on my cards--no more "hanging" WUs (although some do seem to be going a bit slow--something I need to keep an eye on).

t_ski · Nov 18, 2012

OK, so if I want to run two GPU WU's on an Nvidia card with a dual-thread CPU, can I do the same app_info.xml file with .5 count on the GPU?

manofthem · Nov 18, 2012

I'm now receiving "Project is temporarily shut down for maintenance," started reporting that around 4am (didn't see it til now). No more work is coming in, and I have like 153 completed tasks that are ready to report. Little frustrating. :banghead:

Daimus · Nov 18, 2012

manofthem said:
I'm now receiving "Project is temporarily shut down for maintenance," started reporting that around 4am (didn't see it til now). No more work is coming in, and I have like 153 completed tasks that are ready to report. Little frustrating.

Same here. Project is temporarily shut down for maintenance at 7:40 GMT.
It is very strange that there is no notification on the WCG forum.

manofthem · Nov 18, 2012

Daimus said:
Same here. Project is temporarily shut down for maintenance at 7:40 GMT.
It is very strange that there is no notification on the WCG forum.

Thanks for confirming, glad to know it's not just me. I'll check back later. Hopefully it'll kick back in, report what's done, and resume work :toast:

Edit: Project just kicked back in, things are ramping up again

KieX · Nov 18, 2012

t_ski said:
OK, so if I want to run two GPU WU's on an Nvidia card with a dual-thread CPU, can I do the same app_info.xml file with .5 count on the GPU?

Wow, at this rate you'll do in a few months what took me 3 years! :laugh:

But yes, it works no different. Just need to get the file with the NVIDIA/CUDA code instead and you're ready to go. :toast:
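To put numbers on the ".5 count" idea, here is a small Python sketch of the arithmetic. It assumes the client packs tasks onto a GPU until their count values add up to 1, and that each GPU task also reserves avg_ncpus CPU threads. Both are assumptions about scheduling rather than anything confirmed in this thread, the function names are made up, and the avg_ncpus of 1.0 is just an example value.

```python
from fractions import Fraction


def tasks_per_gpu(count: str) -> int:
    """Tasks that fit on one GPU if each claims `count` of it (assumed rule)."""
    # Fraction parses the decimal string exactly, so "0.1" gives 10, not 9.
    return int(1 / Fraction(count))


def cpu_threads_reserved(count: str, avg_ncpus: str, gpus: int = 1) -> Fraction:
    """CPU threads tied up by GPU tasks, assuming each reserves avg_ncpus."""
    return tasks_per_gpu(count) * gpus * Fraction(avg_ncpus)


# A count of .5 on a single card, with one full CPU thread per GPU task.
print(tasks_per_gpu(".5"))
print(cpu_threads_reserved(".5", "1.0"))
```

Under those assumptions, two GPU WUs on a dual-thread CPU would take both threads, leaving no room for a CPU WU alongside them.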

[Ion] · Nov 18, 2012

manofthem said:
I'm now receiving "Project is temporarily shut down for maintenance," started reporting that around 4am (didn't see it til now). No more work is coming in, and I have like 153 completed tasks that are ready to report. Little frustrating.

Ahhh, thanks for posting this! I got up this morning and I was distressed to see my output was so low and that the avg clock on the HD7770 had dropped from 960MHz to ~810.

But it's working again now

stinger608 · Nov 18, 2012

Got a great score on a HIS HD7870 the other day. Problem is, it will not be here until Wednesday. :-(

At least I sure hope it arrives then. LOL

Can't wait to see what kind of numbers that 7870 puts up.

Norton · Nov 18, 2012

stinger608 said:
Got a great score on a HIS HD7870 the other day. Problem is, it will not be here until Wednesday. :-(

At least I sure hope it arrives then. LOL

Can't wait to see what kind of numbers that 7870 puts up.

I'm running mine with single GPU wu's (no app_info file) and getting over 15k ppd from just the gpu wu's... :toast:

Running 3-4 wu's at once with the app_info tweak and you should be in the 50-60k+ ppd range with the rig :pimp:

t_ski · Nov 19, 2012

KieX said:
Wow, at this rate you'll do in a few months what took me 3 years!

But yes, it works no different. Just need to get the file with the NVIDIA/CUDA code instead and you're ready to go.

OK, so I tried to mess around with the app_info.xml file for my P4 and GTX280. This is what I tried to run:

Code:

<app_info>
  <app>
    <name>hcc1</name>
    <user_friendly_name>Help Conquer Cancer</user_friendly_name>
  </app>
  <file_info>
    <name>wcg_hcc1_img_7.05_windows_intelx86__nvidia_hcc1</name>
    <executable/>
  </file_info>
  <file_info>
    <name>hcckernel.cl.7.05</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>hcc1</app_name>
    <version_num>705</version_num>
    <platform>windows_intelx86</platform>
    <plan_class>nvidia_hcc1</plan_class>
    <avg_ncpus>1.0</avg_ncpus>
    <max_ncpus>1.0</max_ncpus>
    <coproc>
      <type>nvidia</type>
      <count>.5</count>
    </coproc>
    <file_ref>
      <file_name>wcg_hcc1_img_7.05_windows_intelx86__nvidia_hcc1</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>hcckernel.cl.7.05</file_name>
      <open_name>hcckernel.cl</open_name>
    </file_ref>
  </app_version>
</app_info>

All I got were "GPU missing" errors in the event log.

Any thoughts on what I might be missing?

EDIT: Never mind - I looked at the first post again and saw that the co-processor type should be "Cuda" and not "nvidia." I'm going to give it another go now...
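A slip like this is easy to miss by eye, so here is a hedged Python sketch of a sanity check you could run on the file before restarting BOINC. It only checks two things: that every file_ref points at a declared file_info, and that the coproc type matches whatever value you expect. The function name check_app_info is made up, the expected value is passed in by you (take the exact spelling from the first post), and the case-insensitive comparison is a convenience of this sketch, not a claim about how BOINC compares the string.

```python
import xml.etree.ElementTree as ET


def check_app_info(xml_text: str, expected_type: str) -> list:
    """Return a list of human-readable problems found in an app_info.xml."""
    root = ET.fromstring(xml_text)
    problems = []
    # Every file named in a <file_ref> should be declared in a <file_info>.
    declared = {fi.findtext("name", "").strip() for fi in root.iter("file_info")}
    for av in root.iter("app_version"):
        for ref in av.iter("file_ref"):
            name = ref.findtext("file_name", "").strip()
            if name not in declared:
                problems.append(f"file_ref {name!r} has no matching file_info")
        for cp in av.iter("coproc"):
            ctype = cp.findtext("type", "").strip()
            # Case-insensitive on purpose; this is the sketch's choice only.
            if ctype.lower() != expected_type.lower():
                problems.append(f"coproc type {ctype!r} != expected {expected_type!r}")
    return problems


# Trimmed-down version of the file posted above, with the value that failed.
sample = """
<app_info>
  <file_info><name>hcckernel.cl.7.05</name><executable/></file_info>
  <app_version>
    <app_name>hcc1</app_name>
    <coproc><type>nvidia</type><count>.5</count></coproc>
    <file_ref><file_name>hcckernel.cl.7.05</file_name></file_ref>
  </app_version>
</app_info>
"""

for problem in check_app_info(sample, expected_type="Cuda"):
    print(problem)
```

It won't tell you whether BOINC accepts the file, only whether the file agrees with itself and with the value you meant to type.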

KieX · Nov 19, 2012

t_ski said:
OK, so I tried to mess around with the app_info.xml file for my P4 and GTX280. This is what I tried to run:

[app_info.xml snipped, same as in the post above]

All I got were "GPU missing" errors in the event log. Any thoughts on what I might be missing?

EDIT: Never mind - I looked at the first post again and saw that the co-processor type should be "Cuda" and not "nvidia." I'm going to give it another go now...

Well spotted. Hopefully that sorts it out.

t_ski · Nov 19, 2012

Yep, it sure did. I was running one CPU WU at 6 hours and one GPU wu at 20 minutes. Now I'm running two GPU WU's that take about 28 minutes each, so kicking out one every 14 minutes on that rig. Not bad, but I have some plans for that rig tomorrow.
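For anyone who wants to run the same comparison on their own rig, the arithmetic can be sketched in a few lines of Python. The 20 and 28 minute runtimes are the ones quoted above; the function name is made up, and this only counts GPU results, so it ignores the CPU WU that no longer runs.

```python
def minutes_per_result(runtime_min: float, concurrent: int) -> float:
    """Average time between finished WUs when `concurrent` tasks overlap."""
    return runtime_min / concurrent


single = minutes_per_result(20, 1)   # one GPU WU at a time
double = minutes_per_result(28, 2)   # two GPU WUs sharing the card

print(f"single: one result every {single:.0f} min, {1440 / single:.0f} per day")
print(f"double: one result every {double:.0f} min, {1440 / double:.1f} per day")
print(f"GPU throughput gain: {single / double:.2f}x")
```

Each WU takes longer when the card is shared, but the overlap more than makes up for it on these numbers.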

manofthem · Nov 19, 2012

t_ski said:
Yep, it sure did. I was running one CPU WU at 6 hours and one GPU wu at 20 minutes. Now I'm running two GPU WU's that take about 28 minutes each, so kicking out one every 14 minutes on that rig. Not bad, but I have some plans for that rig tomorrow.

Come on t, give the rest of us a chance :roll:

Great work man! :toast:
