hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
Yorlik_ has joined #ste||ar
Yorlik__ has quit [Ping timeout: 252 seconds]
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 252 seconds]
K-ballo1 is now known as K-ballo
hkaiser has quit [Quit: Bye!]
Guest6650 has joined #ste||ar
K-ballo has quit [Ping timeout: 248 seconds]
K-ballo has joined #ste||ar
Guest6650 has quit [Quit: Client closed]
hkaiser has joined #ste||ar
hkaiser has quit [Quit: Bye!]
hkaiser has joined #ste||ar
beojan has joined #ste||ar
<beojan>
Hi. I've now gotten much of a proof of concept implemented on our project porting the ATLAS experiment's framework to use HPX. I've run into a couple of issues though:
<beojan>
1. Our application consists of the process running on the master node submitting large tasks to worker nodes using an HPX action. On the worker the action submits the task into our own scheduler then returns, and our scheduler (running on a dedicated OS thread) splits it up and uses `hpx::post` to submit a number of HPX tasks.
<beojan>
The aforementioned action also waits for all previous tasks running on the worker (i.e. submitted by our scheduler) to finish under certain conditions (specifically before starting the second task, and thereafter when we run into a limit we've set on the number of tasks that can be in a worker's queue at any one time).
<beojan>
I've found that if I don't have enough HPX threads the process appears to deadlock, and never finishes.
<beojan>
Enough here is 7 (occasional stalls with 6 threads, and it's pretty much guaranteed to stall with 4).
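A stall that depends on the thread count, as described above, is what generic pool starvation looks like: tasks that block their worker thread while waiting for other work queued on the same pool. The following is a minimal sketch of that pattern using only the C++ standard library, not HPX, and it is not a diagnosis of this application. `tiny_pool` and `stalls` are hypothetical names introduced for illustration. It assumes the wait blocks the OS thread (for example via `std::mutex` or `std::condition_variable`); waiting on an HPX future from an HPX thread suspends the task instead, so it would not behave this way.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size FIFO pool standing in for a set of worker threads.
class tiny_pool {
public:
    explicit tiny_pool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    // Drains the remaining queue, then joins all workers.
    ~tiny_pool() {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    void post(std::function<void()> f) {
        { std::lock_guard<std::mutex> lk(m_); q_.push_back(std::move(f)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> f;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !q_.empty(); });
                if (q_.empty()) return;  // stop requested and nothing left to run
                f = std::move(q_.front());
                q_.pop_front();
            }
            f();
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> q_;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};

// Posts `blockers` tasks that block their worker until a later "child" task
// has run, then posts the child. Returns true if the child did not run within
// `patience`, i.e. the pool stalled. The blockers are released afterwards so
// the sketch always terminates.
bool stalls(std::size_t pool_size, std::size_t blockers,
            std::chrono::milliseconds patience = std::chrono::milliseconds(300)) {
    assert(pool_size > 0);
    std::mutex m;
    std::condition_variable cv;
    bool child_ran = false;
    bool released = false;
    bool stalled = false;
    {
        tiny_pool pool(pool_size);
        for (std::size_t i = 0; i < blockers; ++i)
            pool.post([&] {
                std::unique_lock<std::mutex> lk(m);
                cv.wait(lk, [&] { return released; });  // blocks the OS thread
            });
        pool.post([&] {
            { std::lock_guard<std::mutex> lk(m); child_ran = true; released = true; }
            cv.notify_all();
        });

        std::unique_lock<std::mutex> lk(m);
        stalled = !cv.wait_for(lk, patience, [&] { return child_ran; });
        released = true;  // unblock everything so the pool can shut down
        lk.unlock();
        cv.notify_all();
    }  // pool destructor runs any remaining tasks and joins the workers
    return stalled;
}
```

Under these assumptions, `stalls(2, 2)` reports a stall because both workers are occupied by blocked tasks, while `stalls(2, 1)` does not because one worker stays free to run the child. The real threshold in an application depends on how many tasks can block at once, which this sketch does not model.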
K-ballo has quit [Ping timeout: 260 seconds]
K-ballo has joined #ste||ar
<beojan>
Is there some way to query HPX to determine what (if anything) is running on each thread?
<hkaiser>
beojan: difficult to tell from here what's happening
<beojan>
In fact, now it's only freezing if I run with a single node
<hkaiser>
even then it shouldn't do that
<hkaiser>
beojan: we've not seen any hangs when using async/actions for a long time, I'm almost certain this isn't an HPX issue
hkaiser has quit [Quit: Bye!]
Yorlik_ is now known as Yorlik
beojan has quit [Quit: Konversation terminated!]
hkaiser has joined #ste||ar
diehlpk_work has joined #ste||ar
diehlpk_work has quit [Remote host closed the connection]