hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
eschnett has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: bye]
david_pfander has joined #ste||ar
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
eschnett has quit [Quit: eschnett]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
daissgr has quit [Quit: WeeChat 1.9.1]
<jbjnr__>
heller: yt?
<heller>
jbjnr__: what's up?
<jbjnr__>
I was asking hkaiser last night about the MPI parcelport and if the MPI rank is always the same as the locality_id
<jbjnr__>
I stopped worrying about it, but now I'm concerned again
<jbjnr__>
as the libfabric rank is frequently different from the locality_id
<jbjnr__>
and I don't like it
<jbjnr__>
is there any mechanism to match them up in the simple case that we are not expecting workers to join after bootup
<jbjnr__>
I had a quick look at the BBB code and address naming stuff, but I want to avoid going through it all!
<jbjnr__>
heller: ran away!
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 268 seconds]
aserio1 is now known as aserio
<heller>
jbjnr__: yes, with the mpi parcelport, we use the rank
<heller>
For everything else, we hand them out on a first-come, first-served basis
<jbjnr__>
can you point me to where agas takes the rank and assigns the locality_id? I didn't see it.
<heller>
You can modify that though
<heller>
If you give me a second or two
<jbjnr__>
thanks. No hurry
<jbjnr__>
it shouldn't actually matter
<jbjnr__>
but it would be nice for consistency to have them the same
<jbjnr__>
I actually want to do it differently - each rank contacts agas using the libfabric connectionless mode and then we generate an address vector in the order they reach agas (effectively random) - the easiest thing for me to do is use the address vector index as the rank (this works nicely now that I use the TABLE type instead of the MAP type for the AV)
<jbjnr__>
so I don't really care what slurm thinks the rank is
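A minimal sketch of the FI_AV_TABLE idea described above, assuming the usual libfabric fabric/domain/endpoint setup has already happened; the function and parameter names are illustrative, not the actual parcelport code:

    // Sketch only: with an FI_AV_TABLE address vector, fi_av_insert() hands
    // out consecutive table indices in insertion order, so the index itself
    // can double as the rank (error handling omitted).
    #include <rdma/fabric.h>
    #include <rdma/fi_domain.h>

    #include <cstddef>
    #include <vector>

    std::vector<fi_addr_t> build_av(fid_domain* domain,
        void const* raw_addrs, std::size_t num_ranks)
    {
        fi_av_attr av_attr = {};
        av_attr.type  = FI_AV_TABLE;   // indices 0..N-1, in insertion order
        av_attr.count = num_ranks;

        fid_av* av = nullptr;
        fi_av_open(domain, &av_attr, &av, nullptr);

        // insert the raw endpoint addresses in the order the ranks reached
        // AGAS; the returned fi_addr_t values are the table indices, i.e.
        // the rank we hand out
        std::vector<fi_addr_t> ranks(num_ranks);
        fi_av_insert(av, raw_addrs, num_ranks, ranks.data(), 0, nullptr);
        return ranks;
    }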
<jbjnr__>
I will set the config from my bootup if I can and see if that is picked up correctly by the rest of the code
<jbjnr__>
thanks a bundle. I missed that config set/get
<jbjnr__>
I suppose I could insert the address using the slurm config index
<jbjnr__>
then it would be consistent all over. I'll try that first
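A small sketch of the config set/get mentioned above, assuming hpx::set_config_entry / hpx::get_config_entry; the key name "hpx.parcel.libfabric.rank" is invented for illustration and the include paths may differ between HPX versions:

    // Sketch only: stash the rank chosen at bootstrap in the runtime
    // configuration and read it back elsewhere. The config key below is
    // made up; only set_config_entry/get_config_entry are assumed real.
    #include <hpx/hpx_init.hpp>
    #include <hpx/include/runtime.hpp>

    #include <iostream>
    #include <string>

    int hpx_main(int argc, char* argv[])
    {
        // e.g. during bootstrap, after the address vector has been built:
        hpx::set_config_entry("hpx.parcel.libfabric.rank", std::to_string(3));

        // ...and picked up again by other parts of the code:
        std::string rank =
            hpx::get_config_entry("hpx.parcel.libfabric.rank", "-1");
        std::cout << "configured rank: " << rank << "\n";

        return hpx::finalize();
    }

    int main(int argc, char* argv[])
    {
        return hpx::init(argc, argv);
    }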
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 250 seconds]
aserio1 is now known as aserio
aserio has quit [Ping timeout: 250 seconds]
aserio has joined #ste||ar
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 245 seconds]
mreese3 has joined #ste||ar
<mreese3>
Can HPX serialize std::unordered_maps?
<K-ballo>
yes
<mreese3>
Okay, thanks!
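For reference, a minimal sketch of how that usually looks with HPX's serialization; the struct is made up, and depending on the HPX version the unordered_map support may need its own extra include:

    // Sketch only: a struct with an unordered_map member serialized via the
    // usual HPX serialize() member function.
    #include <hpx/include/serialization.hpp>

    #include <string>
    #include <unordered_map>

    struct lookup_table
    {
        std::unordered_map<std::string, int> data;

        // used by HPX for both saving and loading
        template <typename Archive>
        void serialize(Archive& ar, unsigned int /* version */)
        {
            ar & data;
        }
    };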
hkaiser has quit [Quit: bye]
nikunj has joined #ste||ar
<nikunj>
Hey! Can anyone tell me why running this code with hpx leads to deadlock while running it normally does not? https://pastebin.com/9THdsHTN
<zao>
No idea about your deadlocking, but you should never really reseed your PRNG while running.
<nikunj>
I'll change that
<zao>
nikunj: std::cin.get() blocks, which might be not-cool on an HPX thread.
<nikunj>
perhaps hpx::cin.get() then?
<zao>
Block indefinitely, I should say.
<zao>
I've got no idea :)
<nikunj>
zao: that is meant to block btw, it's just there to let the user end the infinite execution
<zao>
Yes, the problem is that HPX might expect work to either yield via an HPX synchronization primitive or complete.
<nikunj>
aah
<nikunj>
that might be the underlying issue then
<zao>
Loops of indeterminate duration that don't cause the runtime to switch tasks, as well as OS-blocking operations, grind HPX to a halt.
<zao>
Your loops polling atomic variables don't really let the runtime do anything either, unless you happen to do something that gets the runtime to reconsider whether you should be actively running or not.
<zao>
I don't know if HPX has any "yield if you feel like it" functionality.
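For what it's worth, hpx::this_thread::yield() exists; a minimal sketch of using it in the kind of polling loop described above (the include path may differ between HPX versions):

    // Sketch only: a busy-wait on an atomic flag that yields back to the
    // HPX scheduler, so other tasks on the same worker thread can run and
    // eventually set the flag.
    #include <hpx/include/threads.hpp>

    #include <atomic>

    void wait_for(std::atomic<bool>& flag)
    {
        while (!flag.load(std::memory_order_acquire))
        {
            // without this, the loop pins the worker thread and can
            // starve the task that would set the flag
            hpx::this_thread::yield();
        }
    }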
<nikunj>
that implementation of semaphores with an atomic variable was required by the assignment
<nikunj>
but I think I realize why it deadlocked
<nikunj>
thanks for the help!
<zao>
You could probably cheat by having enough OS threads servicing HPX, so that you get at least a 1:1 mapping of HPX tasks to threads.
<zao>
In the real world, you might have different executors or something where you could run things that are expected to be long-running or blocking, or have some other requirement.
<nikunj>
I see
hkaiser has joined #ste||ar
aserio has quit [Quit: aserio]
nikunj has quit [Ping timeout: 256 seconds]
mreese3 has quit [Read error: Connection reset by peer]