hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
tufei has quit [Remote host closed the connection]
tufei has joined #ste||ar
Yorlik__ has joined #ste||ar
Yorlik_ has quit [Ping timeout: 264 seconds]
hkaiser has quit [Quit: Bye!]
tufei has quit [Remote host closed the connection]
tufei has joined #ste||ar
tufei has quit [Remote host closed the connection]
tufei has joined #ste||ar
tufei has quit [Remote host closed the connection]
tufei has joined #ste||ar
tufei has quit [Read error: Connection reset by peer]
tufei has joined #ste||ar
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 256 seconds]
K-ballo1 is now known as K-ballo
tufei_ has joined #ste||ar
tufei has quit [Ping timeout: 255 seconds]
tufei_ has quit [Remote host closed the connection]
tufei_ has joined #ste||ar
apop has joined #ste||ar
<apop> Hi, I've been using HPX on Slurm clusters for a while and am now looking at running multiple jobs at the same time (e.g., a SLURM job matrix) on the same cluster, but I can't seem to run more than one at a time. SLURM is able to launch them, but only one (or none in some cases) runs and all the others crash with the following error. I've tried switching to the MPI parcelport and disabling TCP, but it still doesn't work. Is this a known issue? Any fixes? Thanks!
<apop> the bootstrap parcelport (tcp) has failed to initialize on locality 0:
<apop> <unknown>: HPX(network_error),
<apop> bailing out
<apop> terminate called without an active exception
<apop> srun: error: queue1-dy-m5a2xlarge-1: task 0: Exited with exit code 255
<apop> the bootstrap parcelport (tcp) has failed to initialize on locality 4294967295:
<apop> <unknown>: HPX(network_error),
<apop> bailing out
<apop> terminate called without an active exception
hkaiser has joined #ste||ar
Yorlik__ has quit [Read error: Connection reset by peer]
<K-ballo> hkaiser: did you re-add me as alternate?
<hkaiser> K-ballo: unintentionally - this website is awful
<hkaiser> you've never been removed
<K-ballo> I may end up joining the italian NB now
<hkaiser> K-ballo: ok, just let me know
hkaiser has quit [Quit: Bye!]
hkaiser has joined #ste||ar
<apop> I can add that as an issue on github if it's not appropriate for this channel - lmk!
<hkaiser> apop: please do
<apop> ok, thanks!
hkaiser has quit [Ping timeout: 260 seconds]
hkaiser has joined #ste||ar
hkaiser has quit [Quit: Bye!]
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 264 seconds]
K-ballo1 is now known as K-ballo
tufei__ has joined #ste||ar
tufei_ has quit [Remote host closed the connection]
hkaiser has joined #ste||ar
tufei__ has quit [Remote host closed the connection]