hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoD: https://developers.google.com/season-of-docs/
nikunj has joined #ste||ar
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 258 seconds]
K-ballo1 is now known as K-ballo
jaafar has joined #ste||ar
jaafar has quit [Ping timeout: 258 seconds]
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
mdiers_ has quit [Ping timeout: 245 seconds]
nikunj has quit [Remote host closed the connection]
<simbergm> hkaiser: sorry, here now
<jgurhem> Hi simbergm! I had a few mistakes in my computation of the indices. I also had a few issues with copies and constructors for my partition_data.
<simbergm> all right :) and you managed to run it on multiple nodes without problems?
<jgurhem> I'm now trying to launch my HPX application in a distributed env but I don't know what to pass to HPX
<simbergm> ah, that answers my question...
<jgurhem> That's the issue now ;(
<jgurhem> I'm trying to work with --hpx:nodefile
<jgurhem> and mpirun
<simbergm> I think sadly we don't have any documentation that explicitly tells you that
<jgurhem> There is the description of the command line options...
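For reference, the nodefile route being attempted here amounts to starting one HPX process on each node and pointing all of them at the same node file. A minimal sketch, assuming a node file in the format the --hpx:nodefile documentation describes and an application binary called my_hpx_app (both file names are placeholders; only --hpx:nodefile and --hpx:threads are actual HPX options):

    # started once per node by the job script or launcher of choice
    ./my_hpx_app --hpx:nodefile=nodes.txt --hpx:threads=16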
daissgr has joined #ste||ar
<simbergm> jgurhem: right, if that's good enough for you I'm happy
<simbergm> I know many people would be unhappy with it ;)
<jgurhem> simbergm: I'm not happy with it!
<jgurhem> It does not work
<jgurhem> I'm running 48 copies of the same app instead of 16 threads on each of 3 nodes
<jgurhem> Still some work to do
<jgurhem> simbergm: In the post you showed me, hkaiser mentions a second option using mpirun with special build instructions. Do you know them?
<simbergm> jgurhem: I haven't tried it myself but I think that's supposed to work the same as with any other batch environment that we support, i.e. without you having to do anything
<simbergm> how are you launching it at the moment to get 48 processes?
<simbergm> or you use hpxrun.py (it's in the tools folder)
<simbergm> you have mpi enabled, no?
<simbergm> or the mpi parcelport, to be precise
<simbergm> HPX_WITH_PARCELPORT_MPI=ON
<jgurhem> I'm compiling HPX with HPX_WITH_PARCELPORT_MPI=ON right now. I'll try again when it's done
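The corresponding configure step is just a re-run of CMake with that option set; the paths below are placeholders, only HPX_WITH_PARCELPORT_MPI=ON comes from the discussion:

    # reconfigure and rebuild HPX with the MPI parcelport enabled
    cmake -DHPX_WITH_PARCELPORT_MPI=ON -DCMAKE_BUILD_TYPE=Release /path/to/hpx
    cmake --build . --target install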
<jgurhem> I was using mpirun in a job submitted to LoadLeveler
<simbergm> ok, mpirun is probably launching one process per thread rather than one process per node by default, so you might have to force it to change that
<jgurhem> There is no hpxrun.py file in hpx/tools
<simbergm> don't know the command line parameters for that though, sorry :/
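The "one process per node" switch simbergm could not recall is launcher-specific; two common spellings are sketched below, with the default of one rank per core being exactly what produced the 48 copies (3 nodes x 16 cores). Option names vary between MPI versions, so mpirun --help is worth checking:

    mpirun -np 3 --map-by ppr:1:node ./my_hpx_app ...   # Open MPI
    mpirun -np 3 -ppn 1 ./my_hpx_app ...                # MPICH / Intel MPI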
<simbergm> right, sorry
<simbergm> if you've installed hpx it's in the bin folder
<simbergm> (forgot that it's generated)
<simbergm> or in the bin folder of the build directory actually...
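With hpxrun.py from the build's bin directory the same launch can be scripted; the flag spellings below are best verified against hpxrun.py --help, since they may differ between HPX versions, and my_hpx_app is again a placeholder:

    # 3 localities, 16 OS threads each, using mpirun as the launch wrapper
    ./bin/hpxrun.py -l 3 -t 16 -r mpi ./my_hpx_app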
heller has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
heller has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
K-ballo has joined #ste||ar
daissgr has quit [Ping timeout: 245 seconds]
<jgurhem> simbergm: I managed to run my app on 3 nodes with 16 cores each. I got 3 localities and 16 threads each. Does that seem OK?
<jgurhem> However, I ran into a seg fault. I'm trying to investigate
rori has joined #ste||ar
<simbergm> jgurhem: yep, that seems ok
<simbergm> do you get segfaults also running on just one node?
<jgurhem> simbergm: Nope, it works fine on one node
<jgurhem> I got this error: https://pastebin.com/ZTv7dWV7
<jgurhem> with boost/1.69.0/release/include/boost/smart_ptr/intrusive_ptr.hpp:193: T& boost::intrusive_ptr<T>::operator*() const [with T = hpx::naming::detail::id_type_impl]: Assertion `px != 0' failed.
<simbergm> jgurhem: again, that's a very generic error so it's hard to give any specific advice
<simbergm> you'd have to try to figure out in which part of the program this happens
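The assertion itself narrows things down a little: px != 0 on an intrusive_ptr<id_type_impl> fails when an hpx::id_type that was never assigned a valid global id is dereferenced, which on a multi-node run often means a component or partition id was not created on, or never reached, a remote locality. One way to find the call site, assuming gdb is available and an Open MPI style mpirun (the binary name and thread count are placeholders):

    # run each rank under gdb in batch mode and print a backtrace on the abort
    mpirun -np 3 --map-by ppr:1:node \
        gdb -batch -ex run -ex bt --args ./my_hpx_app --hpx:threads=16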
aserio has joined #ste||ar
eschnett has joined #ste||ar
diehlpk has joined #ste||ar
<diehlpk> simbergm, Do you know when John is back?
<simbergm> diehlpk: not exactly, but I think he'll be away most of July at least
diehlpk has quit [Ping timeout: 250 seconds]
mdiers_ has joined #ste||ar
akheir has joined #ste||ar
aserio has quit [Ping timeout: 276 seconds]
aserio has joined #ste||ar
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
hkaiser has joined #ste||ar
<hkaiser> hey aserio, yt?
<hkaiser> aserio: whenever you read this: could you please ask Steve to have a look at my comment in #994 (Phylanx)? He probably didn't see my email...
<aserio> hkaiser: here
<aserio> yes
<hkaiser> thanks! off again - my Mom is waiting...
<aserio> np!
hkaiser has quit [Client Quit]
aserio has quit [Ping timeout: 246 seconds]
jaafar has joined #ste||ar
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
rori has quit [Quit: WeeChat 1.9.1]
aserio has joined #ste||ar
jaafar has quit [Quit: Konversation terminated!]
jaafar has joined #ste||ar
jaafar has quit [Client Quit]
jaafar has joined #ste||ar
eschnett has quit [Quit: eschnett]
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 276 seconds]
aserio1 is now known as aserio
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 245 seconds]
K-ballo1 is now known as K-ballo
K-ballo has quit [Ping timeout: 246 seconds]
aserio has quit [Ping timeout: 272 seconds]
aserio has joined #ste||ar
aserio has quit [Quit: aserio]
K-ballo has joined #ste||ar
nikunj has joined #ste||ar