hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
<K-ballo> (you may note from the above that default initialization is not necessary here, and I could just reserve and push_back as I go)
<dd> That is essentially how my loop looks except I have an intermediate call to dataflow that combines the two futures
<dd> yeah I was thinking that push_back might solve our problem
<K-ballo> I don't think it will
<K-ballo> you'd just try to use garbage memory instead of an uninitialized future
<K-ballo> if your loop truly did behave as that, then you wouldn't be tripping over uninitialized futures
<dd> ok well I think you gave me what I need to keep working on this - much appreciated
<dd> BTW it interestingly works for small enough meshes
<K-ballo> if you turn it into a minimal test case reproducing the exact problem without all the complexity, I'm sure people here would have a look
<dd> ok thanks again - I will keep debugging and try to come up with a reproducer if I can't sort it out
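A minimal sketch of the distinction K-ballo draws above, assuming `std::future` as a stand-in for `hpx::future`; the helper names `make_ready_futures` and `count_invalid` are hypothetical and are not taken from dd's code. Constructing a vector with a size yields default-constructed futures with no shared state, while `reserve` only allocates capacity, so indexing into a reserved-but-empty vector reads garbage memory rather than an uninitialized future. Filling with `push_back` means every element is valid when it is added.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper: builds n ready futures with reserve + push_back.
// std::future stands in for hpx::future here (an assumption for illustration).
std::vector<std::future<int>> make_ready_futures(std::size_t n)
{
    std::vector<std::future<int>> futures;
    futures.reserve(n);    // allocates capacity only, size() is still 0
    for (std::size_t i = 0; i != n; ++i)
    {
        std::promise<int> p;
        p.set_value(static_cast<int>(i));
        futures.push_back(p.get_future());    // element is valid when added
    }
    return futures;
}

// Hypothetical helper: counts futures without shared state
// (default-constructed or moved-from).
std::size_t count_invalid(std::vector<std::future<int>> const& futures)
{
    std::size_t invalid = 0;
    for (auto const& f : futures)
    {
        if (!f.valid())
            ++invalid;
    }
    return invalid;
}
```

Running `count_invalid` over the vector just before the futures are consumed is one way to check whether every slot was actually assigned, which is the condition K-ballo suggests is being violated.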
kale[m] has quit [Ping timeout: 260 seconds]
kale[m] has joined #ste||ar
akheir has quit [Quit: Leaving]
hkaiser has quit [Quit: bye]
kale[m] has quit [Ping timeout: 260 seconds]
nan11 has quit [Remote host closed the connection]
dd has quit [Ping timeout: 245 seconds]
bita__ has quit [Ping timeout: 260 seconds]
nikunj97 has joined #ste||ar
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 244 seconds]
Nikunj__ has quit [Read error: Connection reset by peer]
<zao> gonidelis[m]: the exact issue I said it was... smh
Nikunj__ has joined #ste||ar
nikunj97 has joined #ste||ar
Nikunj__ has quit [Ping timeout: 260 seconds]
kale[m] has joined #ste||ar
kale[m] has quit [Ping timeout: 256 seconds]
nikunj97 has quit [Read error: Connection reset by peer]
nikunj97 has joined #ste||ar
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 260 seconds]
kale[m] has joined #ste||ar
kale[m] has quit [Ping timeout: 260 seconds]
nikunj97 has joined #ste||ar
kale[m] has joined #ste||ar
Nikunj__ has quit [Ping timeout: 260 seconds]
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 260 seconds]
hkaiser has joined #ste||ar
kale[m] has quit [Read error: Connection reset by peer]
kale[m] has joined #ste||ar
kale[m] has quit [Ping timeout: 264 seconds]
kale[m] has joined #ste||ar
Nikunj__ has quit [Quit: Leaving]
weilewei has joined #ste||ar
dd has joined #ste||ar
nan11 has joined #ste||ar
weilewei has quit [Remote host closed the connection]
weilewei has joined #ste||ar
kale[m] has quit [Read error: Connection reset by peer]
kale[m] has joined #ste||ar
karame_ has joined #ste||ar
kale[m] has quit [Read error: Connection reset by peer]
kale[m] has joined #ste||ar
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
akheir has joined #ste||ar
rtohid has joined #ste||ar
<weilewei> HPX is mentioned in Kokkos training tutorial
<diehlpk_work_> June 29, 2020 - July 3, 2020 -> First GSoC evaluations next week
<jbjnr> hkaiser: yt?
<hkaiser> jbjnr: here
<jbjnr> agas - service_mode hosted and service_mode bootstrap - I've got a problem with bootstrapping the libfabric stuff and I'm curious how agas decides which mode is set. Can you elucidate please?
<jbjnr> what I want to do is start N worker nodes and say --hpx:agas=root_node:port - but when I use --hpx:agas on the command line, it thinks the worker is the root node and not that I want to connect to agas on the root node.
<hkaiser> mode 'bootstrap' means it's locality zero, otherwise 'hosted'
<hkaiser> --hpx:agas shouldn't have any relation to the service mode
<hkaiser> it just specifies where the bootstrapped agas instance lives
<jbjnr> my problem is that when I use "--hpx:agas=localhost:7910 --hpx:localities=2 --hpx:worker" it sets the service mode to bootstrap even though I'm a worker
<hkaiser> add a --hpx:node=0/1 accordingly
<hkaiser> then --hpx:worker is not needed
<jbjnr> you can't use hpx:node when you use hpx:agas
<jbjnr> hpx::init: std::exception caught: Command line option --hpx:node is not compatible with --hpx:agas
<jbjnr> hkaiser: I know how it used to work.
<hkaiser> doesn't it work as described anymore?
<jbjnr> my libfabric stuff does not work any more and I'm trying to find out what has been broken
<jbjnr> and I have no idea why hpx:node is 'incompatible' with hpx:agas
<hkaiser> do those instructions work with tcp?
<jbjnr> no idea. not interested in tcp
<hkaiser> generally I wouldn't exclude the possibility that we have broken things during startup
<hkaiser> jbjnr: please don't be annoyed
<jbjnr> why is hpx:node incompatible with hpx:agas?
<hkaiser> finding out whether things still work with tcp would at least tell us whether it's a general problem or just something with the libfabric pp
<hkaiser> jbjnr: I don't know off the top of my head - need to look at the code and think about it
<hkaiser> most probably because using hpx:agas and hpx:node might create ambiguities, but I'm not sure
<jbjnr> ok. just wanted to know about the service mode stuff. it is as I expected and there are new bugs
<hkaiser> or it's simply an invalid restriction
<jbjnr> ^this
<jbjnr> imho
<jbjnr> but I suspect there was a reason for it once upon a time
<hkaiser> indeed
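A minimal sketch of the rule hkaiser states above ("mode 'bootstrap' means it's locality zero, otherwise 'hosted'"). The names `service_mode` and `mode_for_locality` are hypothetical and this is not HPX's actual startup code; it only encodes the stated rule and shows that, under it, the mode depends on the locality id alone and not on where `--hpx:agas` points.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; not HPX's actual types.
enum class service_mode
{
    bootstrap,    // this locality hosts the bootstrapped AGAS instance
    hosted        // this locality connects to an AGAS instance elsewhere
};

// Rule as stated in the chat: locality zero bootstraps AGAS,
// every other locality runs in hosted mode.
constexpr service_mode mode_for_locality(std::uint32_t locality_id)
{
    return locality_id == 0 ? service_mode::bootstrap : service_mode::hosted;
}
```

Under this reading, a worker that ends up in bootstrap mode is being treated as locality zero, which matches the symptom jbjnr describes.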
<ms[m]> weilewei: nice! are you attending? is it public?
bita__ has joined #ste||ar
<weilewei> ms[m] yes I am attending. I think you need to register to get zoom password and send email to celmont@sandia.gov: https://github.com/kokkos/kokkos-tutorials/issues/36
<weilewei> it's like half public/private. I think their target audience is summer students at Sandia lab
<ms[m]> weilewei: ok, thanks
<ms[m]> then just pass on all the gossip to here ;)
weilewei has quit [Remote host closed the connection]
dd has quit [Remote host closed the connection]
rtohid has quit [Ping timeout: 245 seconds]
karame_ has quit [Remote host closed the connection]
rtohid has joined #ste||ar
rtohid has quit [Remote host closed the connection]
rtohid has joined #ste||ar
<akheir> what happened to github? it is ugly!
<akheir> is it just me?
<zao> akheir: Hehe, round and nice.
<zao> akheir: I saw you ran into the GCC fixincludes bug on your cluster too, isn't it fun?
<zao> We had it in EasyBuild a while ago, I linked our thread last night in here.
karame_ has joined #ste||ar
<zao> In short, GCC is quite tied to the glibc version on the host OS, so for minor OS upgrades you may need to rebuild it.
<akheir> zao: yeah. I didn't know the name, but it was confusing
<akheir> oh, nasty. it is ok now since I only have two versions, but later on, when the number grows, it could be difficult to handle
<akheir> zao: how about the libraries compiled with that gcc? should I recompile my boost as well?
<zao> In this particular case, I don't believe it's required.
<akheir> good to know. I did it to be safe though
<zao> How do you build the software stack on the cluster, mostly manual or EB/spack/something?
<zao> We used to build everything manually for four different compiler vendors with just README.sysop files with vague instructions on how to build something :D
<akheir> zao: I gave up on EB on the old cluster. Its dependency management was a headache. I have my own set of bash scripts that does it all for me, from download to creating the lmod module
<zao> We find it quite nice for the vast bulk of software that our researchers need to run, but for a department cluster more aimed at development, maybe not quite as good.
<akheir> yeah. tell it to install openmpi and it goes and installs two versions of gcc first and then compiles openmpi. lol
karame_ has quit [Remote host closed the connection]
<zao> The relative isolation from the underlying OS is nice when you want some semblance of things working the same across systems.
<zao> Most of the horrors I run into when trying it on weirdo distros are in getting the system-ish things going, the rest tends to be smooth sailing.
nikunj97 has joined #ste||ar
nan11 has quit [Remote host closed the connection]
nan11 has joined #ste||ar
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 260 seconds]
Nikunj__ has quit [Read error: Connection reset by peer]
Nikunj__ has joined #ste||ar
Nikunj__ has quit [Read error: Connection reset by peer]
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
kale[m] has quit [Ping timeout: 246 seconds]
kale[m] has joined #ste||ar
<nikunj> hkaiser: yt?
<bita__> thanks to Al, we were trying to build Phylanx on Rostam. Using yesterday's hpx master, Phylanx cannot be built: https://gist.github.com/taless474/05fb593094a2eed051c94540927cf6d3
<bita__> Phylanx image is good, as I just built it on docker
<bita__> Any suggestions?
<nikunj> hkaiser: I just noticed that time increases exponentially when using hpx::lcos::channel for distributed send/receive with increasing number of hpx threads. So if I keep 1 hpx thread per node and use lcos communication, it is blazing fast. But if I use 64 hpx threads per node, it is terribly slow for the same amount of send/receive compared to 1 hpx thread.
<nikunj> hkaiser: is this expected behavior? If yes, is there any workaround if only 1 thread invokes the set and get functions?
<hkaiser> what are your idle rates?
<hkaiser> bita__: missing header ?
<hkaiser> <memory> or <string>?
<bita__> well I am using master
<bita__> I see that hpx master yesterday was failing for a few hours. How can I get the latest stable version?
<hkaiser> stable tag?
<bita__> yes
<hkaiser> there is a tag 'stable' that is the latest commit that passed testing
<bita__> Steve's Phylanx branch was successfully built about 3 hours ago. I just don't know what I'm missing
<bita__> thanks, I will try that
<hkaiser> bita__: do you build with c++14?
<bita__> I have this -DPHYLANX_WITH_CXX17=ON, so I would say no
<hkaiser> well, c++17 should work as well
mariella[m] has joined #ste||ar
kale[m] has quit [Ping timeout: 260 seconds]
kale[m] has joined #ste||ar
rtohid has left #ste||ar [#ste||ar]
weilewei has joined #ste||ar
<weilewei> Does github have new user interface from today?
<hkaiser> weilewei: I think you need to explicitly enable it
<weilewei> hkaiser I didn't do anything at all, and then the github repo page has a new look
<hkaiser> ok
<hkaiser> I had it enabled for some time, they might have made it broadly available now
<weilewei> I see, I'm just not familiar with the new look yet