aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
mcopik has quit [Ping timeout: 248 seconds]
parsa has quit [Quit: Zzzzzzzzzzzz]
diehlpk has quit [Ping timeout: 240 seconds]
EverYoung has quit [Ping timeout: 276 seconds]
K-ballo has quit [Quit: K-ballo]
vamatya has joined #ste||ar
diehlpk has joined #ste||ar
eschnett has quit [Quit: eschnett]
vamatya has quit [Ping timeout: 260 seconds]
eschnett has joined #ste||ar
diehlpk has quit [Remote host closed the connection]
patg has joined #ste||ar
hkaiser has quit [Quit: bye]
parsa has joined #ste||ar
eschnett has quit [Quit: eschnett]
vamatya has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
parsa has joined #ste||ar
parsa has quit [Client Quit]
parsa has joined #ste||ar
parsa has quit [Client Quit]
parsa has joined #ste||ar
parsa has quit [Client Quit]
parsa has joined #ste||ar
parsa has quit [Client Quit]
parsa has joined #ste||ar
parsa has quit [Client Quit]
patg has quit [Quit: See you later]
Matombo has joined #ste||ar
Matombo has quit [Ping timeout: 268 seconds]
<github> [hpx] biddisco pushed 1 new commit to terminated_threads: https://git.io/v7qAr
<github> hpx/terminated_threads 435aa82 John Biddiscombe: Only boolean config options use HPX_WITH_XXX and HPX_HAVE_XXX prefixes
bikineev has joined #ste||ar
vamatya has quit [Ping timeout: 260 seconds]
david_pfander has joined #ste||ar
mcopik has joined #ste||ar
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/v7mLk
<github> hpx/gh-pages b2f6a6f StellarBot: Updating docs
bikineev has quit [Remote host closed the connection]
Matombo has joined #ste||ar
hkaiser has joined #ste||ar
bikineev has joined #ste||ar
Matombo has quit [Remote host closed the connection]
bikineev has quit [Remote host closed the connection]
mcopik_ has joined #ste||ar
mcopik_ has quit [Client Quit]
K-ballo has joined #ste||ar
mcopik_ has joined #ste||ar
mcopik_ has quit [Client Quit]
Matombo has joined #ste||ar
eschnett has joined #ste||ar
hkaiser has quit [Quit: bye]
diehlpk_work has joined #ste||ar
taeguk has joined #ste||ar
<taeguk> Excuse me, I have a question about 'detail' namespace and unnamed namespace.
<taeguk> Because 'block' and 'block_manager' are such common names, I worry about name clashes.
<taeguk> So, I put 'struct block' and 'struct block_manager' into an unnamed namespace.
<K-ballo> unnamed namespaces don't affect naming, just linkage
<taeguk> K-ballo: Yes, you're right. Because of that, I used an unnamed namespace for 'struct block' and 'struct block_manager'.
<K-ballo> taeguk: you'll have to explain what the intention is, because it makes no sense to me.. unnamed namespace won't help with duplication of namings when those are a problem
<diehlpk_work> thundergroudon[m, taeguk, ABresting: your evaluations are still missing
<diehlpk_work> Please do them by today
<zao> If two HPX headers use the same name in two separate unnamed namespaces, you'll still clash.
<zao> (in the same TU)
<taeguk> Oh
<K-ballo> yeah, and if there's a struct foo {}; at the outer namespace, it will either hide the nested ones or be a conflict, depending on include order
<taeguk> K-ballo: zao: I misunderstood unnamed namespaces.
<taeguk> My usage is incorrect :(
<zao> Upside of things, you've learned something :)
<zao> (I had to verify my understanding of them just the other day, w.r.t. visibility of names)
<taeguk> zao: K-ballo: thank you very much :)
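The point made above (unnamed namespaces change linkage, not naming) can be sketched as follows. This is a minimal illustration, not HPX code: the namespace names `partition_algo` and `merge_algo` and the helper functions are hypothetical, standing in for two headers that both want a type called `block`.

```cpp
#include <cassert>
#include <string>

// Stand-in for one header: the helper type lives in a component-specific
// named namespace (hypothetical name), so its qualified name is unique.
namespace partition_algo { namespace detail {
    struct block { int first; int last; };
    inline int size(block const& b) { return b.last - b.first; }
}}

// Stand-in for a second header that also wants a type called `block`.
// No clash: this one is merge_algo::detail::block.
namespace merge_algo { namespace detail {
    struct block { std::string label; };
    inline std::string describe(block const& b) { return "block:" + b.label; }
}}

// An unnamed namespace gives its members internal linkage, nothing more.
namespace {
    int local_counter = 0;
}

// All unnamed namespaces in the same enclosing scope of one translation
// unit are the same namespace, so adding
//     namespace { int local_counter = 1; }
// here would be a redefinition error, exactly as with two included headers.

int bump() { return ++local_counter; }
```

Moving the two `block` types into unnamed namespaces instead would make them collide as soon as both headers are included in the same translation unit.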
<taeguk> diehlpk_work: Okay, I'll do 2nd evaluation soon.
akheir has joined #ste||ar
hkaiser has joined #ste||ar
aserio has joined #ste||ar
<jbjnr> heller: did you see the libfabric email about standardization. Interesting indeed.
<heller> jbjnr: yes
<jbjnr> this could be my chance to poush rma_objects!
<heller> :D
<jbjnr> ^push
<heller> I am fighting scalable endpoints right now
<heller> pain in the fucking ass
<jbjnr> good for you!
<jbjnr> (the fighting bit, not the pita bit)
<heller> and guess what ... the test ain't working
<jbjnr> Glad it's you and not me for once
<jbjnr> :)
<jbjnr> (sorry, I believe this is some kind of schadenfreude or something)
<heller> ;)
<heller> I mean the stuff that's included in fabtests
<hkaiser> heller: we'll get a small power 8 node (8 cores) here and will add it to rostam
Matombo has quit [Remote host closed the connection]
Kiril_ has joined #ste||ar
<Kiril_> Hello. I was playing with the pingpong example in examples/quickstart today, and I can't seem to make it work for distributed runs. It works for single-node runs though. Can anyone give me some help?
<jbjnr> Kiril_: what command line are you using to start the binaries on each node?
<Kiril_> So I have Slurm set up for a small 2-node setting
<jbjnr> does hello_world run on 2 nodes?
<Kiril_> when I do a single-node run with "srun -n 2 bin/pingpong", it runs fine
<jbjnr> ok
<jbjnr> (I'm surprised though, cos 2 binaries on the same node usually fail!)
<jbjnr> did you compile with the MPI parcelport, or only tcp?
<Kiril_> okay, hello_world also hangs ... it seems something is not right with my setup
<Kiril_> only TCP
<Kiril_> I try to avoid MPI (not important here why)
<Kiril_> do I need to pass some flag for only TCP settings?
Matombo has joined #ste||ar
<jbjnr> tcp should be ok, but the MPI parcelport 'knows' about slurm and gets settings from it, I can't remember if the tcp one does.
<jbjnr> it may help
<Kiril_> No -- let me try. It has nothing to do with pingpong then, something with my setup
<hkaiser> jbjnr: the tcp one does as well
<jbjnr> if you can ssh into two nodes, try launching the binaries by hand, using --hpx:console on one and --hpx:worker on the other, and pass the --hpx:agas and --hpx:hpx IP addresses
<Kiril_> the binaries launch via Slurm
<Kiril_> I see a hanging pingpong process
<jbjnr> kill any hanging jobs too. they hold onto the port and stop the next one working
<Kiril_> let me read through that setup ...
<jbjnr> yup
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
aserio has quit [Ping timeout: 246 seconds]
<Kiril_> I can't seem to make it work. I managed to pass the "hpx:hpx" and "hpx:agas" options with the heartbeat/heartbeat_console example and that worked. For pingpong, I also specify "hpx:localities=2" in addition to the previous options, otherwise the pingpong runs in shared memory and finishes. But that hangs
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<Kiril_> I run on one node "bin/pingpong --hpx:localities=2 --hpx:agas=192.168.137.2:7910 --hpx:hpx=192.168.137.1:7910" and on the other "bin/pingpong --hpx:localities=2 --hpx:agas=192.168.137.2:7910 --hpx:hpx=192.168.137.2:7910"
<jbjnr> add --hpx:console to one, and --hpx:worker to the other
<jbjnr> (just to play safe)
<hkaiser> K-ballo: pls read the SO post carefully
<jbjnr> he means Kiril_ ^
<K-ballo> ...
<hkaiser> K-ballo: sorry ;)
<hkaiser> the first command line misses --hpx:worker
<Kiril_> okay -- that worked
<Kiril_> thank you
<jbjnr> \o/
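The manual launch that worked above can be summarized in a small sketch. The helper `hpx_command` is hypothetical and only assembles the strings quoted in the conversation. It assumes, as the exchange suggests, that the node whose `--hpx:hpx` address equals the `--hpx:agas` address is the console and the other node is the worker.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the command line for one locality using only
// the options mentioned in the chat (--hpx:localities, --hpx:agas,
// --hpx:hpx, --hpx:console, --hpx:worker).
std::string hpx_command(std::string const& binary, int localities,
                        std::string const& agas_address,
                        std::string const& own_address, bool is_console)
{
    return binary
        + " --hpx:localities=" + std::to_string(localities)
        + " --hpx:agas=" + agas_address      // where locality 0 (AGAS) listens
        + " --hpx:hpx=" + own_address        // where this locality listens
        + (is_console ? " --hpx:console" : " --hpx:worker");
}
```

For the two nodes in the chat this yields one line ending in `--hpx:worker` for 192.168.137.1 and one ending in `--hpx:console` for 192.168.137.2.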
<Kiril_> can I ask one more question related to another example?
<jbjnr> no
<Kiril_> please?
<jbjnr> of course you can. That was a joke^
<Kiril_> so I was looking at the heartbeat
<jbjnr> (note that launching the jobs "by hand" now works, but we must determine why the slurm launch failed)
<Kiril_> all I understand is that performance counters, in particular some kind of queue is used -- but I fail to see what this has to do with a heartbeat
<Kiril_> and I could not find any documentation of how these counters can be used for a heartbeat
<Kiril_> anyone has a few sentences for me explaining this?
<jbjnr> the heartbeat refers only to the ping of one node to another every second or so to see if it is still doing something
<jbjnr> Can't quite remember what the heartbeat example does. I have a modified version of it somewhere that needs cleaning up and contributing
<hkaiser> Kiril_: heartbeat demonstrates two things
<hkaiser> Kiril_: a) launch a new locality after startup and let it connect back to the main set of localities (in this case just locality 0)
<hkaiser> and b) query arbitrary perf counters, in this case from inside the newly attached locality
<Kiril_> I can't figure how locality 0 ( I assume this is the heartbeat_console process) communicates at all with joining localities
<hkaiser> Kiril_: it tries to create a perf counter
<Kiril_> uhm ... does the perf counter of locality 1 then reside on locality 0?
<hkaiser> no
<jbjnr> no, the perf counter of 1 lies on 1, but when 0 requests the counter for 1, it is doing a remote query
<Kiril_> ah, okay
<hkaiser> Kiril_: just looking at the heartbeat_console
<Kiril_> I had a hard time figuring from where to where data flows
<hkaiser> it's not doing any perf counter queries
<hkaiser> I misremembered
<jbjnr> oops. ignore me then
<jbjnr> I spent ages on that example as well and have completely forgotten what it does. going for tea so I don't send any more false messages for a few minutes
<hkaiser> it's the attached heartbeat locality which is querying the perf counter
<Kiril_> :)
<Kiril_> but I assume the perf counter is then at locality 0?
<Kiril_> if the attached locality is reading its own counter, it would not be any heartbeat check
<hkaiser> Kiril_: whatever you specify on the command line, by default '/threadqueue{locality#0/total}/length', which is on locality 0, yes
<Kiril_> okay, that makes sense, I just wanted to make sure that at some stage one locality (e.g. 1) is querying someone across the wire (in this case, locality 0)
<Kiril_> which, I guess, is the heartbeat then
aserio has joined #ste||ar
<jbjnr> happy late birthday for yesterday aserio :)
<aserio> jbjnr: thanks!
Kiril_ has quit [Quit: Page closed]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
parsa has joined #ste||ar
parsa has quit [Client Quit]
Matombo has quit [Ping timeout: 260 seconds]
Matombo has joined #ste||ar
<hkaiser> ajaivgeorge: two of the new tests from your PR are failing
vamatya has joined #ste||ar
pree has joined #ste||ar
<github> [hpx] hkaiser pushed 1 new commit to resource_partitioner_jb: https://git.io/v7mNs
<github> hpx/resource_partitioner_jb dec1b36 Hartmut Kaiser: Unified threadmanager_base and threadmanager_impl
<jbjnr> hkaiser: great ^^^ removing some of those (now unnecessary) extra classes is on our todo list. was on the list I mean. :)
ajaivgeorge__ has joined #ste||ar
ajaivgeorge__ has left #ste||ar [#ste||ar]
pree has quit [Remote host closed the connection]
ajaivgeorge__ has joined #ste||ar
vamatya has quit [Ping timeout: 260 seconds]
<hkaiser> jbjnr: :D
<hkaiser> I'm glad you approve
<jbjnr> remove more!
<hkaiser> jbjnr: most of the work there was done already
pree has joined #ste||ar
<hkaiser> working on it
<hkaiser> perf counters first
eschnett has quit [Ping timeout: 260 seconds]
<jbjnr> shoshana ran out of time and we decided to concentrate on the main features we need rather than making it clean and nice
pree has quit [Read error: Connection reset by peer]
<hkaiser> jbjnr: sure
<jbjnr> clean and nice is your job :) thanks!
eschnett has joined #ste||ar
<ajaivgeorge> hkaiser: I am looking at the failing tests. I will fix it in my branch. I had run the tests on rostam before submitting. Not sure why it is failing now.
<parsa[[w]]> hkaiser: #2720 worked
jgoncal has joined #ste||ar
ArashA has joined #ste||ar
<hkaiser> ajaivgeorge: sure, no worries
EverYoung has quit [Ping timeout: 246 seconds]
pree_ has joined #ste||ar
ArashA has quit [Quit: Textual IRC Client: www.textualapp.com]
EverYoung has joined #ste||ar
bikineev has joined #ste||ar
<ABresting> diehlpk_work: working on my PR and report, will do 2nd eval. form
bikineev has quit [Remote host closed the connection]
mars0000 has joined #ste||ar
mars0000 has quit [Client Quit]
bikineev has joined #ste||ar
patg[[w]] has joined #ste||ar
denis_blank has joined #ste||ar
pree_ has quit [Ping timeout: 240 seconds]
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
bikineev has quit [Remote host closed the connection]
pree_ has joined #ste||ar
aserio has quit [Ping timeout: 248 seconds]
david_pfander has quit [Ping timeout: 255 seconds]
taeguk has quit [Ping timeout: 260 seconds]
pree_ has quit [Ping timeout: 260 seconds]
aserio has joined #ste||ar
<hkaiser> jbjnr: yt?
pree_ has joined #ste||ar
<denis_blank> hkaiser: Did you receive my PM?
bikineev has joined #ste||ar
<hkaiser> denis_blank: yah, let's talk now
<denis_blank> hkaiser: ok
patg[[w]] has quit [Quit: Leaving]
Matombo has quit [Remote host closed the connection]
mars0000 has joined #ste||ar
parsa has joined #ste||ar
bikineev has quit [Remote host closed the connection]
Matombo has joined #ste||ar
jgoncal has quit [Ping timeout: 240 seconds]
aserio has quit [Ping timeout: 246 seconds]
bikineev has joined #ste||ar
hkaiser has quit [Quit: bye]
aserio has joined #ste||ar
Matombo has quit [Remote host closed the connection]
jgoncal has joined #ste||ar
vamatya has joined #ste||ar
hkaiser has joined #ste||ar
aserio has quit [Quit: aserio]
parsa has quit [Quit: Zzzzzzzzzzzz]
denis_blank has quit [Quit: denis_blank]
eschnett has quit [Quit: eschnett]
jgoncal has quit [Ping timeout: 246 seconds]
EverYoun_ has joined #ste||ar
jgoncal has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
eschnett has joined #ste||ar
EverYoun_ has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
mars0000 has quit [Quit: mars0000]
diehlpk_work has quit [Quit: Leaving]
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
jgoncal has quit [Ping timeout: 240 seconds]
eschnett has quit [Quit: eschnett]
<K-ballo> so, what's the proper thing to link to in order to use std::atomic on linux? and get the entire support, not just the builtins
<zao> Is there a way to not get it?
<zao> Assuming you have a non-horrible stdlib?
<K-ballo> somehow... you get extra lockfree-ness by linking
<zao> *sigh*
<K-ballo> otherwise you get just the built ins + mutex based for the rest
<zao> I guess I shouldn't be surprised.
<K-ballo> or something like that... I haven't actually seen it, a.williams said something to that effect some time ago
<K-ballo> # Sometimes linking against libatomic is required for atomic ops, if
<K-ballo> # the platform doesn't support lock-free atomics.
<hkaiser> K-ballo: linking with -latomic is probably not needed for the feature test, then, rather we should add it to the normal build, shouldn't we?
<K-ballo> hkaiser: I don't know about the feature test, but yeah we should add it to the normal build.. that's why I'm asking, because I'm not sure when/how
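As a rough illustration of the question above: the standard `ATOMIC_*_LOCK_FREE` macros report at compile time whether a given atomic type is never (0), sometimes (1), or always (2) lock-free on the target. Where the answer is not "always", GCC and Clang typically emit calls into a runtime support library, which is the case where linking with `-latomic` matters. The helper names below are hypothetical and this sketch does not reflect how HPX's build system decides.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Maps the value of an ATOMIC_*_LOCK_FREE macro to a readable label.
std::string describe_lock_free(int level)
{
    switch (level) {
        case 2:  return "always lock-free";
        case 1:  return "sometimes lock-free";
        default: return "never lock-free";
    }
}

// Compile-time answers for two common types on the current target.
int int_lock_free_level()   { return ATOMIC_INT_LOCK_FREE; }
int llong_lock_free_level() { return ATOMIC_LLONG_LOCK_FREE; }

// Plain std::atomic<int> usage; on mainstream targets this compiles to
// builtins and does not need any extra library.
int count_to(int n)
{
    std::atomic<int> counter{0};
    for (int i = 0; i < n; ++i)
        counter.fetch_add(1, std::memory_order_relaxed);
    return counter.load();
}
```

A build system could use a similar probe, compiled and linked once without and once with `-latomic`, to decide whether the flag is required.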
eschnett has joined #ste||ar
bikineev has quit [Remote host closed the connection]
hkaiser_ has joined #ste||ar
hkaiser has quit [Read error: Connection reset by peer]
eschnett has quit [Quit: eschnett]
jgoncal has joined #ste||ar
akheir has quit [Remote host closed the connection]
EverYoun_ has joined #ste||ar
EverYoun_ has quit [Ping timeout: 246 seconds]
EverYoung has quit [Ping timeout: 276 seconds]
pree_ has quit [Quit: AaBbCc]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar