aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<diehlpk> heller, Did you get some response from the OpenSuCo paper?
diehlpk has quit [Remote host closed the connection]
EverYoung has quit [Ping timeout: 246 seconds]
hkaiser has quit [Quit: bye]
K-ballo has quit [Quit: K-ballo]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
diehlpk has joined #ste||ar
jaafar has quit [Ping timeout: 258 seconds]
prashj has joined #ste||ar
prashantj has joined #ste||ar
prashantj16 has joined #ste||ar
prashj has quit [Ping timeout: 258 seconds]
prashantj has quit [Ping timeout: 258 seconds]
diehlpk has quit [Ping timeout: 240 seconds]
Smasher has quit [Ping timeout: 240 seconds]
Smasher has joined #ste||ar
prashantj16 has quit [Quit: Leaving]
<github> [hpx] sithhell pushed 2 new commits to master: https://git.io/vda1z
<github> hpx/master 97a5260 Mikael Simberg: Fix some typos in documentation
<github> hpx/master 5acf3d8 Thomas Heller: Merge pull request #2937 from msimberg/docs-typo-fix...
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
jbjnr_ has joined #ste||ar
jbjnr has quit [Ping timeout: 246 seconds]
jbjnr_ is now known as jbjnr
jbjnr has quit [Quit: ChatZilla 0.9.93 [Firefox 56.0/20170926190823]]
quaz0r has quit [Ping timeout: 258 seconds]
jbjnr has joined #ste||ar
david_pfander has joined #ste||ar
quaz0r has joined #ste||ar
jbjnr_ has joined #ste||ar
EverYoung has joined #ste||ar
jbjnr has quit [Ping timeout: 255 seconds]
jbjnr_ is now known as jbjnr
EverYoung has quit [Ping timeout: 258 seconds]
<zao> heller: Did you meet Stefano Markidis perhaps at KTH?
<zao> He's presenting at the "application expert" meeting I'm at now about exascale stuff.
<heller> zao: I didn't, he is the ipic3d guy, right?
<zao> Seems so.
EverYoung has joined #ste||ar
jbjnr has quit [Quit: ChatZilla 0.9.93 [Firefox 56.0/20170926190823]]
EverYoung has quit [Ping timeout: 246 seconds]
jbjnr has joined #ste||ar
<zao> "mpi is going to be exascale"
<zao> :D
pree has joined #ste||ar
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
<github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/vdVWX
<github> hpx/master d0fb5d3 Hartmut Kaiser: Merge pull request #2939 from STEllAR-GROUP/reporting_set_affinity_problems...
pree has quit [Read error: Connection reset by peer]
<jbjnr> hkaiser: heller - others if interested. Please feel free to edit this doc and add more stuff to it, or remove bits https://docs.google.com/document/d/1XtXUM6fhxZVn898_zgAHeWh_44zHJBxGmWK07Tkp0iI/edit?usp=sharing
<jbjnr> I will start chasing people up at CSCS to find out about feasibility etc of using our Jenkins for the above
<heller> jbjnr: sounds good!
<jbjnr> please add more things. I must have forgotten stuff.
<heller> do we want to test the CUDA stuff?
<heller> intel compiler?
<jbjnr> also compilers, boost versions etc. Should MSVC also be an "essential"?
<jbjnr> intel. good point
<heller> what about the millions of cmake options with different dependencies?
<jbjnr> indeed. What do we do. How many builds can we realistically imagine?
<hkaiser> could we cycle through a set of options?
<jbjnr> good idea
<heller> test one set each night?
<hkaiser> yah
<jbjnr> or perhaps have N cmake setting randomly combined on each build!
<heller> na, it has to be reproducible
<jbjnr> ok
<jbjnr> anyway, add stuff to the doc
<heller> also problematic for a PR
<hkaiser> well, as long as we can figure out what was used, random is fine
<heller> it might not test the stuff that has been proposed
<jbjnr> hkaiser: you missed the discussion, but we spoke to my boss and agreed that if CSCS can provide it, then I can approach them with potential requirements and see if they can take over the CI
<hkaiser> cool!
<jbjnr> they might not be able to actually deliver, but we'll see
<jbjnr> hkaiser: gsoc next year I'd like to mentor
<hkaiser> jbjnr: we shouldn't move all of it over to cscs
<hkaiser> jbjnr: go for it!
<jbjnr> I mean blaze+hpx. I suggested it as an internship at CSCS but it was rejected
<hkaiser> ahh, cool! - this is just a first quick&dirty implementation - can be improved much
pree has joined #ste||ar
<jbjnr> hkaiser: the idea is to replace buildbot - but if you want to keep it around then that's fine, no reason why we can't have multiple.
<hkaiser> well, I would like to avoid removing one bottleneck while creating a new one
<jbjnr> hkaiser: I didn't realize you actually started on blaze. I just saw the issue, didn't look at the contents.
<hkaiser> no reason not to include our resources into the build system
<jbjnr> buildbot would become a jenkins slave if this went ahead
<hkaiser> jbjnr: yah, I did a first implementation
<jbjnr> people here said blaze was shite
<hkaiser> jbjnr: I don't care what it's called
<hkaiser> jenkins or buildbot...
<heller> hkaiser: YES!
<hkaiser> jbjnr: we'll see, this looks good: https://bitbucket.org/blaze-lib/blaze/wiki/Benchmarks
<heller> jbjnr: so why is blaze shit?
<hkaiser> main benefits: a) we know Klaus, b) we can easily create an HPX backend
<jbjnr> the benchmarks are fine, but the problem is that the blas level 1,2 are too easy and without the eigensolvers and stuff it's useless. I like the api and would use it anyway, but the real linear algebra people are doing this kind of stuff. http://www.icl.utk.edu/files/publications/2017/icl-utk-980-2017.pdf
<jbjnr> this is where (one day) our hpx backend for linear algebra could go.
<hkaiser> ok
<hkaiser> well, nothing goes without involving Dongarra in the field of LA
pree has quit [Read error: Connection reset by peer]
<hkaiser> at least distributed LA
<jbjnr> if I have a task on a queue and I want to get the arguments as a tuple, is it possible?
<jbjnr> (inside the scheduler). there must be some task structure that holds them
<hkaiser> jbjnr: that's not possible
<jbjnr> how does invoke get them?
<hkaiser> the arguments are bound to the function using util::bind()
<K-ballo> deferred_call ?
<hkaiser> even more, the actual hpx-thread has a fixed API
<jbjnr> can it be rebound to another function somehow
<K-ballo> (we can't use bind with user defined arguments)
<hkaiser> K-ballo: sure
<hkaiser> jbjnr: sec
<jbjnr> I'm working on numa placement and I have many, many tasks; each is an operation on some matrix tiles. Inside the scheduler, I have a way of querying a memory address and getting the numa placement. I would like to pull the arguments to the task and forward them to a helper function that does the numa check, before the task is added to the queue
<jbjnr> so I can move it to the queue on the right numa domain
<jbjnr> I have a templated guided_pool_executor that holds the helper function
<jbjnr> this is all good, but I do not know how to call my helper with my task stuff
<jbjnr> (imagine calling the task twice, but once into my helper, the second time for the real task)
<jbjnr> The helper takes the same args as the task/continuation/etc
<hkaiser> wrap your function into a helper function which is doing things?
<hkaiser> before actually being invoked
<jbjnr> then it will only be called after it is in the scheduler and executed - then it might be on the wrong numa domain - needs to be done earlier
<jbjnr> before added to queues
<hkaiser> jbjnr: all the scheduler sees is one of those two functions: https://github.com/STEllAR-GROUP/hpx/blob/master/src/runtime/applier/applier.cpp#L35-L61
<jbjnr> hkaiser: can I skype you sometime soon?
<jbjnr> (like now?)
<jbjnr> :)
<hkaiser> can it wait for an hour or so?
<jbjnr> yup
<jbjnr> np
<hkaiser> I'm about to leave from home
<jbjnr> ok, when you have time, ping me. thanks
<hkaiser> I'll ping you
<heller> jbjnr: why don't you intercept the arguments in the executor and then dispatch accordingly?
<heller> OMPI-X <-- nice
<jbjnr> heller: I do not know where you mean
<jbjnr> task_future.then(fancy_executor, [](arg1, arg2){a new task;});
<heller> yes
<heller> for example
<heller> here you can introspect the arguments and do the right(tm) thing, post it on a specific queue, a specific executor, whatever
<jbjnr> the problem is that that will be called before the arguments might be ready
<heller> in that case, there isn't a lot you can do
<heller> since you can't just introspect the value of the future
<jbjnr> then it will have to be a double task- first an unwrap and then a second actual execute, with the introspection going on after the unwrap.
<jbjnr> nasty stuff.
<jbjnr> and using a special queue
aserio has joined #ste||ar
<jbjnr> so that the decision is delayed about which core to go on
<heller> yeah, i currently see no other way around that
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
hkaiser has quit [Quit: bye]
<github> [hpx] Naios opened pull request #2941: Add a feature test to test for CXX11 override (master...feature_override) https://git.io/vdVwN
<zao> Two new failures in 341 runs:
<zao> 1 73 - tests.regressions.lcos_dir.sliding_semaphore_2338 (Failed)
<zao> 1 316 - tests.unit.parallel.partition (Failed)
<zao> Same build as the other day.
<zao> Test outputs.
<zao> I like that the partition test outputs a seed :D
<zao> heller: while true; do the gift that keeps on giving; done
<heller> Yeah!
<zao> Curse you people, laid awake all night pondering how to make a service out of it :)
<zao> (not the reason for not being able to sleep, but yay, something to mull about)
<heller> A pity that the partition test didn't tell you what fails
<zao> At least it spat out a seed and it's single-process?
<zao> Might be able to focus test that.
<zao> (not anytime soon tho, still in Stockholm)
<heller> Say hi to Erwin and Roman
<zao> Might've seen Roman in the break room this morning.
<zao> Not quite sure which buildings I'm in :)
<heller> ;)
<heller> Roman is tall with messy hair, Erwin is the boss, I'm sure he lingers around ;)
<zao> Erwin I've heard of, in the political layers.
<heller> Yes
<heller> He's the head of PDC, not sure about SNIC
hkaiser has joined #ste||ar
<hkaiser> jbjnr: I'd be able to skype now
eschnett has quit [Quit: eschnett]
<jbjnr> ok, I'm online
diehlpk has joined #ste||ar
<zao> Hrm, do I need some ancillary GDB magic to debug HPX properly, or is it magic?
<zao> GDB output for that partition test, seems to be spinning like crazy inside of the MT RNG.
<zao> (at least according to RWDI backtrace)
<zao> Call stack is 21693 deep.
<heller> So it's a problem with the rng?
<zao> That's the low end of the trace, in between are 20k spins in MT, it seems.
<heller> Insane
<zao> Didn't look deeper, figured I'd see if I could run the test while my peers present :)
<zao> Had to hunt around for --hpx:hpx to rebind the port, as I had test suites running in the background.
<zao> It also explodes with --hpx:threads 1, if that's a thing that varies.
<zao> Documentation claims it's 1 by default, but I got fewer threads this time around.
<zao> Anyway, insights, yay!
eschnett has joined #ste||ar
<aserio> hkaiser: the PXFS Meeting is happening now
aserio has quit [Read error: Connection reset by peer]
aserio has joined #ste||ar
<zao> 17:12:50 Dcoder | (policy=..., pred=..., rand_base=2147424774) from your error stack
<zao> 17:13:04 Dcoder | that rand_base value should be impossible, unless stack fuckery
<zao> Either that or optimized build, I guess.
<zao> Smells like scribbling tho.
<zao> heller: We get a rand_base=2147424774 on the second section of the test, which is the one that blows up.
<zao> Something smaller like 46006519 works.
<heller> Sounds like a stack overflow then
<zao> 17:19:59 Dcoder | random_fill(int rand_base, int half_range /* >= 0 */)
<zao> 17:19:59 Dcoder | : gen(std::rand()),
<zao> 17:19:59 Dcoder | dist(rand_base - half_range, rand_base + half_range)
<zao> 17:19:59 Dcoder | {}
<zao> 17:20:16 Dcoder | since rand_base is so close to INT_MAX, you get an overflow there
<zao> 17:20:22 Dcoder | on rand_base + half_range
<zao> heller: ^
<heller> Interesting
hkaiser has quit [Read error: Connection reset by peer]
<heller> Now, we still don't know where this messed up value comes from
<zao> std::rand() can trivially return that value, surely?
<zao> stdlib.h:#define RAND_MAX 2147483647
<heller> Most likely
<heller> So we need to trim the seed
<heller> On all tests that use such a thing
diehlpk has quit [Remote host closed the connection]
<zao> `The behavior is undefined if a>b.` btw, from uniform_int_distribution.
<zao> The core lesson here, always emit the seed in random-based tests \o/
<zao> A cookie to whoever did that.
<K-ballo> are we using `std::rand()` ?
<zao> Oh yes.
<zao> int rand_base = std::rand() on line 369 of a file.
<zao> Going to be nice and shallow on MSVC too.
diehlpk has joined #ste||ar
<K-ballo> how did that happen? :(
<zao> git blame sez "taeguk" :P
<zao> We need blame for reviewers :)
<zao> Or our favourite solution, more inspect!
diehlpk has quit [Remote host closed the connection]
<K-ballo> I was thinking more inspect, yes
<K-ballo> between `std::rand` now and those 98 STL binders some weeks ago...
<K-ballo> I mean, we shouldn't even have to have checks for those, but... happens
<zao> [zao@mim]: ~/stellar/hpx>$ git grep std::rand\( | wc -l
<zao> 887
<zao> Mostly tests, but also boot barrier, jenkins hash, and examples.
<K-ballo> replace std::rand() with 7 ?
diehlpk has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
jaafar has joined #ste||ar
aserio has quit [Read error: Connection reset by peer]
aserio has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 260 seconds]
aserio1 is now known as aserio
<github> [hpx] hkaiser created fixing_2940 (+1 new commit): https://git.io/vdwf7
<github> hpx/fixing_2940 738dec5 Hartmut Kaiser: Adding split_future for std::vector...
<github> [hpx] hkaiser opened pull request #2942: Adding split_future for std::vector (master...fixing_2940) https://git.io/vdwfj
diehlpk has quit [Ping timeout: 258 seconds]
hkaiser has quit [Ping timeout: 248 seconds]
aserio has quit [Ping timeout: 240 seconds]
<heller> sketching a buildbot replacement: http://62be4d6b.ngrok.io
<zao> faster at least ;)
hkaiser has joined #ste||ar
<heller> zao: and totally unpopulated :p
pagrubel has joined #ste||ar
<heller> The configuration of the workers, builds and repositories is all through human readable text files (json and cmake), automatically updated by github commits
<heller> The builds on the workers will be started via ssh
<heller> I think I can finish a first version by the end of the week
EverYoung has quit [Ping timeout: 255 seconds]
<K-ballo> where are the builds?
<heller> K-ballo: not implemented yet
pagrubel has quit [Remote host closed the connection]
patg[w]_ has joined #ste||ar
EverYoung has joined #ste||ar
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
diehlpk has joined #ste||ar
aserio has joined #ste||ar
eschnett has quit [Quit: eschnett]
hkaiser has quit [Read error: Connection reset by peer]
eschnett has joined #ste||ar
diehlpk has quit [Remote host closed the connection]
diehlpk has joined #ste||ar
hkaiser has joined #ste||ar
jbjnr has quit [Quit: ChatZilla 0.9.93 [Firefox 56.0/20170926190823]]
jbjnr has joined #ste||ar
<jbjnr> hkaiser: one place that I would probably want to insert the dataflow would be in the custom executor around the add function area - like here in the default executor https://github.com/STEllAR-GROUP/hpx/blob/050d1f17b6b7c77d9ca8fe399ca7e3ddea5ef6f8/src/runtime/threads/executors/default_executor.cpp#L45
<jbjnr> however ...
<jbjnr> this is called from code that is going on during the scheduling of a continuation etc etc - would adding a dataflow in here not cause problems?
<jbjnr> actually - I couldn't put it there because the args are hidden inside the closure
<hkaiser> jbjnr: right, create a wrapping executor which dispatches to the pool executor
<jbjnr> incidentally hkaiser if the API changing from future to non unwrapped future was a real problem - then inside the forwarding executor after the dataflow - one could re-wrap them into make_ready futures. (just btw)
<hkaiser> jaafar: yah, I thought about that - and we can do it more efficiently, if needed
<jbjnr> who is this jarjar binks character anyway ...
pree has joined #ste||ar
patg[w]_ has quit [Ping timeout: 255 seconds]
pagrubel has joined #ste||ar
<hkaiser> darn autocomplete
pree has quit [Ping timeout: 248 seconds]
hkaiser has quit [Quit: bye]
eschnett has quit [Quit: eschnett]
jbjnr_ has joined #ste||ar
jbjnr has quit [Ping timeout: 246 seconds]
jbjnr_ is now known as jbjnr
eschnett has joined #ste||ar
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
eschnett has quit [Ping timeout: 240 seconds]
eschnett has joined #ste||ar
jakemp has joined #ste||ar
eschnett has quit [Quit: eschnett]
pagrubel has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
<aserio> hkaiser: yt?
<hkaiser> here
<hkaiser> aserio: ^^
diehlpk has quit [Remote host closed the connection]
<aserio> What objects can capture a nil?
<aserio> hkaiser: I am seeing this-> what(): primitive_result_type does not hold a literal value type: HPX(bad_parameter)
diehlpk has joined #ste||ar
<hkaiser> aserio: right - something is trying to interpret a primitive_result_type which is empty (nil)
<aserio> hkaiser: as it should
<hkaiser> use phylanx::execution_tree::is_valid to check instead
<aserio> I have made the condition false
<hkaiser> nod, right
<aserio> I am just confused about how this should be handled
<hkaiser> use phylanx::execution_tree::is_valid to check instead
<hkaiser> if this returns false you're golden
<diehlpk> hkaiser, Michael was here today and he will visit you tomorrow
diehlpk has quit [Remote host closed the connection]
<hkaiser> diehlpk: cool
<hkaiser> diehlpk, we need to talk about dinner tonight
<hkaiser> tomorrow night, that is
<hkaiser> aserio: let me add that missing overload to master
diehlpk has joined #ste||ar
pagrubel has joined #ste||ar
<aserio> hkaiser: so how do I add this in the execution tree?
<aserio> lets take this case
<aserio> c=a+b
<aserio> if(c!=0)
<aserio> {d/c}
<aserio> c++
<hkaiser> aserio: so I was wrong, the 'missing' overload is there, no need for me to add anything
diehlpk_ has joined #ste||ar
diehlpk has quit [Ping timeout: 258 seconds]
diehlpk_ has quit [Remote host closed the connection]
diehlpk_ has joined #ste||ar
diehlpk_ has quit [Remote host closed the connection]
diehlpk_ has joined #ste||ar
diehlpk__ has joined #ste||ar
aserio has quit [Quit: aserio]
diehlpk_ has quit [Ping timeout: 248 seconds]
wash has joined #ste||ar
<wash> hkaiser: ping. I seem to remember either an HPX or a boost document from a few years back, describing what a minimal test case is, and what info people should include in a bug report
<wash> does that ring any bells?
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar