aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
EverYoun_ has quit [Remote host closed the connection]
EverYoun_ has joined #ste||ar
EverYoun_ has quit [Remote host closed the connection]
vamatya has quit [Ping timeout: 246 seconds]
daissgr has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: bye]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 276 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
vamatya has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Client Quit]
nanashi55 has quit [Ping timeout: 248 seconds]
nanashi55 has joined #ste||ar
daissgr has quit [Ping timeout: 276 seconds]
EverYoung has joined #ste||ar
vamatya has quit [Ping timeout: 264 seconds]
parsa has joined #ste||ar
parsa has quit [Quit: Zzzzzzzzzzzz]
jaafar has quit [Ping timeout: 246 seconds]
<github> [hpx] sithhell pushed 1 new commit to master: https://git.io/vNo21
<github> hpx/master a220e5b Thomas Heller: Merge pull request #3122 from STEllAR-GROUP/fixing_3121...
<github> [hpx] sithhell deleted fixing_3121 at 4c7c746: https://git.io/vNo2S
david_pfander has joined #ste||ar
<zao> This is nifty... it seems a flag we've built our OpenMPI with for a long while causes some rather massive perf shenanigans.
Guest2733 is now known as Vir
<heller_> lol
<heller_> jbjnr_: yt?
<jbjnr_> here
<jbjnr_> no
<heller_> jbjnr_: I want to add a pycicle tester; how do I verify that my setup works? As in, how would I start a test run of a build?
<jbjnr_> on a local laptop/machine or over ssh?
<jbjnr_> once you've set up a config file like "my-laptop.cmake", then "python ./pycicle -m my-laptop"
<jbjnr_> but I usually try -p 3118 and --debug
<jbjnr_> to force just one PR to be checked, and also to not trigger any builds (over ssh, to test job launching etc.)
<jbjnr_> --help gives a quick summary of options
<heller_> thanks
<jbjnr_> use the -p option when running locally, because if N branches need rebuilding and you only have one machine, triggering just one branch is a good idea
<jbjnr_> -f to force a rebuild even if it doesn't need it
<jbjnr_> python pycicle -f -p 3118
<jbjnr_> --debug (-d) prints out commands without sending them over ssh
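For reference, the pycicle invocations described above, collected in one place (flag meanings as jbjnr_ explains them in the chat, not verified against the tool itself):

    python ./pycicle.py --help            # quick summary of options
    python ./pycicle.py -m my-laptop      # use the config file my-laptop.cmake
    python ./pycicle.py -f -p 3118        # force a rebuild of just PR 3118
    python ./pycicle.py -p 3118 --debug   # print commands instead of sending them over ssh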
<github> [hpx] sithhell pushed 1 new commit to master: https://git.io/vNoSy
<github> hpx/master 261fe3a Thomas Heller: Merge pull request #3125 from STEllAR-GROUP/after_3120...
<jbjnr_> I might have broken the local build recently, I've only been using it over ssh
<jbjnr_> if it doesn't work, tell me and I'll fix it
<heller_> i want to test a ssh build anyway
<jbjnr_> since it "works for me" but nobody else has tried it, there are probably several things broken that I just assume people know, even though they couldn't possibly know
<heller_> sure
<heller_> I think I've got the hang of it now
<heller_> thanks
<jbjnr_> my daint setup is just "python ./pycicle.py -m daint"
<heller_> where would I set the build type?
<jbjnr_> and I leave it running in a terminal and can see stuff scroll whenever you do a merge (like just now)
<jbjnr_> then it settles into a quiet state and just prints a check message like "time since last 86s"
<jbjnr_> watch out: in the python there is a hardcoded random clang/gcc option you might want to disable
<jbjnr_> I need to turn those into proper config settings that can be changed on a per project/setup basis
<heller_> random clang/gcc?
<heller_> that sounds odd
<heller_> it's just "if machine==daint"
<heller_> ;)
<jbjnr_> oh yes. I forgot there was a daint check there too
<jbjnr_> (what we need is an options matrix with N to choose from and then have randomly generated configs. Just gcc/clang so far)
<jbjnr_> anyway ...
<jbjnr_> boost = 'x.xx.x' :)
<heller_> why random? Don't we want to check all options in the matrix each time?
<jbjnr_> well, for one, you complained that there was too much information
<jbjnr_> so multiplying all builds by all options would only make that worse
<jbjnr_> gtg bbiab
<heller_> well sure
K-ballo has joined #ste||ar
<heller_> jbjnr_: Failed to connect to github. Network down?
<heller_> hmm, bad credentials ... but I generated my user token?
<heller_> jbjnr_: you got your first user!
<jbjnr_> working?
<simbergm> does anyone know what's up with rostam?
<simbergm> jbjnr_: I think you got a question on github from someone who wants to use pycicle
<heller_> jbjnr_: slowly...
<jbjnr_> simbergm: ooh. looking now
<heller_> simbergm: dunno ... some system messup
hkaiser has joined #ste||ar
<heller_> hkaiser: rostam seems to be completely borked now :/
<hkaiser> heller_: I did not see Al yesterday
<heller_> hkaiser: shall I have a look?
<hkaiser> pls do, but leave Al in the loop, pls
<heller_> ok, i undrained the ariel nodes
<heller_> this should get rid of the exception
<github> [hpx] sithhell deleted fix_hello at baacf09: https://git.io/vNoxs
<heller_> merged a new PR ... let's see
* jbjnr_ is waiting with bated breath
<heller_> so ... just using, for example papi_cost, I can't reproduce the segfault
<heller_> so it is either the generic context coroutines (who knows why those are activated on rostam) or something else
<heller_> I'll try to reproduce this locally here
<hkaiser> heller_: how do you know those are enabled?
<heller_> hkaiser: from the stacktrace
<heller_> hkaiser: red herring ... they are only enabled for the gcc 6.3 builds
<hkaiser> I now remember, we enabled those deliberately to have them tested on at least one or two platforms
<heller_> *nod*
<K-ballo> good thinking
<jbjnr_> We will have to start running HPX on summit (ARM)
<hkaiser> indeed
<jbjnr_> so expect plenty of testing on different architectures soon
<hkaiser> perfect
<jbjnr_> ARM + 2 GPUs per node
<heller_> arm?
<heller_> I thought it was power9?
<hkaiser> isn't summit power9?
<heller_> I have a regular power tester
<jbjnr_> so sorry. power 9
<jbjnr_> I'm losing my mind
<jbjnr_> risc-ish anyway :)
<jbjnr_> the hpx+cuda situation is a total mess, and this is going to be a problem for us moving forward with our next big project on summit.
<jbjnr_> ooh - cdash has a FAU machine incoming
<heller_> hkaiser: regarding our discussion about two threads trying to resume one suspended one ... I am having a hard time figuring out what the semantics of this should be ... I am even thinking that this is UB in general, and the two "resumer" threads need to synchronize in a different way, for example like what our condition variable is doing. Am I totally off there?
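For illustration, the "different way" of synchronizing that heller_ alludes to might look like the following plain-C++ sketch (std::thread and std::condition_variable, not HPX internals): instead of two threads racing to resume one suspended thread, both signal a condition variable and the suspended thread wakes itself exactly once.

    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <thread>

    int main()
    {
        std::mutex mtx;
        std::condition_variable cv;
        bool ready = false;

        // the "suspended" thread blocks on the condition variable
        std::thread suspended([&] {
            std::unique_lock<std::mutex> lk(mtx);
            cv.wait(lk, [&] { return ready; });  // wakes safely, exactly once
            std::cout << "resumed\n";
        });

        // two "resumer" threads; notifying twice is well-defined,
        // unlike two racing calls to a hypothetical resume()
        auto notify = [&] {
            { std::lock_guard<std::mutex> lk(mtx); ready = true; }
            cv.notify_one();
        };
        std::thread r1(notify), r2(notify);

        r1.join(); r2.join(); suspended.join();
    }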
<heller_> jbjnr_: I agree ... there would have been an easy way out of this mess...
<jbjnr_> well. I blame you for that!
<hkaiser> heller_: stop worrying about this ;-)
<hkaiser> just leave it alone
<hkaiser> jbjnr_: having a mess is only a 'problem' if nobody does anything about it
parsa has joined #ste||ar
<jbjnr_> it'll be my next problem, so I'll have to do something about it
<hkaiser> good!
<hkaiser> should we clean up the existing half-way solved 'problems' first?
<jbjnr_> cat hpx::compute > /dev/null
<jbjnr_> which "we" are we talking about?
<hkaiser> if you have a better solution, all the better
<hkaiser> I always use the royal 'we' ;) I'm a Kaiser
<heller_> lol
<jbjnr_> <sigh>
Smasher has quit [Remote host closed the connection]
Smasher has joined #ste||ar
mcopik has quit [Ping timeout: 255 seconds]
<heller_> hkaiser: the suspend/resume or the other thing?
<hkaiser> what other thing?
<heller_> the easy way out of the hpx+cuda mess ;)
<hkaiser> ahh
<hkaiser> the suspend/resume
<heller_> I won't, that really bugs me ;)
<hkaiser> heh, why did I know ...
<hkaiser> it's very high risk with questionable outcomes
<hkaiser> the changes you are working on currently are noticeable in real applications only under very high contention - this change would not be measurable at all - so why bother?
<heller_> I am getting those changes in first
<heller_> and I do think they will be measurable
<hkaiser> the current changes will have some effect for sure
<heller_> absolutely
<jbjnr_> still trying to fix my bugs, so I can test your stuff heller_
<heller_> jbjnr_: gradually getting it into master
<jbjnr_> :)
<heller_> jbjnr_: once I am completely done, we'll take care of your stuff
<heller_> it just takes an insane amount of time, since I really want to ensure everything still works
<heller_> hkaiser: FWIW, there might be a low-hanging fruit that doesn't require a high-risk change
<heller_> but I won't tell any details until I have more than just an educated guess, because I know what you're going to say
<simbergm> currently hpx::start can return before the runtime is in state_running - is this a feature or a bug? I'd like to suspend the runtime as soon as possible after it's running
<simbergm> and second, starting the runtime without a main function does not seem to be possible right now, or am I missing some overload?
<hkaiser> heller_: ok
<hkaiser> simbergm: I was not aware of start returning before the runtime actually 'runs'
<hkaiser> but start is meant to signal to the runtime to start - so I guess it could happen, yah
<simbergm> yeah, I don't think it's necessarily wrong, just wondering if I should do the checking separately in that case
<hkaiser> simbergm: start(argc, argv) does not need a main-function?
<simbergm> hkaiser: but that will then call hpx_main?
<hkaiser> only on locality 0 (if not otherwise configured)
<hkaiser> and only if the locality is not connecting late
<simbergm> okay, I'm only dealing with one locality
<simbergm> I'm just trying to streamline starting the runtime and getting it suspended
<hkaiser> what would be the point of starting the runtime without running any functions?
<simbergm> it's not a big deal to add an empty hpx_main though
<hkaiser> nod
<simbergm> it should just be initialized, I can then later resume; hpx::async(); suspend
<simbergm> ideally faster than start/stop
<hkaiser> makes sense
<hkaiser> that was not part of the initial design ;)
<hkaiser> I think passing nullptr as hpx_main will do the trick
<hkaiser> or a default constructed function<...>{}
<simbergm> mmh, I'll try that
<simbergm> an empty function_nonser did not work when I tried it though
<hkaiser> hold on
<K-ballo> an empty one might throw, unless it is special cased to be ignored
<simbergm> I think it gets wrapped along the way
<simbergm> yeah, it throws
<simbergm> hkaiser: my guess is that the bind in hpx_init.cpp wraps it so that it actually tries to call it
<simbergm> but I'm not certain
<hkaiser> I thought func was the function_nonser passed in from the user verbatim - but I might be wrong
parsa has quit [Quit: Zzzzzzzzzzzz]
<hkaiser> simbergm: feel free to add an overload for init()/start() taking a nullptr_t instead of the function, causing nothing to be run
<simbergm> hkaiser: yeah, I might be wrong too
<simbergm> ok, I might just do that
<hkaiser> simbergm: and you're right, func is bound separately for certain cases: https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/hpx_init_impl.hpp#L288-L289
<simbergm> hkaiser: thanks
<simbergm> that's the one I meant
<simbergm> I'll add some overloads
<hkaiser> the nullptr_t overloads should take care of that by constructing an empty function_nonser
<K-ballo> so many overloads
<simbergm> nullptr_t seems nice, never used it
<hkaiser> K-ballo: so much fun
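Pulling this exchange together: a sketch of what simbergm's workflow could look like once the nullptr_t overload hkaiser proposes exists. hpx::suspend/hpx::resume are the runtime-suspension calls under discussion; the shutdown sequence (finalize via an HPX thread, then stop) and the exact headers are assumptions, not verified API of this HPX version.

    #include <hpx/hpx.hpp>
    #include <hpx/hpx_start.hpp>

    int main(int argc, char* argv[])
    {
        // proposed overload: start the runtime without running any hpx_main
        hpx::start(nullptr, argc, argv);
        hpx::suspend();                       // park the runtime once it is running

        hpx::resume();
        hpx::async([] { /* some HPX work */ }).get();
        hpx::suspend();                       // ideally cheaper than a full stop/start

        hpx::resume();
        hpx::apply([] { hpx::finalize(); });  // initiate shutdown from an HPX thread
        return hpx::stop();
    }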
hkaiser has quit [Quit: bye]
akheir has joined #ste||ar
aserio has joined #ste||ar
<akheir> heller_: yt? I heard you ran into trouble with rostam yesterday
<heller_> akheir: buildbot does
<heller_> akheir: look at the segfaults
<heller_> akheir: something to do with papi
<heller_> here is the stacktrace
<heller_> I can't reproduce it anywhere else and it is not a stack overflow
<akheir> heller_: I'll take a look. I patched the server for the Meltdown thing last week; I've heard it may cause trouble with PAPI, but I didn't expect this
<heller_> you don't trust your users :)?
<heller_> when I run some papi utilities, it seems to work fine
<heller_> it's just within HPX
<heller_> maybe you need to recompile PAPI against the new kernel?
<akheir> heller_: this PAPI comes from upstream; I will compile a new one and put it in buildbot
<akheir> heller_: I don't know half of the users ;-)
<heller_> thanks
<heller_> let's hope this'll fix it
<heller_> akheir: another note, I undrained the ariel nodes this morning
<akheir> heller_: thanks, Patrick told me about it but I wasn't at my desk anymore to fix it
rtohid has joined #ste||ar
daissgr has joined #ste||ar
daissgr has quit [Ping timeout: 276 seconds]
daissgr has joined #ste||ar
<jbjnr_> heller_: the FAU build never completed, by the looks of it :(
<heller_> jbjnr_: no, I cancelled it
<jbjnr_> ok
<heller_> because idiot me included the header tests
<jbjnr_> Still can't remember what they are/do
<heller_> they check that all headers are self-contained
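For context, a header self-containment check is typically a generated translation unit per header that includes that header and nothing else; if the header forgets one of its own includes, the TU fails to compile. A minimal sketch (hypothetical file name and header choice, not HPX's actual generated output):

    // header_check_future.cpp -- hypothetical generated check
    // The header under test must pull in everything it needs by itself.
    #include <hpx/lcos/future.hpp>

    int main() { return 0; }  // nothing to run; compiling is the test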
EverYoung has quit [Ping timeout: 276 seconds]
<github> [hpx] hkaiser closed pull request #3118: Adding performance_counter::reinit to allow for dynamically changing counter sets (master...reinit_counters) https://git.io/vN2Is
aserio has quit [Ping timeout: 276 seconds]
eschnett has joined #ste||ar
aserio has joined #ste||ar
<aserio> wash[m]: Will you be joining us today?
parsa has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
akheir_ has joined #ste||ar
vamatya has joined #ste||ar
akheir_ has quit [Remote host closed the connection]
parsa has quit [Quit: Zzzzzzzzzzzz]
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
aserio has quit [Ping timeout: 276 seconds]
jaafar has joined #ste||ar
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 276 seconds]
daissgr has quit [Read error: Connection reset by peer]
david_pfander has quit [Ping timeout: 240 seconds]
jaafar_ has joined #ste||ar
jaafar has quit [Ping timeout: 252 seconds]
jaafar_ is now known as jaafar
jaafar has quit [Remote host closed the connection]
jaafar_ has joined #ste||ar
jaafar_ is now known as jaafar
aserio has joined #ste||ar
jaafar has quit [Remote host closed the connection]
jaafar has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
aserio1 is now known as aserio
daissgr has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
aserio has joined #ste||ar
jaafar has quit [Remote host closed the connection]
jaafar has joined #ste||ar
jaafar has quit [Ping timeout: 252 seconds]
jaafar has joined #ste||ar
<aserio> twwright: yt?
<twwright> aserio, yes
<aserio> twwright: see pm please
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
aserio1 has joined #ste||ar
jaafar_ has joined #ste||ar
jaafar has quit [Ping timeout: 276 seconds]
aserio1 has quit [Remote host closed the connection]
daissgr has quit [Quit: WeeChat 1.4]
aserio has quit [Ping timeout: 252 seconds]
daissgr has joined #ste||ar
eschnett has quit [Quit: eschnett]
aserio has joined #ste||ar
hkaiser has joined #ste||ar
<jbjnr_> heller_: good news, bad news - good news: I fixed my problem and can run tests again - bad news: no noticeable speedup using your fix overheads branch (yet)
kisaacs has joined #ste||ar
<hkaiser> jbjnr_: you most likely won't be able to see a measurable improvement in overall runtime, but you should be able to reduce your thread granularity, which then might improve runtime
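To illustrate what "reducing thread granularity" means here: with lower per-task overheads, smaller chunk sizes (finer tasks, more of them) become profitable. A sketch using HPX's parallel algorithms; the chunk-size value and exact header names are illustrative assumptions:

    #include <hpx/include/parallel_for_each.hpp>
    #include <hpx/include/parallel_executor_parameters.hpp>
    #include <vector>

    void scale(std::vector<double>& data)
    {
        using namespace hpx::parallel;

        // smaller chunks = finer task granularity = more scheduling overhead,
        // which is exactly what the fix-overheads work is meant to cheapen
        execution::static_chunk_size chunk(64);

        for_each(execution::par.with(chunk),
            data.begin(), data.end(),
            [](double& x) { x *= 2.0; });
    }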
<aserio> hkaiser: would you forward me Jiangua's email?
<hkaiser> done
<jbjnr_> that's exactly what I'm testing. no noticeable speedup for the smaller block sizes
<hkaiser> jbjnr_: ok, good to know
aserio has quit [Ping timeout: 252 seconds]
Smasher has quit [Remote host closed the connection]
Smasher has joined #ste||ar
akheir has quit [Remote host closed the connection]
rtohid has left #ste||ar [#ste||ar]
mcopik has joined #ste||ar
kisaacs has quit [Ping timeout: 260 seconds]
aserio has joined #ste||ar
aserio has quit [Client Quit]
jaafar_ is now known as jaafar
kisaacs has joined #ste||ar
EverYoung has joined #ste||ar