hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC2018: https://wp.me/p4pxJf-k1
diehlpk has joined #ste||ar
diehlpk has quit [Read error: Connection reset by peer]
Vir has quit [Ping timeout: 265 seconds]
Vir has joined #ste||ar
diehlpk has joined #ste||ar
Vir has quit [Ping timeout: 240 seconds]
Vir has joined #ste||ar
mcopik has quit [Ping timeout: 268 seconds]
diehlpk has quit [Remote host closed the connection]
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: bye]
<jbjnr> is anyone else back at work and raring to go?
<zao> I'm on vacation and being about as unproductive as usual :D
<jbjnr> vacation - anywhere nice?
<jbjnr> I believe St. Petersburg could be nice for you this week :)
<zao> Nah, just staycation.
<zao> Went back home around midsummer, probably going again in a few weeks.
<zao> (split my vacation days up in two sections this year, so two weeks now, then three weeks later)
jbjnr has quit [Ping timeout: 245 seconds]
jbjnr has joined #ste||ar
jaafar has quit [Ping timeout: 268 seconds]
<jbjnr> grrrr. my windows machine is just terrible these days. blue screen of death and reboots all the time.
<zao> How bothersome.
<zao> Were you running pycicle infra on it?
<jbjnr> yes
<zao> I was building one of the SoC PRs the other day, my CI stuff apparently still has trouble with a test :(
<jbjnr> not sure what 'infra' means. but I have two pycicle instances spawning work on the cray. they are just python loops polling github. Not building here.
<zao> I wonder if the container setup interferes with it.
<zao> Infrastructure.
<zao> Anyway, welcome back :D
<heller> jbjnr: welcome back!
jbjnr has quit [Ping timeout: 240 seconds]
jbjnr has joined #ste||ar
<jbjnr> rebooted again!
<jbjnr> heller: hi. Have you finished the kokkos integration yet! :)
<heller> jbjnr: no, the kokkos and HPX models are, interestingly, very different and not really compatible
<heller> what i'd like instead is a thorough comparison between the two
<jbjnr> I believe we must make our stuff compatible if we are to get peak performance on a node
<heller> that's what I'm still wondering
<heller> Kokkos doesn't come for free either
<jbjnr> I've already made a lot of progress with my ability to provide hints to the scheduler about where to put tasks, but we need to go much further.
<heller> we get nice performance for the stream benchmark, for example. Something the kokkos model is supposed to be perfect for
<heller> right
<heller> I am not arguing that what we have is perfect
<jbjnr> the stream benchmark is not really a good example though as it does not use the 'standard' api that the rest of hpx uses
<jbjnr> did you reach any conclusions about N-ary tasks?
<heller> I am not even sure what that standard API is ...
<heller> N-ary tasks: that's just a by-product of their model. I don't think it buys us anything
<heller> but yes, the stream benchmark needs to be streamlined again
<heller> the point is: It is able to deliver
<jbjnr> N-ary : I like the idea of creating 1 task instead of 32 (or some other number) and decrementing the ranges used.
<heller> yes, I guess that's one point where we need to optimize
<heller> instead of calculating the partitions upfront, each thread should do it on its own based on some index
<heller> or so
<heller> also: you are saying that we aren't able to reach peak on a single node with what we have today. On what grounds are you making that statement? Do you have a comparison of your cholesky stuff using Kokkos?
<heller> and more importantly: I must get out of this overwhelming, productivity killing thesis swamp that keeps on draining my energy for too long
david_pfander1 has joined #ste||ar
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/f1Wgg
<github> hpx/gh-pages 63eed00 StellarBot: Updating docs
david_pfander1 has quit [Ping timeout: 245 seconds]
hkaiser has joined #ste||ar
jakub_golinowski has joined #ste||ar
jakub_golinowski has quit [Quit: Ex-Chat]
jakub_golinowski has joined #ste||ar
K-ballo has joined #ste||ar
mcopik has joined #ste||ar
nikunj has joined #ste||ar
<nikunj> hkaiser: yt?
<hkaiser> here
<nikunj> So I just tried integrating my apple implementation into HPX. Things are working fine as of now (examples are running well). I'm onto running tests now
<hkaiser> nice!
<hkaiser> good job!
<nikunj> Could you please review my pr so that I can add another pr for apple integration as well (it adds onto hpx_wrap.cpp and I do not want to combine the Linux and Mac OS integration into the same pr)
<hkaiser> nikunj: will try to get to it today
<nikunj> thanks, I'll add another pr as soon as it is reviewed
hkaiser has quit [Quit: bye]
anushi has joined #ste||ar
Anushi1998 has joined #ste||ar
<Anushi1998> nikunj: Why don't you add a branch and make a second PR? Is there any problem with that, or can the second PR only be made once the first one is merged?
<nikunj> Anushi1998: the second pr cannot be worked on until the first one is merged
<Anushi1998> okay
<nikunj> It involves additional code in the file of my first pr.
mcopik has quit [Ping timeout: 245 seconds]
aserio has joined #ste||ar
mcopik has joined #ste||ar
<jakub_golinowski> M-ms, the build in release mode has linking errors as before in a clean dir
<M-ms> jakub_golinowski: ok, thanks
<M-ms> still rebuilding here
<jbjnr> M-ms: are you in zurich or basel?
<M-ms> jbjnr: basel
<jbjnr> ok. see you tomorrow. Is the conf. centre small enough that I'll find everyone easily?
<jbjnr> I probably won't arrive until lunchtime
<M-ms> I see you're coming here as well...
<jbjnr> yup. meeting
hkaiser has joined #ste||ar
<M-ms> yep, it's reasonably small
<M-ms> coffee breaks in one hall, otherwise write on slack
<hkaiser> jbjnr: I'm ready whenever you are
akheir has joined #ste||ar
<M-ms> jakub_golinowski: getting the linker errors now on my work laptop, must have something different on my personal one... but now I can at least start looking into it
<nikunj> hkaiser: can we reschedule our skype meet to Wednesday or Thursday? I mainly wanted to talk about my implementation for Linux and Mac OS. Now that they are done (almost), I can work on Windows. I think I can get some visible leads by Wednesday to discuss with you.
nikunj97 has joined #ste||ar
nikunj has quit [Ping timeout: 276 seconds]
<aserio> heller: yt?
<heller> aserio: hey
<aserio> heller: welcome to the team
<heller> aserio: he, thanks ;)
<heller> aserio: see pm please ;)
<hkaiser> nikunj97: sure, works for me (Thursday)
<hkaiser> let's rather do Friday
<nikunj97> hkaiser: ok
<nikunj97> I'll research ways to get things done on windows till then
nikunj1997 has joined #ste||ar
nikunj97 has quit [Ping timeout: 264 seconds]
anushi has quit [Read error: Connection reset by peer]
anushi has joined #ste||ar
anushi has quit [Remote host closed the connection]
Anushi1998 has quit [Quit: Bye]
<jakub_golinowski> M-ms, I realized that it's 6 CEST now :D do you have time to look at the gdoc?
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 265 seconds]
aserio1 is now known as aserio
<M-ms> jakub_golinowski: yep, thanks
nikunj1997 has quit [Ping timeout: 240 seconds]
<github> [hpx] hkaiser created destroy_parcel (+1 new commit): https://git.io/fySM9
<github> hpx/destroy_parcel a79d051 Hartmut Kaiser: Making sure all parcels get destroyed on an HPX thread (TCP pp)
anushi has joined #ste||ar
anushi has quit [Remote host closed the connection]
<github> [hpx] hkaiser force-pushed destroy_parcel from a79d051 to 0d9a425: https://git.io/fyH4L
<github> hpx/destroy_parcel 0d9a425 Hartmut Kaiser: Making sure all parcels get destroyed on an HPX thread (TCP pp)
anushi has joined #ste||ar
<github> [hpx] hkaiser force-pushed destroy_parcel from 0d9a425 to 8e2d7c1: https://git.io/fyH4L
<github> hpx/destroy_parcel 8e2d7c1 Hartmut Kaiser: Making sure all parcels get destroyed on an HPX thread (TCP pp)...
aserio has quit [Ping timeout: 255 seconds]
Anushi1998 has joined #ste||ar
jakub_golinowski has quit [Ping timeout: 276 seconds]
<Guest87328> [hpx] hkaiser opened pull request #3361: Making sure all parcels get destroyed on an HPX thread (TCP pp) (master...destroy_parcel) https://git.io/fSs66
jaafar has joined #ste||ar
mcopik has quit [Ping timeout: 248 seconds]
mcopik has joined #ste||ar
mcopik has quit [Ping timeout: 276 seconds]
jakub_golinowski has joined #ste||ar
diehlpk_mobile has joined #ste||ar
<hkaiser> jbjnr: could you give me the link to the nvidia gpu layering workshop announcement, please
jakub_golinowski has quit [Ping timeout: 276 seconds]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
jbjnr has quit [Remote host closed the connection]
hkaiser has joined #ste||ar
aserio1 has joined #ste||ar
<github> [hpx] khuck pushed 1 new commit to apex_fixing_null_wrapper: https://git.io/fSsQa
<github> hpx/apex_fixing_null_wrapper e63fcf6 Kevin Huck: Trying to make circleci happy
aserio has quit [Ping timeout: 240 seconds]
aserio1 has quit [Ping timeout: 240 seconds]
aserio has joined #ste||ar
jakub_golinowski has joined #ste||ar
<parsa[w]> is it possible to determine if we're on locality#0 after hpx::finalize()?
<parsa[w]> hkaiser: ^
aserio has quit [Ping timeout: 240 seconds]
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<parsa[w]> hkaiser: is it possible to determine if we're on locality#0 after hpx::finalize()?
<hkaiser> parsa[w]: not sure what you mean
aserio has joined #ste||ar
<parsa[w]> hpx_main finishes execution, and i expect some string to be printed, which happens on locality 0. i want to check for that string when i'm on locality 0
<parsa[w]> hkaiser: ^
<hkaiser> from phylanx?
<hkaiser> I mean from physl?
<parsa[w]> any hpx application
<parsa[w]> in main()
<hkaiser> hpx::cout always goes to locality 0
<hkaiser> debug() as well
<parsa[w]> yes, but main is run on every locality
<K-ballo> new info on that action template on the slack channel
<parsa[w]> which means the other process fails
<hkaiser> you can't check after finalize whether you are on locality 0
<hkaiser> only way is to store the locality in a variable before finalize so you can use it afterwards
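A minimal sketch of that approach (header layout and entry points may differ across HPX versions; the variable name is illustrative): capture the locality id while the runtime is still up, then test the plain variable after hpx::init() returns.

```cpp
#include <hpx/hpx.hpp>
#include <hpx/hpx_init.hpp>

#include <cstdint>
#include <iostream>

// Captured before finalize; safe to read after the runtime has shut down.
static std::uint32_t locality_id = 0;

int hpx_main(int argc, char* argv[])
{
    locality_id = hpx::get_locality_id();    // the runtime is still running here
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    int ret = hpx::init(argc, argv);
    if (locality_id == 0)                    // plain variable, no runtime needed
        std::cout << "expected output produced on locality 0\n";
    return ret;
}
```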
<hkaiser> parsa[w]: I just merged #3361
<github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/fSsxO
<github> hpx/master 6bda03a Hartmut Kaiser: Merge pull request #3361 from STEllAR-GROUP/destroy_parcel...
<parsa[w]> thanks
<github> [hpx] hkaiser deleted destroy_parcel at 8e2d7c1: https://git.io/fSsx3
<heller> hkaiser: what's your stance on kicking the asio based PP in favor of a libfabric only solution?
<hkaiser> heller: if we can get it to work on any platform that supports sockets, sure - we won't fully get rid of asio this way though
<hkaiser> heller: what would be the rationale of doing this?
<heller> Simplifying the whole parcelhandler code by just using libfabric for the communication. This way, we can fully utilize the network. I have a prototype implementation that's on par with MPI in terms of latency and bandwidth, for a window size of 1 and a single thread
<heller> For the OSU test
<hkaiser> what about bootstrapping?
<heller> Solved
<hkaiser> nice
<heller> Even without PMI
<hkaiser> well, as a first step I'd say - let's add it as an additional pp
<heller> Full zero copy capable ;)
<heller> Hmm
<heller> Not sure if that's going to work out though
<hkaiser> why?
<K-ballo> PMI?
<K-ballo> MPI
<K-ballo> (I read PMI in passing and thought of Phillip Morris International)
<heller> I changed the serialization stuff. Mainly to have easier preprocessing and rdma reads on demand
<K-ballo> oh, it's a thing
<heller> K-ballo: process management interface
<hkaiser> heller: what about the mpi pp?
<heller> I started bottom up. As said, it's just a prototype so far and not yet fully integrated
<hkaiser> heller: will that make the mpi pp obsolete as well?
<heller> The mpi pp has no need to exist anymore :p
<heller> Yes
<heller> That's the goal
<hkaiser> ok
<hkaiser> this needs some discussion
<heller> It will certainly be a disruptive step since I expect some bugs
<heller> Sure, that's why I'm bringing it up
<hkaiser> I'm not in favor of throwing away everything we have in terms of networking and replace it with something new in one bid sweep
<hkaiser> big*
<heller> I understand. The two things could happily coexist
<heller> They in fact do at the moment
<hkaiser> so what's the problem with leaving the existing tcp pp in place for a while?
<heller> No problem at all. This new code would make the current parcel handling obsolete.
<hkaiser> I understand
<heller> Having a plan on when to remove would be good
<hkaiser> but as said, I think this change should be done in steps over at least 2 releases
<heller> Ok
<heller> No problem.
<hkaiser> one release has the new stuff in but not as the default, the next release has it on by default, leaving the old stuff available on demand
<hkaiser> third release - remove things
<hkaiser> now, the quicker you do the releases, the quicker the stuff gets in ;-)
<heller> The risk is: bugs, a changed cmake step (need to point to a libfabric install), and a potential problem when not using slurm/pbs/alps for distributed applications. Libfabric might get discontinued and we'd end up with a pretty tightly coupled code base and need to invest there
<heller> The gain: significantly faster distributed applications
<heller> And making John happy with the rdma transfers
<hkaiser> heller: sure, I'm behind this - just a bit cautious
<heller> Good
<heller> I hope that it works reasonably on Windows and osx
<heller> They claim it does...
<hkaiser> heller: sure, if not we can create some pressure through Chris
diehlpk_mobile has quit [Read error: Connection reset by peer]
jakub_golinowski has quit [Ping timeout: 256 seconds]
diehlpk_mobile has joined #ste||ar
<Anushi1998> Why do we need to add new split credits? Since we have acquired the lock, the credits will be replenished, and whenever it is split again it will simply be divided.
<Anushi1998> https://github.com/STEllAR-GROUP/hpx/blob/master/src/runtime/naming/name.cpp#L288 Why is it not 1, since we replenish only when we are exhausted and 2 can be divided further?
jakub_golinowski has joined #ste||ar
<hkaiser> Anushi1998: we replenish only once the credit has been exhausted
aserio has quit [Quit: aserio]
<Anushi1998> hkaiser: ^^
<hkaiser> why?
<Anushi1998> because when we have 1 credit and we still want to split, then we should replenish both
<hkaiser> credits are always stored as log(credit)
<hkaiser> so the minimal useful credit is 2^1
<Anushi1998> okay
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<hkaiser> we do not hold a lock during credit splitting
<hkaiser> that means that concurrently two or more of those operations could happen
<hkaiser> we need to account for those
<Anushi1998> but whenever we are splitting it will reduce the credit or replenish completely (which in the case of addition will give overflow)
<hkaiser> Anushi1998: yah, but it might replenish more than once
<Anushi1998> so you mean to say if I have 2 credits and they're split twice concurrently it will be 2*HPX_GLOBALCREDIT_INITIAL?
<hkaiser> yes
<Anushi1998> Why have we chosen such a design? It can lead to a large number of credits.
<hkaiser> Anushi1998: that's to avoid holding a lock while a possibly remote operation is underway
<Anushi1998> hkaiser: Okay, thanks :)
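A toy model of the scheme described above (illustrative only, not HPX's actual name.cpp; the initial value is an arbitrary stand-in): a credit of 2^n is stored as n, a split decrements the exponent so both sides keep 2^(n-1), and a gid down to the minimum useful credit of 2^1 is replenished before it can be split again.

```cpp
#include <cstdint>
#include <iostream>

// Arbitrary stand-in, not HPX's actual HPX_GLOBALCREDIT_INITIAL value.
constexpr std::int16_t initial_log2_credit = 31;

struct toy_gid
{
    std::int16_t log2_credit;    // credit stored as log2(credit)
};

// Split: hand out half of the credit. When the credit is exhausted
// (down to 2^1), replenish first -- in real HPX this replenishment may
// happen concurrently on several threads, hence the discussion above.
std::int16_t split_credit(toy_gid& g)
{
    if (g.log2_credit == 1)
        g.log2_credit = initial_log2_credit;
    return --g.log2_credit;    // both sides end up with 2^(n-1)
}

int main()
{
    toy_gid g{1};
    std::int16_t given = split_credit(g);
    std::cout << "kept 2^" << g.log2_credit
              << ", gave away 2^" << given << "\n";
}
```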
<hkaiser> K-ballo: is there anything preventing me from using std::monostate not as the first member of a variant?
Anushi1998 has quit [Quit: Bye]
<jakub_golinowski> M-ms, I was trying to build the MartyCam app but got these errors https://pastebin.com/GsiabTz5
<jakub_golinowski> I tried rebuilding opencv with the options suggested in the install instructions of MartyCam but it still did not help. Now my guess is that I am using a recent master and this might be the issue. In the meantime I am reading the source code of the app
<K-ballo> hkaiser: nope
jakub_golinowski has quit [Ping timeout: 260 seconds]
nikunj1997 has joined #ste||ar
<github> [hpx] khuck pushed 1 new commit to apex_fixing_null_wrapper: https://git.io/fSGte
<github> hpx/apex_fixing_null_wrapper a68ef88 Kevin Huck: Merge branch 'master' into apex_fixing_null_wrapper
<github> [hpx] khuck pushed 1 new commit to apex_fixing_null_wrapper: https://git.io/fSGtJ
<github> hpx/apex_fixing_null_wrapper ee55d5d Kevin Huck: Merge branch 'master' into apex_fixing_null_wrapper
<nikunj1997> hkaiser: 4 tests failed in my Mac OS test run (2 of them timed out). 1 test passed later when I reran it. So overall 99% of tests passed. The reason for the timed-out tests could be RAM shortage (I'm running it on a VM).
<github> [hpx] khuck opened pull request #3363: Apex fixing null wrapper (master...apex_fixing_null_wrapper) https://git.io/fSGtG
<hkaiser> nikunj1997: sounds promising
<nikunj1997> hkaiser: do you think implementing mainCRTStartup again would be a good idea?
<nikunj1997> Referencing to Billy's mail
<hkaiser> an experimental implementation doesn't sound too bad
<nikunj1997> that's what I think as well. We can always update them as msvc updates its versions
<hkaiser> right
<nikunj1997> actually mainCRTStartup will provide all the flexibility we require for implementing it
<nikunj1997> Given init_seg cannot force itself to run after all global objects
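For context, a minimal MSVC-only illustration of the init_seg limitation mentioned here (a sketch, not the proposed HPX change): #pragma init_seg can only move a translation unit's dynamic initializers into an earlier group (compiler, lib, user), so it can run code before ordinary user-level globals but cannot guarantee running after all of them.

```cpp
// MSVC-only: places this translation unit's dynamic initializers in the
// 'lib' group, which runs after 'compiler' but before ordinary 'user'
// globals in other translation units.
#ifdef _MSC_VER
#pragma init_seg(lib)
#endif

struct early_init
{
    early_init()
    {
        // Runs before user-level global constructors elsewhere; there is
        // no init_seg group that runs after all of them.
    }
};

static early_init early_init_instance;

int main() {}
```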
diehlpk_mobile has quit [Read error: Connection reset by peer]
<K-ballo> hkaiser: what had you in mind?
<K-ballo> that doesn't read well...
<hkaiser> K-ballo: I justed thought I would need some meaningless empty state for the variant
<hkaiser> just*
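A minimal illustration of K-ballo's answer: std::monostate is an ordinary alternative and does not have to come first; putting it first only matters if the variant should default-construct into the empty state.

```cpp
#include <cassert>
#include <string>
#include <variant>

int main()
{
    // monostate as a non-first alternative: allowed, it just isn't the
    // default-constructed state of the variant.
    std::variant<std::string, std::monostate> v;    // default-constructs std::string
    v = std::monostate{};                           // explicit "meaningless" empty state
    assert(std::holds_alternative<std::monostate>(v));
}
```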