hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
hello has joined #ste||ar
hello has quit [Ping timeout: 244 seconds]
hello has joined #ste||ar
hello has quit [Ping timeout: 246 seconds]
hello has joined #ste||ar
hello has quit [Ping timeout: 240 seconds]
hkaiser has quit [Quit: bye]
hello has joined #ste||ar
hello has quit [Ping timeout: 250 seconds]
hello has joined #ste||ar
hello has quit [Ping timeout: 272 seconds]
time_ has joined #ste||ar
time_ has quit [Ping timeout: 250 seconds]
time_ has joined #ste||ar
time_ is now known as hello
time_ has joined #ste||ar
hello has quit [Remote host closed the connection]
time_ is now known as hello
hello has quit [Ping timeout: 240 seconds]
jbjnr has joined #ste||ar
hello has joined #ste||ar
hello has quit [Ping timeout: 246 seconds]
hello has joined #ste||ar
hello has quit [Remote host closed the connection]
hello has joined #ste||ar
nikunj has quit [Quit: Leaving]
nikunj has joined #ste||ar
david_pfander has joined #ste||ar
david_pfander has quit [Quit: david_pfander]
david_pfander has joined #ste||ar
<heller_> simbergm: looks like the cherry pick went haywire
<simbergm> heller_: yeah, just saw that
<heller_> simbergm: how about we just fix everything on master and call it 1.3?
<heller_> how much time do we have for f30?
<simbergm> heller_: yes, I'd like that, I'm just worried about the timing
<heller_> what's your concern?
<simbergm> I think it was something like 2 weeks
<simbergm> specifically, fedora
<simbergm> and I'm not sure if we can get 1.3.0 on the older fedoras
<simbergm> or if they only do patch releases
<simbergm> but if we can just do 1.3.0 we should do that
<heller_> ok
<heller_> or well, roll the cmake cherry picks back, flag the cmake stuff as known problem
<simbergm> 14:55 <diehlpk_work> 2019-02-19 Branch Fedora 30 from Rawhide (Rawhide becomes future F31)
<simbergm> heller_: yeah, I'll check quickly if the two PRs I couldn't apply on Friday are needed (I don't remember anymore how they hang together)
<simbergm> but otherwise let's check with diehlpk_work when he gets on if just 1.3.0 would be an option
<heller_> oh, that's next week
<heller_> impossible
<heller_> fixing memory leaks right now..
<simbergm> yes, but then:
<simbergm> 14:56 <diehlpk_work> Not at all, 2019-03-26 Beta Release (Preferred Target)
<simbergm> 14:56 <diehlpk_work> This is the deadline to have hpx in fedora 30
<heller_> enabling sanitizers is a black hole...
<heller_> ahh
<heller_> that'd be doable
<simbergm> not sure what date we actually need to stick to
<simbergm> yep
<simbergm> yeah, sanitizers
<simbergm> but even if you don't get everything fixed we should still merge the ones you do find
<heller_> yeah, I try to separate the commits out as much as I can
<heller_> but I really want to have them run with every commit...
<heller_> so we need a clean start ;)
<heller_> that's still up
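A hedged sketch of what "enabling sanitizers" typically looks like for an HPX-style CMake build (plain compiler flags shown as the portable route; any dedicated HPX CMake option for this is an assumption, not verified here):

    cmake -DCMAKE_BUILD_TYPE=Debug \
          -DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer" \
          -DCMAKE_EXE_LINKER_FLAGS="-fsanitize=address" \
          /path/to/hpx

Running the test suite under such a build is what surfaces the leaks heller_ mentions.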
<jbjnr_> what is f30?
<heller_> fedora 30
hello has quit [Ping timeout: 268 seconds]
hello has joined #ste||ar
hello has quit [Ping timeout: 250 seconds]
<K-ballo> that unique_ptr in exception_info seems impossible
hello has joined #ste||ar
hello has quit [Remote host closed the connection]
<heller_> K-ballo: are exceptions ever copied?
hello has joined #ste||ar
<K-ballo> yes
<K-ballo> and it must be a nothrow copy
<heller_> ok, wonder why it worked then...
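A minimal sketch of the problem K-ballo is pointing at: a std::unique_ptr member deletes the implicitly generated copy constructor, while exception payloads must be copyable (and, per the requirement above, nothrow copyable). Types here are illustrative, not the actual HPX code:

    #include <memory>
    #include <type_traits>

    struct exception_info_like
    {
        std::unique_ptr<int> data;   // deletes the copy constructor
    };

    // copying such a type is impossible, let alone nothrow-copyable:
    static_assert(!std::is_copy_constructible<exception_info_like>::value,
                  "unique_ptr member makes the type non-copyable");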
hkaiser has joined #ste||ar
<hkaiser> heller_: I found one issue with migration and I'm closing in on the actual problem
time_ has joined #ste||ar
hello has quit [Remote host closed the connection]
time_ is now known as hello
<heller_> hkaiser: excellent
<heller_> hkaiser: we have tons of memory leaks ;)
<hkaiser> heller_: I think it's a race between the flag in the local agas where the object lives and the one in the remote agas managing the object
<heller_> ok
<hkaiser> heller_: we shouldn't have any ;-)
<heller_> indeed
<heller_> in the process of fixing them
<hkaiser> cool
<heller_> this sanitizer business is a rabbit hole deeper than Alice's
<hkaiser> lol
<heller_> just found one related to this: https://github.com/STEllAR-GROUP/hpx/issues/3449
<hkaiser> nod
<hkaiser> figures
<heller_> hkaiser: this is interesting: https://stellar-group.gitlab.io/-/hpx/-/jobs/159443858/artifacts/tests.unit.html#775cc3d3-3e99-462e-91fa-e374cdeb7c28
<hkaiser> never seen this before
<hkaiser> heller_: I think that can be explained by the problem I fixed yesterday
<hkaiser> it is unrelated to the hang, I think
<heller_> it's showing up in basic_action-simplify
<hkaiser> interesting
<K-ballo> mh?
<heller_> K-ballo: the only explanation I have here is that your simplification somehow missed out on the pinned_ptr part
<heller_> let me check the PR
<K-ballo> context?
<heller_> we are seeing hangs in migrate_component
<heller_> hkaiser is trying to fix it
<heller_> now, on your branch, I see that failure
<hkaiser> this attached the continuation to the outer future, instead of the inner one - that causes end_migration to be executed too early
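An illustrative sketch of the bug hkaiser describes (names hypothetical, not the actual migration code): for an action returning a future, a continuation attached to the outer future fires as soon as the inner future has been produced, not once it is ready.

    hpx::future<hpx::future<void>> outer = invoke_action_returning_future();

    // wrong: fires once the inner future merely exists,
    // so end_migration runs too early
    outer.then([](auto&&) { end_migration(); });

    // right: unwrap first, then attach the continuation to the inner future
    hpx::future<void> inner(std::move(outer));   // unwrapping constructor
    inner.then([](auto&&) { end_migration(); });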
<heller_> aha
<heller_> that's unrelated i think though
<hkaiser> well, probably
<hkaiser> actions should always pin the object
<heller_> right
<heller_> which is probably just missing somewhere
<hkaiser> most likely
<heller_> got it
<heller_> or maybe not
<heller_> ok, this actually makes tons of sense now...
<heller_> or maybe not...
* heller_ just shuts up now
bibek has joined #ste||ar
<heller_> hkaiser: I think there might be another race between decoding the parcel and actually pinning the component
<hkaiser> is there now?
<hkaiser> it's pinned right away
<heller_> hkaiser: we pin it during decoding, but it is left unpinned until the actual thread is scheduled
<heller_> unless I miss something obvious
<hkaiser> heller_: parcel::load_schedule is executed while pinned and schedules the thread which pins it again
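A hedged paraphrase of the protocol hkaiser describes, in pseudocode (names illustrative, not the actual HPX signatures): the decoding pin stays alive until the scheduled thread has taken its own pin, so the component is never observably unpinned in between.

    void load_schedule(parcel const& p)
    {
        pinned_ptr decode_pin = pin_component(p.target());   // pin #1: decoding

        // the scheduled thread captures its own pin before pin #1 goes away
        schedule_thread([run_pin = pin_component(p.target())]() mutable {
            execute_action();
        });
    }   // pin #1 released here; pin #2 is still held by the scheduled thread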
<heller_> hkaiser: ok, then I miss something obvious...
<heller_> ahh, decorate_action...
<heller_> doesn't explain why the object isn't pinned in K-ballo's branch...
<heller_> only happens for the actions returning a future
<K-ballo> I touched some code that involved pinning and was specialized for returning futures
<K-ballo> I'll check it out later
<heller_> I went through it, but couldn't find where the missing piece is
<K-ballo> I'd say don't worry about it, if the PR breaks it we'll fix it, I'd worry if master is broken
<heller_> master is broken in different ways
<K-ballo> lol, fix master and we'll make sure the PR does not regress it
<simbergm> diehlpk_work: yt?
<heller_> K-ballo: the failure on master is 100% different than the failure on basic_action-simplify. On master we see occasional hangs, on basic_action-simplify, we see that the object isn't pinned at all when executing an action returning a future
<K-ballo> ok, if the PR fixes the issue from master it would be entirely by chance
<K-ballo> adding more bugs is much easier than removing them
<hkaiser> heller_: the pin count is wrong only for the lazy cases, that means the returned future doesn't keep the pinned_ptr alive anymore
<heller_> yes
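A hedged sketch of the fix idea for the lazy case (pinned_ptr usage assumed from the discussion above, not the actual HPX code): move the pin into the continuation so the returned future owns it until the result is actually ready.

    template <typename T>
    hpx::future<T> keep_pinned(hpx::future<T> f, pinned_ptr pin)
    {
        return f.then([p = std::move(pin)](hpx::future<T>&& g) mutable {
            return g.get();   // 'p' keeps the component pinned until here
        });
    }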
hkaiser has quit [Quit: bye]
aserio has joined #ste||ar
<diehlpk_work> simbergm, Yes
<diehlpk_work> Just arrived in the office
<diehlpk_work> simbergm, I have an idea for the cmake issue for the 1.2.1 release
<diehlpk_work> it is not nice, but it will work, we can hardcode the cmake module path for fedora
<diehlpk_work> So for the 1.2.1 release we would have this workaround patch and remove it for the 1.3 release
<simbergm> diehlpk_work: right, is that definitely the cause of those problems?
<simbergm> so since 1.2.1 is starting to become a bit of an annoyance, is it possible for us to just release 1.3.0 and get that into fedora after the fact or can we only update those packages to 1.2.X?
<diehlpk_work> simbergm, We would need at least the patch for boost 1.69 to have hpx in fedora 30
<diehlpk_work> I would recommend to use this fix and disable the devel package
<diehlpk_work> and the patch for s390x
<diehlpk_work> simbergm, What about adding these two patches as a minimum
<diehlpk_work> So we would have hpx 1.2.1 in fedora 30
<simbergm> well, imo the hpx-devel package is the more important of the two
<simbergm> so it wasn't quite clear to me: is the deadline to have something in fedora 30 February 19th or March 26th?
hkaiser has joined #ste||ar
<diehlpk_work> We can always push it later to fedora 30, but if someone uses our package and it is not available at the release, they cannot upgrade to fedora 30
<diehlpk_work> They would need to remove hpx first
<diehlpk_work> xfce was too late for fedora 29 and I could not update my machine until their packages were available
<diehlpk_work> 2019-04-16 Final Freeze (*) - 10 days would be good to have hpx 1.2.1 in fedora 30
<K-ballo> what were the rules for using hpx_main.hpp on windows? I'm getting missing entry point
<simbergm> ok, that means we can also update the package on f28 and 29 after the fact?
<diehlpk_work> Yes
<simbergm> so what is the actual deadline for f30?
<diehlpk_work> 2019-04-16 Final Freeze (*)
<simbergm> K-ballo: just define int main?
<K-ballo> there is one (trying to run some test as standalone app)
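For reference, the hpx_main.hpp route simbergm suggests looks like this: including the header wraps the plain main below so the HPX runtime is started first (this much is documented HPX behavior).

    #include <hpx/hpx_main.hpp>
    #include <iostream>

    int main(int argc, char* argv[])
    {
        std::cout << "main now runs as an HPX thread\n";
        return 0;
    }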
<diehlpk_work> simbergm, I would prefer to have minimal patch for fedora 30
<diehlpk_work> let us add boost 1.69 and s390x
<diehlpk_work> and fix the devel package
<diehlpk_work> simbergm, I would suggest doing an hpx 1.2.1 release which contains only the boost 1.69 fix
<diehlpk_work> So people could use hpx with boost 1.69 which I think is important
<diehlpk_work> So we have a small release
<diehlpk_work> I will add the patch for s390x to the Fedora package
<diehlpk_work> and we will work on a patch for the devel package
<diehlpk_work> So we would downstream this minor patch to f28 and f29 and we are done
<diehlpk_work> For Fedora 31 we can work on the hpx 1.3 release
<simbergm> diehlpk_work: let me have a look at the release branch, I've applied the cmake patches that I thought would fix those problems but I've clearly applied them wrongly
<simbergm> if I can get those working we're done
<simbergm> otherwise the point is that if the deadline for f30 is in april we'll just release 1.3.0 because it should already have all those fixes and we'll save this hassle of juggling a patch release
<simbergm> that would be plenty of time to make a release
adityaRakhecha has joined #ste||ar
<diehlpk_work> simbergm, Ok, but normally there is no major version upgrade within Fedora. So updating f28 and f29 to 1.3 would be strange
<simbergm> diehlpk_work: ok, that answers my question
<diehlpk_work> simbergm, What we could do is keep 1.2.0 for f28 and f29 and just apply the patch for the devel package there
<diehlpk_work> So we could do hpx 1.3 for f30
<diehlpk_work> I think this would be the easiest way
<diehlpk_work> because on f28 and f29 we do not need to deal with the boost patch
mbremer has joined #ste||ar
<diehlpk_work> So the s390x and boost 1.69 fixes will be in hpx 1.3
<simbergm> diehlpk_work: give me a sec, my cherry-picking definitely went wrong
aserio1 has joined #ste||ar
<simbergm> if that works we won't need to worry about different patches for different fedoras etc
aserio has quit [Ping timeout: 268 seconds]
aserio1 is now known as aserio
<K-ballo> heller_: is the migrate_component test failing consistently for the PR?
<diehlpk_work> simbergm, Sure, take your time.
<simbergm> I've just updated the release branch, this time hopefully correctly
<simbergm> diehlpk_work: ^
<simbergm> let's see if rostam agrees
<simbergm> diehlpk_work: it might be a good idea to set up a circleci/whatever test which would build your fedora recipe, install it and try creating a hello world based on that
<simbergm> also #3686 should help catch these kinds of problems
<heller_> K-ballo: I'll check, but I'd assume so
<heller_> K-ballo: It failed three times with the same problems, so I'd say yes:
<K-ballo> it's running on a single locality here for some reason
mbremer has quit [Quit: Leaving.]
<K-ballo> I suppose it needs some sort of test runner wrapper
<diehlpk_work> simbergm, We could download the rpm from the fedora server, install it on 32-bit and 64-bit docker images, and compile hello world
hello has quit [Ping timeout: 250 seconds]
<diehlpk_work> or we look into their testing system
<diehlpk_work> They have a build and testing system
<diehlpk_work> I asked in the fedora-devel channel; on the build system you can only access the installation path and run tests there
<diehlpk_work> The rpm package is generated as the last step
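A hedged sketch of the CI check diehlpk_work proposes (the image tag, package names, and pkg-config module name are assumptions, not verified against the actual Fedora packaging):

    docker run --rm fedora:30 /bin/sh -c '
      dnf install -y hpx-devel gcc-c++ pkgconf &&
      printf "#include <hpx/hpx_main.hpp>\nint main() { return 0; }\n" > hello.cpp &&
      c++ hello.cpp $(pkg-config --cflags --libs hpx_application) -o hello &&
      ./hello'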
eschnett has joined #ste||ar
jbjnr has quit [Ping timeout: 250 seconds]
<simbergm> diehlpk_work: whatever works
<diehlpk_work> Ok, I will work on a solution
<K-ballo> I notice a number of technical differences with the PR that should not be affecting behavior, one (or more) of them might be
quaz0r has quit [Ping timeout: 244 seconds]
david_pfander has quit [Ping timeout: 246 seconds]
<adityaRakhecha> HPX is participating in GSOC this year. Right?
detan has joined #ste||ar
<hkaiser> adityaRakhecha: yes
<hkaiser> if we get accepted by google
<K-ballo> how did one run multiple localities? -l2 -0 ?
<hkaiser> K-ballo: one locality --hpx:localities=2 --hpx:node=0, the other one --hpx:node=1
<hkaiser> or -l2 -0 and -1
<K-ballo> -l2 -0 and -1 is what I have... pretty good, considering
<K-ballo> -l2, -0, and -1 are being rejected.. the full form works
quaz0r has joined #ste||ar
<hkaiser> depends on whether your app uses hpx_main.hpp or not
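Spelled out, the full form hkaiser gives looks like this (./my_app is a placeholder; each invocation is one "locality" of the same application):

    ./my_app --hpx:localities=2 --hpx:node=0 &   # first locality
    ./my_app --hpx:localities=2 --hpx:node=1     # second locality

The -l2/-0/-1 shortcuts are the abbreviated equivalents, available only when the command line shortcuts are enabled, which per hkaiser depends on whether the application uses hpx_main.hpp.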
<detan> not sure if this is the right place to get help compiling hpx?
<parsa_> detan: it is
<detan> Brilliant.. getting an error on Ubuntu 18.04.1/gcc-6/boost 1.69
<K-ballo> I'm using hpx_main, but I do not trust it
<K-ballo> detan: error_code constructor is explicit, known issue, was introduced by the boost release
<hkaiser> we should generally disable the shortcuts for the command line and use them only if somebody explicitly enables them
<parsa_> detan: is your hpx the current HPX master HEAD?
<detan> @K-ballo ok, i should use 1.68 instead?
<detan> @parsa_ 1.2 release
<K-ballo> 1.68 will work, 1.69 was released after hpx 1.2
<K-ballo> we are working on a 1.2.1 release, but it is experiencing some delays
<K-ballo> otherwise you could apply a simple patch to fix the issue, if you want to stick to 1.69
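An illustrative reduction of the kind of breakage K-ballo describes (toy types, not Boost's actual ones): code relying on an implicit conversion stops compiling once the constructor becomes explicit.

    struct code_168 { code_168(int) {} };            // implicit: OK pre-1.69
    struct code_169 { explicit code_169(int) {} };   // explicit as of 1.69

    code_168 a = 0;    // compiles
    // code_169 b = 0; // error: chosen constructor is explicit
    code_169 c{0};     // the patch: construct explicitly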
<detan> sorry, new to freenode... kinda lost here
<detan> @K-ballo what is the most recommended version for boost? The most tested...
<detan> ah.. ok I will try 1.68...
<detan> thx
<K-ballo> no idea... for hpx 1.2 I guess that would be 1.68, or maybe 1.67? we are testing with 1.58.0 and up
<detan> great.. let me try that...
<K-ballo> it's just that boost 1.69 introduced a breaking change *after* we released hpx 1.2
<diehlpk_work> adityaRakhecha, February 26 12:00 UTC List of accepted mentoring organizations published
<detan> @K-ballo totally understand...
<detan> thanks
<detan> Should I use gcc-6 if I intend to use hpx+cuda?
<diehlpk_work> detan, 6 or 7 should be fine
<diehlpk_work> And Cuda 9.x
<K-ballo> heller_: I'm getting consistent segfaults in the lock detection logic, must be doing something wrong here
<detan> @diehlpk_work thanks!
hello has joined #ste||ar
jbjnr has joined #ste||ar
hello has quit [Remote host closed the connection]
hello has joined #ste||ar
<K-ballo> it's attempting to concurrently erase from the same map
<hkaiser> K-ballo: it should be locked
<hkaiser> next to the map is a mutex
<K-ballo> yeah, something else is off...
<K-ballo> those are different instances of the map
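A minimal sketch of the intended pattern hkaiser describes: the mutex sits next to the map and every erase happens under it. As K-ballo observes, this only helps if all threads go through the same instance.

    #include <map>
    #include <mutex>

    struct lock_register
    {
        std::mutex mtx;                    // "next to the map is a mutex"
        std::map<void const*, int> held;

        void remove(void const* lock)
        {
            std::lock_guard<std::mutex> l(mtx);   // serialize concurrent erases
            held.erase(lock);
        }
    };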
hello has quit [Ping timeout: 268 seconds]
<K-ballo> heller_: one of the failures come from pinning a component using a const-qualified type
aserio has quit [Ping timeout: 250 seconds]
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<hkaiser> K-ballo: pinning should be mutable
<K-ballo> ok
<K-ballo> and the other failure is just the same failure as I forgot to update the exe for the other "locality"
<K-ballo> lol, now my run has deadlocked while grabbing locks for destroying iterators for the lock register map
parsa_ is now known as parsa
<K-ballo> this debug iterator stuff is crazy
<heller_> K-ballo: as long as this pin.count() != 0 failure is gone, I'm happy
<heller_> K-ballo: perfect! thanks
<heller_> did you notice any decrease in compile time and/or binary size?
hello has joined #ste||ar
<K-ballo> never ever in the history of the universe have I noticed a decrease in compile time :P
<K-ballo> some debug bloat is gone, a fraction of that would have affected binary size
<heller_> yeah ... the second law is killing us all
<heller_> eventually
<K-ballo> that reminds me I was working on some reports on memory usage during compilation for 1.2
hello has quit [Ping timeout: 272 seconds]
aserio has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 268 seconds]
aserio1 is now known as aserio
<detan> not related to hpx but you guys might know: anyone struggled with __host__ and __device__ attributes for nested lambdas in cuda?
hello has joined #ste||ar
<detan> wondering if anyone here has tested it with clang -> http://lists.llvm.org/pipermail/llvm-bugs/2016-September/051197.html
<diehlpk_work> detan, are you talking about constexpr?
hello has quit [Ping timeout: 246 seconds]
<detan> no... it's just that when you define a [=] __device__ () lambda... and you nest another lambda inside, it must be a [=] __device__ () lambda as well...
<detan> but if the second lambda is defined inside my library, I need to infer which attribute the user used... and have call_device and call_host variants...
<detan> it all seems so clumsy
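A hedged reduction of the nesting issue detan describes (nvcc with --expt-extended-lambda assumed; the behavior is as reported in the discussion, not verified here):

    void make_kernels()   // extended lambdas must appear inside a function
    {
        auto outer = [=] __device__ (int i) {
            // per detan's report, the nested lambda also needs its own
            // __device__ annotation, even though it only runs in device code:
            auto inner = [=] __device__ (int j) { return i + j; };
            return inner(i);
        };
        (void) outer;
    }

A library receiving a user lambda cannot see which annotation it carries, hence detan's clumsy call_host/call_device variants.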
<detan> looks like the link that I sent is related to that... but for clang. So, is hpx+cuda+clang operational?
<detan> oh... there is a branch for clang+cuda...
detan has quit [Ping timeout: 256 seconds]
hkaiser has quit [Quit: bye]
hello has joined #ste||ar
hello has quit [Ping timeout: 246 seconds]
quaz0r has quit [Ping timeout: 246 seconds]
quaz0r has joined #ste||ar
eschnett has quit [Quit: eschnett]
hello has joined #ste||ar
hello has quit [Remote host closed the connection]
hello has joined #ste||ar
hello has quit [Client Quit]
hkaiser has joined #ste||ar
jbjnr has quit [Ping timeout: 264 seconds]
aserio has quit [Quit: aserio]
<K-ballo> we accidentally ended up with two latch.cpp files after one of my changes, does that still confuse msvc?
<K-ballo> or rather msbuild I suppose
<hkaiser> K-ballo: doesn't confuse, just prevents concurrent compilation in VS