hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
hkaiser has quit [Quit: bye]
jbjnr has joined #ste||ar
nikunj has quit [Quit: Leaving]
nikunj has joined #ste||ar
david_pfander has joined #ste||ar
david_pfander has quit [Quit: david_pfander]
david_pfander has joined #ste||ar
<heller_>
simbergm: looks like the cherry pick went haywire
<simbergm>
heller_: yeah, just saw that
<heller_>
simbergm: how about we just fix everything on master and call it 1.3?
<heller_>
how much time do we have for f30?
<simbergm>
heller_: yes, I'd like that, I'm just worried about the timing
<heller_>
what's your concern?
<simbergm>
I think it was something like 2 weeks
<simbergm>
precisely fedora
<simbergm>
and I'm not sure if we can get 1.3.0 on the older fedoras
<simbergm>
or if they only do patch releases
<simbergm>
but if we can just do 1.3.0 we should do that
<heller_>
ok
<heller_>
or well, roll the cmake cherry-picks back and flag the cmake stuff as a known problem
<heller_>
hkaiser: this is interesting: https://stellar-group.gitlab.io/-/hpx/-/jobs/159443858/artifacts/tests.unit.html#775cc3d3-3e99-462e-91fa-e374cdeb7c28
<hkaiser>
never seen this before
<hkaiser>
heller_: I think that can be explained by the problem I fixed yesterday
<hkaiser>
it is unrelated to the hang, I think
<heller_>
it's showing up in basic_action-simplify
<hkaiser>
interesting
<K-ballo>
mh?
<heller_>
K-ballo: the only explanation I have here is that your simplification somehow missed out on the pinned_ptr part
<heller_>
let me check the PR
<K-ballo>
context?
<heller_>
we are seeing hangs in migrate_component
<hkaiser>
this attached the continuation to the outer future, instead of the inner one - that causes end_migration to be executed too early
<heller_>
aha
<heller_>
that's unrelated i think though
<hkaiser>
well, probably
<hkaiser>
actions should always pin the object
<heller_>
right
<heller_>
which is probably just missing somewhere
<hkaiser>
most likely
<heller_>
got it
<heller_>
or maybe not
<heller_>
ok, this actually makes tons of sense now...
<heller_>
or maybe not...
* heller_
just shuts up now
bibek has joined #ste||ar
<heller_>
hkaiser: I think there might be another race between decoding the parcel and actually pinning the component
<hkaiser>
is there now?
<hkaiser>
it's pinned right away
<heller_>
hkaiser: we pin it during decoding, but it is left unpinned until the actual thread is scheduled
<heller_>
unless I miss something obvious
<hkaiser>
heller_: parcel::load_schedule is executed while pinned and schedules the thread which pins it again
<heller_>
hkaiser: ok, then I miss something obvious...
<heller_>
ahh, decorate_action...
<heller_>
doesn't explain why the object isn't pinned in K-ballo's branch...
<heller_>
only happens for the actions returning a future
<K-ballo>
I touched some code that involved pinning and was specialized for returning futures
<K-ballo>
I'll check it out later
<heller_>
I went through it, but couldn't find where the missing piece is
<K-ballo>
I'd say don't worry about it, if the PR breaks it we'll fix it, I'd worry if master is broken
<heller_>
master is broken in different ways
<K-ballo>
lol, fix master and we'll make sure the PR does not regress it
<simbergm>
diehlpk_work: yt?
<heller_>
K-ballo: the failure on master is 100% different than the failure on basic_action-simplify. On master we see occasional hangs, on basic_action-simplify, we see that the object isn't pinned at all when executing an action returning a future
<K-ballo>
ok, if the PR fixes the issue from master it would be entirely by chance
<K-ballo>
adding more bugs is much easier than removing them
<hkaiser>
heller_: the pin count is wrong only for the lazy cases, which means the returned future doesn't keep the pinned_ptr alive anymore
<heller_>
yes
hkaiser has quit [Quit: bye]
aserio has joined #ste||ar
<diehlpk_work>
simbergm, Yes
<diehlpk_work>
Just arrived in the office
<diehlpk_work>
simbergm, I have an idea for the cmake issue for the 1.2.1 release
<diehlpk_work>
it is not nice, but it will work, we can hardcode the cmake module path for fedora
<diehlpk_work>
So for the 1.2.1 release we would have this workaround patch and remove it for the 1.3 release
<simbergm>
diehlpk_work: right, is that definitely the cause of those problems?
<simbergm>
so since 1.2.1 is starting to become a bit of an annoyance, is it possible for us to just release 1.3.0 and get that into fedora after the fact or can we only update those packages to 1.2.X?
<diehlpk_work>
simbergm, We would need at least the patch for boost 1.69 to have hpx in fedora 30
<diehlpk_work>
I would recommend to use this fix and disable the devel package
<diehlpk_work>
and the patch for s390x
<diehlpk_work>
simbergm, What about adding these two patches as minimum
<diehlpk_work>
So we would have hpx 1.2.1 in fedora 30
<simbergm>
well, imo the hpx-devel package is the more important of the two
<simbergm>
so it wasn't quite clear to me: is the deadline to have something in fedora 30 February 19th or March 26th?
hkaiser has joined #ste||ar
<diehlpk_work>
We can always push it later to fedora 30, but if someone uses our package and it is not available at the release, they cannot upgrade to fedora 30
<diehlpk_work>
They would need to remove hpx first
<diehlpk_work>
xfce was too late for fedora 29 and I could not update my machine until their packages were available
<diehlpk_work>
2019-04-16 Final Freeze (*) - 10 days would be good to have hpx 1.2.1 in fedora 30
<K-ballo>
what were the rules for using hpx_main.hpp on windows? I'm getting a missing entry point error
<simbergm>
ok, that means we can also update the package on f28 and 29 after the fact?
<diehlpk_work>
Yes
<simbergm>
so what is the actual deadline for f30?
<diehlpk_work>
2019-04-16 Final Freeze (*)
<simbergm>
K-ballo: just define int main?
<K-ballo>
there is one (trying to run some test as standalone app)
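For reference, the hpx_main.hpp pattern the HPX docs describe: include the header and define an ordinary int main, which then runs as an HPX thread once the runtime is up. A minimal sketch (not verified against any particular HPX version):

```cpp
#include <hpx/hpx_main.hpp>  // wraps main() so the HPX runtime starts first
#include <iostream>

int main(int argc, char* argv[])
{
    // By the time we get here, main() already runs on an HPX thread.
    std::cout << "hello from HPX\n";
    return 0;
}
```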
<diehlpk_work>
simbergm, I would prefer to have a minimal patch for fedora 30
<diehlpk_work>
let us add boost 1.69 and s390x
<diehlpk_work>
and fix the devel package
<diehlpk_work>
simbergm, I would suggest doing an hpx 1.2.1 release which contains only the boost 1.69 fix
<diehlpk_work>
So people could use hpx with boost 1.69 which I think is important
<diehlpk_work>
So we have a small release
<diehlpk_work>
I will add the patch for s390x to the Fedora package
<diehlpk_work>
and we will work on a patch for the devel package
<diehlpk_work>
So we would downstream this minor patch to f28 and f29 and we are done
<diehlpk_work>
For Fedora 31 we can work on the hpx 1.3 release
<simbergm>
diehlpk_work: let me have a look at the release branch, I've applied the cmake patches that I thought would fix those problems but I've clearly applied them wrongly
<simbergm>
if I can get those working we're done
<simbergm>
otherwise the point is that if the deadline for f30 is in april we'll just release 1.3.0 because it should already have all those fixes and we'll save this hassle of juggling a patch release
<simbergm>
that would be plenty of time to make a release
adityaRakhecha has joined #ste||ar
<diehlpk_work>
simbergm, Ok, but normally there is no major version upgrade within Fedora. So updating f28 and f29 to 1.3 would be strange
<simbergm>
diehlpk_work: ok, that answers my question
<diehlpk_work>
simbergm, What we could do is keep 1.2.0 for f28 and f29 and just apply the patch for the devel package there
<diehlpk_work>
So we could do hpx 1.3 for f30
<diehlpk_work>
I think this would be the easiest way
<diehlpk_work>
because on f28 and f29 we do not need to deal with the boost patch
mbremer has joined #ste||ar
<diehlpk_work>
So s390x and boost 1.69 will be in hpx 1.3
<simbergm>
diehlpk_work: give me a sec, my cherry-picking definitely went wrong
aserio1 has joined #ste||ar
<simbergm>
if that works we won't need to worry about different patches for different fedoras etc
aserio has quit [Ping timeout: 268 seconds]
aserio1 is now known as aserio
<K-ballo>
heller_: is the migrate_component test failing consistently for the PR?
<diehlpk_work>
simbergm, Sure, take your time.
<simbergm>
I've just updated the release branch, this time hopefully correctly
<simbergm>
diehlpk_work: ^
<simbergm>
let's see if rostam agrees
<simbergm>
diehlpk_work: it might be a good idea to set up a circleci/whatever test which would build your fedora recipe, install it and try creating a hello world based on that
<simbergm>
also #3686 should help catch these kinds of problems
<heller_>
K-ballo: I'll check, but I'd assume so
<heller_>
K-ballo: It failed three times with the same problems, so I'd say yes.
<K-ballo>
it's running on a single locality here for some reason
mbremer has quit [Quit: Leaving.]
<K-ballo>
I suppose it needs some sort of test runner wrapper
<diehlpk_work>
simbergm, We could download the rpm from the fedora server and install it on docker 32bit and 64 bit and compile hello world
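diehlpk_work's proposed check could look roughly like this (a sketch only: the package names, the Fedora tag, and the pkg-config file name are assumptions, not verified):

```shell
# Install the published rpm in a clean Fedora container and build a
# trivial hpx program against it, failing loudly if anything is missing.
docker run --rm -v "$PWD":/src fedora:30 /bin/sh -exc '
    dnf install -y hpx hpx-devel gcc-c++ pkgconf-pkg-config
    g++ -std=c++14 /src/hello.cpp \
        $(pkg-config --cflags --libs hpx_application) -o /tmp/hello
    /tmp/hello
'
```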
<diehlpk_work>
or we look into their testing system
<diehlpk_work>
They have a build and testing system
<diehlpk_work>
I asked in the fedora-devel channel; on the build system you can only access the installation path and run tests there
<diehlpk_work>
The rpm package is generated as the last step
eschnett has joined #ste||ar
jbjnr has quit [Ping timeout: 250 seconds]
<simbergm>
diehlpk_work: whatever works
<diehlpk_work>
Ok, I will work on a solution
<K-ballo>
I notice a number of technical differences with the PR that should not be affecting behavior, one (or more) of them might be
quaz0r has quit [Ping timeout: 244 seconds]
david_pfander has quit [Ping timeout: 246 seconds]
<adityaRakhecha>
HPX is participating in GSOC this year. Right?
detan has joined #ste||ar
<hkaiser>
adityaRakhecha: yes
<hkaiser>
if we get accepted by google
<K-ballo>
how did one run multiple localities? -l2 -0 ?
<hkaiser>
K-ballo: one locality --hpx:localities=2 --hpx:node=0, the other one --hpx:node=1
<hkaiser>
or -l2 -0 and -1
<K-ballo>
-l2 -0 and -1 is what I have... pretty good, considering
<K-ballo>
-l2, -0, and -1 are being rejected.. the full form works
quaz0r has joined #ste||ar
<hkaiser>
depends on whether your app uses hpx_main.hpp or not
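The two spellings from the exchange above, side by side (the application name is a placeholder; per hkaiser, whether the short forms are accepted depends on the app using hpx_main.hpp):

```shell
# full option form: one process per locality, started by hand
./my_app --hpx:localities=2 --hpx:node=0 &
./my_app --hpx:localities=2 --hpx:node=1

# shortcut form of the same launch
./my_app -l2 -0 &
./my_app -l2 -1
```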
<detan>
not sure if this is the right place to get help compiling hpx?
<parsa_>
detan: it is
<detan>
Brilliant.. getting an error on Ubuntu 18.04.1/gcc-6/boost 1.69
<K-ballo>
I'm using hpx_main, but I do not trust it
<K-ballo>
detan: error_code constructor is explicit, known issue, was introduced by the boost release
<hkaiser>
we should generally disable the shortcuts for the command line and use them only if somebody explicitly enables them
<parsa_>
detan: is your hpx is the current HPX master HEAD?
<detan>
@K-ballo ok, i should use 1.68 instead?
<detan>
@parsa_ 1.2 release
<K-ballo>
1.68 will work, 1.69 was released after hpx 1.2
<K-ballo>
we are working on a 1.2.1 release, but it is experiencing some delays
<K-ballo>
otherwise you could apply a simple patch to fix the issue, if you want to stick to 1.69
<detan>
sorry, new to freenode... kinda lost here
<detan>
@K-ballo what is the most recommended version for boost? The most tested...
<detan>
ah.. ok I will try 1.68...
<detan>
thx
<K-ballo>
no idea... for hpx 1.2 I guess that would be 1.68, or maybe 1.67? we are testing with 1.58.0 and up
<detan>
great.. let me try that...
<K-ballo>
it's just that boost 1.69 introduced a breaking change *after* we released hpx 1.2
<diehlpk_work>
adityaRakhecha, February 26 12:00 UTC List of accepted mentoring organizations published
<detan>
no... it's just that when you define a [=] __device__ () lambda and you nest another lambda inside, it must be a [=] __device__ () lambda as well...
<detan>
but if the second lambda is defined inside my library I need to infer which attribute the user used... and have call_device and call_host variants...
<detan>
it all seems so clumsy
<detan>
looks like the link that i sent is related to that... but for clang. So, is hpx+cuda+clang operational?
<detan>
oh... there is a branch for clang+cuda...
detan has quit [Ping timeout: 256 seconds]
hkaiser has quit [Quit: bye]
quaz0r has quit [Ping timeout: 246 seconds]
quaz0r has joined #ste||ar
eschnett has quit [Quit: eschnett]
hkaiser has joined #ste||ar
jbjnr has quit [Ping timeout: 264 seconds]
aserio has quit [Quit: aserio]
<K-ballo>
we accidentally ended up with two latch.cpp files after one of my changes, does that still confuse msvc?
<K-ballo>
or rather msbuild I suppose
<hkaiser>
K-ballo: doesn't confuse, just prevents concurrent compilation in VS