hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
akheir has quit [Quit: Leaving]
diehlpk__ has quit [Remote host closed the connection]
diehlpk__ has joined #ste||ar
diehlpk__ has quit [Remote host closed the connection]
diehlpk__ has joined #ste||ar
bita_ has joined #ste||ar
diehlpk__ has quit [Remote host closed the connection]
diehlpk__ has joined #ste||ar
diehlpk__ has quit [Remote host closed the connection]
diehlpk__ has joined #ste||ar
nikunj97 has quit [Read error: Connection reset by peer]
diehlpk__ has quit [Ping timeout: 260 seconds]
bita_ has quit [Ping timeout: 260 seconds]
hkaiser_ has quit [Quit: bye]
nanmiao11 has quit [Remote host closed the connection]
weilewei has quit [Remote host closed the connection]
Yorlik has quit [Ping timeout: 240 seconds]
Yorlik has joined #ste||ar
diehlpk_work has quit [Remote host closed the connection]
diehlpk_work has joined #ste||ar
Yorlik has quit [Ping timeout: 240 seconds]
Yorlik has joined #ste||ar
hkaiser has joined #ste||ar
nikunj97 has joined #ste||ar
nikunj97 has quit [Quit: Leaving]
nikunj97 has joined #ste||ar
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 260 seconds]
nanmiao11 has joined #ste||ar
weilewei has joined #ste||ar
diehlpk__ has joined #ste||ar
<diehlpk_work> hkaiser, Meeting?
diehlpk__ has quit [Ping timeout: 260 seconds]
diehlpk__ has joined #ste||ar
diehlpk__ has quit [Ping timeout: 260 seconds]
Nikunj__ has quit [Read error: Connection reset by peer]
diehlpk_work has quit [Ping timeout: 260 seconds]
<nanmiao11> hkaiser meeting?
nikunj97 has joined #ste||ar
bita_ has joined #ste||ar
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 260 seconds]
diehlpk_work has joined #ste||ar
nikunj has quit [Ping timeout: 256 seconds]
nikunj has joined #ste||ar
oj has joined #ste||ar
diehlpk__ has joined #ste||ar
diehlpk__ has quit [Ping timeout: 260 seconds]
oj has quit [Quit: leaving]
<weilewei> I wonder what would cause this error: ../libs/synchronization/src/mutex.cpp:39: void hpx::lcos::local::mutex::lock(const char*, hpx::error_code&): Assertion 'threads::get_self_ptr() != nullptr' failed. Aborted
<weilewei> is hpx runtime not started correctly?
<K-ballo> or code not running on an hpx thread
<weilewei> How do I make sure it is running on an hpx thread? Is adding #include <hpx/hpx_main.hpp> sufficient?
<zao> The wrapper will ensure that the body of the user-supplied main is running in an HPX thread, but you still have to be careful with globals and other pre/post-main objects.
<weilewei> @zao ok, thanks
<zao> (it uses a macro to turn the existing main into a regular function and sneaks in an actual main via a helper library that initializes HPX and calls the user "main" as a task)
<zao> Note that if you use it, you need to explicitly return from your "main", as the rule about implicit return only holds for the actual main.
<weilewei> yea, I do have a return result; thing in my main
<weilewei> that's good to know about this
Nikunj__ is now known as nikunj97
<nikunj97> weilewei, I tried to wrap global objects on HPX threads as well. But it required changes in the compiler toolchain, something hkaiser asked me to stay away from.
<nikunj97> so a main wrapper to get work done is what we have in HPX as of now.
<weilewei> nikunj97 ok I see
<weilewei> I think I spotted the problem: the app is not running on an hpx thread, but it tries to access the hpx thread id, which leads to the failure
<nikunj97> if you're calling that function from something running on hpx thread, it should run on hpx thread as well
<nikunj97> unless of course, you are trying to call it from a global scope
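The failure mode discussed above can be pictured with a small standalone sketch. It does not use HPX; every name in it (`task_context`, `current_task`, `run_as_task`, `lock_sketch`) is hypothetical and only mimics the idea that a runtime keeps a per-thread pointer that is non-null while a task is running, and that a wrapper around main runs the user code as such a task.

```cpp
#include <cassert>

namespace sketch {
    // Hypothetical stand-in for a runtime's per-task bookkeeping record.
    struct task_context
    {
        int id;
    };

    // One slot per OS thread. It stays null on ordinary threads and is set
    // only while a runtime task is executing on that thread.
    inline task_context*& current_task()
    {
        thread_local task_context* slot = nullptr;
        return slot;
    }

    // Analogue of the condition checked by the failed assertion.
    inline bool on_runtime_thread()
    {
        return current_task() != nullptr;
    }

    // Analogue of what a main wrapper does: run the callable as a task by
    // installing a context first and restoring the previous one afterwards.
    template <typename F>
    auto run_as_task(F f) -> decltype(f())
    {
        struct restore
        {
            task_context* previous;
            ~restore() { current_task() = previous; }
        };

        task_context ctx{1};
        restore guard{current_task()};
        current_task() = &ctx;
        return f();
    }

    // A primitive that needs a task context, like the mutex in the log.
    // It reports the condition instead of asserting on it.
    inline bool lock_sketch()
    {
        return on_runtime_thread();
    }

    // Usage check: the primitive only succeeds inside run_as_task.
    inline void demo()
    {
        assert(!lock_sketch());
        assert(run_as_task([] { return lock_sketch(); }));
        assert(!lock_sketch());
    }
}
```

In this picture, a global object's constructor runs before any context is installed, which matches the warning about globals and other pre/post-main objects.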
<hkaiser> nikunj97: hey, yt?
<nikunj97> hkaiser, here
<hkaiser> you said you had worked on the resiliency stuff by basing it on the current HPX version of the code
<nikunj97> yes
<hkaiser> it doesn't look like that
<nikunj97> what how?
<nikunj97> I did change the directory structure if that's what you mean
<nikunj97> but the base code is the same as the one in HPX
<hkaiser> not for me
<hkaiser> I have checked out this: git@github.com:STEllAR-GROUP/resilience.git
<hkaiser> master branch
<hkaiser> is that the correct code?
<nikunj97> yes.
<nikunj97> ohh wait damn. You're right it is not HPX. I copied the resiliency repo.
<nikunj97> the one from last year.
<nikunj97> I should change that.
<hkaiser> so it's one year old code :/
<hkaiser> yes, please
<nikunj97> have there been code changes since?
<hkaiser> Keita asked me to start looking into adapting our code to Kokkos
<hkaiser> compare the files to see for yourself
<hkaiser> nikunj97: I'm likely going to change things in HPX - I strongly suggest you work against the HPX code base as well
<nikunj97> so work as a PR on HPX?
<hkaiser> that will allow you to rebase etc
<hkaiser> yes
<nikunj97> makes sense. Let's kill this separate repo. I'll push changes directly on HPX.
<hkaiser> thanks!
<nikunj97> btw, I talked to parsa about his work on load balancing. He said he's hardcoded things and that there is no API. It might be difficult to integrate it directly into the resiliency stuff, but let's see. I have a call with him tomorrow where he'll explain everything. If things work out, we'll have an optimal version of distributed resiliency.
<hkaiser> nikunj97: what's your plan forward? what are the next steps?
<hkaiser> nikunj97: ok
<nikunj97> so for now, my plan is to start on async replay using colocated policy
<hkaiser> so adding the executor argument?
<hkaiser> or adding actions?
<nikunj97> in the beginning I plan to forget about load balancing and assign nodes randomly
<nikunj97> using actions
<hkaiser> ok
<nikunj97> actions do take an execution policy, right? that's where I'll do colocated.
<hkaiser> I'll add the executors, then
<nikunj97> colocated already works fine with actions btw
<nikunj97> it's only binpacking that you missed, I believe
<hkaiser> as said, binpacking doesn't make sense for execution
<nikunj97> that's true. but it will be really nice if there is an executor which can take a hpx locality as argument and return neighboring localities
<nikunj97> this way we can do replications with node affinity
<hkaiser> sure
<hkaiser> btw, in your changes you renamed the functions
<nikunj97> yes
<hkaiser> async_replay has two overloads
<nikunj97> to have a single entry point
<hkaiser> looks like those are ambiguous now
<nikunj97> the 2nd version should work directly
<hkaiser> hows that?
<nikunj97> given the first parameter can't be invoked with the arguments
<hkaiser> where do you do the sfinae?
<nikunj97> so that's enough information for compiler to know which function to call
<nikunj97> for async_replay, I don't. It just works directly.
<hkaiser> I doubt that
<nikunj97> so it doesn't look ambiguous to me. Will the compiler not pick it up?
<hkaiser> only if F does not expect any arguments itself
<nikunj97> it seemed to pass all tests.
<hkaiser> we had different names for a reason, I believe
<nikunj97> I don't get it. It should work. https://github.com/STEllAR-GROUP/hpx/blob/master/libs/resiliency/include/hpx/resiliency/async_replay.hpp#L191 it's the 2nd argument to the function and invoke is used with that template parameter.
<nikunj97> for replay_validate, it's used as the 3rd argument to the function, invalidating the previous function overload
<hkaiser> how does the compiler know which overload to choose if you invoke async_replay with 4 arguments?
<hkaiser> it could be either (n, Pred, F, arg0) or (n, F, arg0, arg1)
<nikunj97> yes, and that's done by checking: hpx::util::detail::invoke_deferred_result<F, Ts...>::type
<nikunj97> Pred is not invocable with Ts...
<nikunj97> so if the 2nd argument is a predicate, hpx::util::detail::invoke_deferred_result<F, Ts...>::type will not work for the 2nd overload
<hkaiser> so you're saying the missing ::type in the invoke_deferred triggers sfinae?
<nikunj97> yes
<hkaiser> that would be surprising
<K-ballo> it does
<K-ballo> same as invoke_result
<hkaiser> ok
<hkaiser> I wouldn't have expected that
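The four-argument case raised above can be reproduced with the standard `std::invoke_result_t`, which K-ballo compares `invoke_deferred_result` to. This is a minimal sketch, not the HPX resiliency code: the `replay` overloads, their synchronous bodies, and the helper functions are hypothetical, and retrying on exceptions is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace sketch {
    // Overload A: (n, f, ts...). The trailing return type names
    // invoke_result_t, so this overload drops out of overload resolution
    // whenever f(ts...) is not a valid call.
    template <typename F, typename... Ts>
    auto replay(std::size_t n, F f, Ts... ts)
        -> std::invoke_result_t<F&, Ts&...>
    {
        (void) n;    // retry-on-exception logic omitted in this sketch
        return f(ts...);
    }

    // Overload B: (n, pred, f, ts...). Calls f again until pred accepts the
    // result or n attempts are used up, then returns the last result.
    template <typename Pred, typename F, typename... Ts>
    auto replay(std::size_t n, Pred pred, F f, Ts... ts)
        -> std::invoke_result_t<F&, Ts&...>
    {
        auto result = f(ts...);
        for (std::size_t attempt = 1; attempt < n && !pred(result); ++attempt)
            result = f(ts...);
        return result;
    }

    inline int add(int a, int b) { return a + b; }
    inline int twice(int a) { return 2 * a; }
    inline bool is_positive(int r) { return r > 0; }

    // Usage check: four arguments each time, resolved differently.
    inline void demo()
    {
        assert(replay(3, add, 1, 2) == 3);                  // A: add(1, 2)
        assert(replay(3, is_positive, twice, 21) == 42);    // B: twice(21)
    }
}
```

In the first call, overload B would need `1` to be callable with `2`; in the second, overload A would need `is_positive` to accept `(twice, 21)`. Both substitutions fail quietly, so one candidate remains each time. A callable that accepts arbitrary arguments would leave both overloads viable, which is the borderline case under discussion.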
<nikunj97> that's why I didn't add anything else
<K-ballo> I seem to recall I didn't like this approach when I first saw it, am I remembering correctly?
<hkaiser> idk, I was not part of this discussion
<nikunj97> K-ballo, did we have this conversation?
<K-ballo> I remember this borderline ambiguous case
<hkaiser> I'd have used a always_void<> or similar
<K-ballo> it was a while ago... weeks?
<nikunj97> iirc, it was with async_replicate. I don't rely on invoke_deferred there. I use enable_if and is_invocable there.
<K-ballo> yes, june 9
<nikunj97> much cleaner than the one I implemented last time
<hkaiser> why did you do it differently, then?
<K-ballo> I'm not keen on overloads differing by invocable in general
<nikunj97> because there invoke_deferred_result is not enough. Compiler needs to differentiate Pred and Vote. Both are non invocable with Ts... so that becomes ambiguous.
<nikunj97> hkaiser ^^
<hkaiser> ok
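A rough sketch of the `enable_if` plus `is_invocable` approach mentioned for the replicate functions follows. It assumes, purely for illustration, that a validator takes a single result and a voter takes a vector of results. The `replicate` overloads, their synchronous bodies, and the helpers are hypothetical and are not the HPX implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>
#include <vector>

namespace sketch {
    template <typename F, typename... Ts>
    using result_t = std::invoke_result_t<F&, Ts&...>;

    // Validating form: enabled only if pred(result) is a valid call whose
    // result converts to bool.
    template <typename Pred, typename F, typename... Ts,
        std::enable_if_t<
            std::is_invocable_r_v<bool, Pred&, result_t<F, Ts...>&>, int> = 0>
    result_t<F, Ts...> replicate(std::size_t n, Pred pred, F f, Ts... ts)
    {
        for (std::size_t i = 0; i != n; ++i)
        {
            auto r = f(ts...);
            if (pred(r))
                return r;
        }
        throw std::runtime_error("no replica passed validation");
    }

    // Voting form: enabled only if vote(std::vector<result>) is a valid call.
    template <typename Vote, typename F, typename... Ts,
        std::enable_if_t<
            std::is_invocable_v<Vote&, std::vector<result_t<F, Ts...>>&>,
            int> = 0>
    result_t<F, Ts...> replicate(std::size_t n, Vote vote, F f, Ts... ts)
    {
        std::vector<result_t<F, Ts...>> results;
        for (std::size_t i = 0; i != n; ++i)
            results.push_back(f(ts...));
        return vote(results);
    }

    inline int twice(int a) { return 2 * a; }
    inline bool is_positive(int r) { return r > 0; }

    // Placeholder vote: picks the first result (assumes a non-empty vector).
    inline int first_result(std::vector<int> const& v) { return v.front(); }

    // Usage check: same name, same argument count, different second argument.
    inline void demo()
    {
        assert(replicate(3, is_positive, twice, 21) == 42);     // validating
        assert(replicate(3, first_result, twice, 21) == 42);    // voting
    }
}
```

Here the return type alone cannot separate the two forms, because a predicate and a voter are both not invocable with `Ts...`; the extra constraint on what the second argument accepts does the separating.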
<nikunj97> hkaiser, what do you suggest? Should I add in something more specific to make the choice clearer to the compiler (something I've done with replicate APIs)?
<hkaiser> couldn't you constrain Pred to bool(invoke_result_t<F, Ts...>)?
<nikunj97> sure I can
<hkaiser> same for the voting function, that would get rid of a lot of those enable_ifs
<nikunj97> and that will make sure that Pred has a bool return type. Something that will help the compiler to disregard functions with non boolean return types.
<hkaiser> hmm, might not work because of argument ordering, though
<nikunj97> I found enable_ifs as the cleanest approach. I'm a novice though. If you have a better approach, I'll change it accordingly.
<hkaiser> but you could add a fake template argument: typename FResult = invoke_result_t<F, Ts...>. enable_if's (explicit sfinae in general) tend to be very brittle
<hkaiser> sorry messed it up
<hkaiser> I meant: enable_if's (explicit sfinae in general) tend to be very brittle
<nikunj97> that will not help in case of replicate
<nikunj97> the return type for Vote and Pred can be same (i.e. bool)
<K-ballo> consider using different names
<nikunj97> K-ballo, we had different names to begin with. I thought making a single entry point will make it easier for the user.
<hkaiser> K-ballo: yah, that's what we had initially
<nikunj97> hkaiser, having different names eases my process exponentially. So I'm always up for it :P
<hkaiser> :D
<nikunj97> but I do feel that a single entry point is a much better approach
<hkaiser> so we're in agreement
<nikunj97> that's why I tried to make it single entry API
<nikunj97> hkaiser, what do you suggest then?
<hkaiser> I'm fine with having different names
<nikunj97> great :D
<nikunj97> so let me move the current APIs to another namespace, say local?
<nikunj97> so we can have distributed APIs directly exposed. And to use on-node resilience, use hpxr::local::<api name>
<hkaiser> nikunj97: I think we settled on having the local stuff in namespace hpx or hpx::experimental, the distributed stuff in hpx::distributed or hpx::distributed::experimental
<hkaiser> but in this case we can do the same as we've done with async, just overload the name for both
<nikunj97> overload the name for both as in?
<hkaiser> async(f, ts...), async<Action>(id, ts...) and async(Action{}, id, ts...) are non-ambiguous
<nikunj97> so you're telling me if I keep async_replay(f, ts...) and async_replay(Action{}, id, ts...). They will work as non-ambiguous versions?
<hkaiser> they should, yes
<nikunj97> does invoke_deferred_result<Action, Ts...>::type not give the return type?
<nikunj97> because that is the only differentiating factor as I can see between these 2 versions.
<nikunj97> s/as/~as~
<hkaiser> there is traits::is_action<Action>
<nikunj97> so I'll have to add sfinae on these overloads to differentiate them
<nikunj97> using traits::is_action<Action>
<K-ballo> what do the overloads do differently?
<nikunj97> K-ballo, one overload is meant for local and the other is meant for distributed
<K-ballo> how do they differ in arguments?
<nikunj97> I think I can make things work without adding an additional overload for distributed. I can add if (hpx::traits::is_action<Action>) here: https://github.com/STEllAR-GROUP/hpx/blob/master/libs/resiliency/include/hpx/resiliency/async_replay.hpp#L88
<nikunj97> K-ballo, they don't. That's why I was asking how is it not ambiguous
<K-ballo> uh? that's impossible
<K-ballo> what are the expected interfaces for those two functions, from the user standpoint?
<nikunj97> well one takes in action, id, ts... and the other is f, ts...
<K-ballo> so they do differ
<hkaiser> K-ballo: we do it for async, why shouldn't it be possible here?
<K-ballo> it's impossible that they do not differ in arguments
<nikunj97> ok wait. It will work. I think I got it wrong.
<K-ballo> action, id vs f seems rather simple to distinguish on, what's the problem?
<nikunj97> the arguments differ, and hpx::id_type goes in as a separate argument. So f, ts... won't be invocable with it
<nikunj97> so it should work.
<K-ballo> if is action do one, else do the other
<K-ballo> you don't need to sfinae on anything more than that, do you?
<nikunj97> right, I got confused for no reason
<K-ballo> and the less you sfinae, the better the user experience when they get it wrong
<K-ballo> to the point you'd actually want to avoid introducing sfinae when possible
<nikunj97> hkaiser, let me start with async replay. Shouldn't take time to add a working prototype. I'll add a PR by tomorrow.
<K-ballo> (meaning there's no need and ideally no desire either to sfinae on invocability)
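The "if is action do one, else do the other" suggestion can be sketched with plain tag dispatch and no SFINAE on invocability. Everything here is hypothetical: `is_action` is a toy trait in the spirit of the `traits::is_action` mentioned above, `locality_id` stands in for an id type, and `add_action` and `replay` are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {
    // Stand-in for an id type that names the target of a remote call.
    struct locality_id
    {
        int value;
    };

    // Toy trait: true for types that declare a nested is_action_tag.
    template <typename T, typename = void>
    struct is_action : std::false_type
    {
    };

    template <typename T>
    struct is_action<T, std::void_t<typename T::is_action_tag>>
      : std::true_type
    {
    };

    struct add_action
    {
        using is_action_tag = void;
        static int invoke(int a, int b) { return a + b; }
    };

    struct outcome
    {
        bool remote;    // which path handled the call
        int value;
    };

    // Action path: the argument after the action is the target locality.
    template <typename Action, typename... Ts>
    outcome replay_impl(std::true_type, Action, locality_id, Ts... ts)
    {
        return outcome{true, Action::invoke(ts...)};
    }

    // Plain-callable path.
    template <typename F, typename... Ts>
    outcome replay_impl(std::false_type, F f, Ts... ts)
    {
        return outcome{false, f(ts...)};
    }

    // Single entry point: one trait decides which path runs.
    template <typename F, typename... Ts>
    outcome replay(F f, Ts... ts)
    {
        return replay_impl(typename is_action<F>::type{}, f, ts...);
    }

    inline int add(int a, int b) { return a + b; }

    // Usage check: same name, local and action-based calls.
    inline void demo()
    {
        assert(!replay(add, 1, 2).remote);
        assert(replay(add, 1, 2).value == 3);
        assert(replay(add_action{}, locality_id{7}, 1, 2).remote);
        assert(replay(add_action{}, locality_id{7}, 1, 2).value == 3);
    }
}
```

Because the entry point is unconstrained, a wrong call fails inside the selected path with an error about that path, instead of a "no matching function" message listing every discarded candidate.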
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
nikunj97 has quit [Read error: Connection reset by peer]
nikunj97 has joined #ste||ar