#ste||ar on 2020-07-21 — irc logs at irclog.cct.lsu.edu

2020-02-24 20:46 hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020

00:12 akheir has quit [Quit: Leaving]

00:19 diehlpk__ has quit [Remote host closed the connection]

00:19 diehlpk__ has joined #ste||ar

00:22 diehlpk__ has quit [Remote host closed the connection]

00:22 diehlpk__ has joined #ste||ar

00:25 bita_ has joined #ste||ar

00:34 diehlpk__ has quit [Remote host closed the connection]

00:34 diehlpk__ has joined #ste||ar

00:36 diehlpk__ has quit [Remote host closed the connection]

00:37 diehlpk__ has joined #ste||ar

01:14 nikunj97 has quit [Read error: Connection reset by peer]

01:15 diehlpk__ has quit [Ping timeout: 260 seconds]

01:45 bita_ has quit [Ping timeout: 260 seconds]

01:47 hkaiser_ has quit [Quit: bye]

02:57 nanmiao11 has quit [Remote host closed the connection]

04:52 weilewei has quit [Remote host closed the connection]

08:15 Yorlik has quit [Ping timeout: 240 seconds]

08:18 Yorlik has joined #ste||ar

09:15 diehlpk_work has quit [Remote host closed the connection]

09:16 diehlpk_work has joined #ste||ar

10:46 Yorlik has quit [Ping timeout: 240 seconds]

11:18 Yorlik has joined #ste||ar

11:45 hkaiser has joined #ste||ar

12:48 nikunj97 has joined #ste||ar

13:18 nikunj97 has quit [Quit: Leaving]

13:20 nikunj97 has joined #ste||ar

13:23 Nikunj__ has joined #ste||ar

13:26 nikunj97 has quit [Ping timeout: 260 seconds]

14:20 nanmiao11 has joined #ste||ar

14:39 weilewei has joined #ste||ar

15:00 diehlpk__ has joined #ste||ar

15:00 <diehlpk_work> hkaiser, Meeting?

15:58 diehlpk__ has quit [Ping timeout: 260 seconds]

16:18 diehlpk__ has joined #ste||ar

16:23 diehlpk__ has quit [Ping timeout: 260 seconds]

16:29 Nikunj__ has quit [Read error: Connection reset by peer]

16:42 diehlpk_work has quit [Ping timeout: 260 seconds]

16:45 <nanmiao11> hkaiser meeting?

17:10 nikunj97 has joined #ste||ar

17:11 bita_ has joined #ste||ar

17:19 Nikunj__ has joined #ste||ar

17:22 nikunj97 has quit [Ping timeout: 260 seconds]

17:39 diehlpk_work has joined #ste||ar

17:54 nikunj has quit [Ping timeout: 256 seconds]

17:54 nikunj has joined #ste||ar

19:22 oj has joined #ste||ar

19:31 diehlpk__ has joined #ste||ar

19:41 diehlpk__ has quit [Ping timeout: 260 seconds]

19:48 oj has quit [Quit: leaving]

20:17 <weilewei> I wonder what reason will cause this error: ../libs/synchronization/src/mutex.cpp:39: void hpx::lcos::local::mutex::lock(const char*, hpx::error_code&): Assertion 'threads::get_self_ptr() != nullptr' failed Aborted

20:17 <weilewei> is hpx runtime not started correctly?

20:20 <K-ballo> or code not running on an hpx thread

20:26 <weilewei> How do I make sure it is running on an hpx thread? Is adding #include <hpx/hpx_main.hpp> sufficient enough?

20:28 <zao> The wrapper will ensure that the body of the user-supplied main is running in an HPX thread, but you still have to be careful with globals and other pre/post-main objects.

20:32 <weilewei> @zao ok, thanks

20:33 <zao> (it macros the existing main to a regular function and sneaks in an actual main via a helper library that initializes HPX and calls the user "main" as a task)

20:33 <zao> Note that if you use it, you need to explicitly return from your "main", as the rule about implicit return only holds for the actual main.

20:38 <weilewei> yea, I do have a return result; thing in my main

20:38 <weilewei> that's good to know about this

20:48 Nikunj__ is now known as nikunj97

20:49 <nikunj97> weilewei, I tried to wrap global objects on HPX threads as well. But it required changes in the compiler toolchain, something hkaiser asked me to stay away from.

20:50 <nikunj97> so a main wrapper to get work done is what we have in HPX as of now.

20:50 <weilewei> nikunj97 ok I see

20:51 <weilewei> I think I spotted the problem, the app is not using an hpx thread, but it tries to access the hpx thread id, which leads to the failure

20:53 <nikunj97> if you're calling that function from something running on hpx thread, it should run on hpx thread as well

20:53 <nikunj97> unless of course, you are trying to call it from a global scope

21:10 <hkaiser> nikunj97: hey, yt?

21:10 <nikunj97> hkaiser, here

21:10 <hkaiser> you said you had worked on the resiliency stuff by basing it on the current HPX version of the code

21:11 <nikunj97> yes

21:11 <hkaiser> it doesn't look like that

21:11 <nikunj97> what how?

21:11 <nikunj97> I did change the directory structure if that's what you mean

21:11 <nikunj97> but the base code is the same as the one in HPX

21:11 <hkaiser> not for me

21:12 <hkaiser> I have checked out this: git@github.com:STEllAR-GROUP/resilience.git

21:12 <hkaiser> master branch

21:12 <hkaiser> is that the correct code?

21:12 <nikunj97> yes.

21:12 <nikunj97> ohh wait damn. You're right it is not HPX. I copied the resiliency repo.

21:12 <nikunj97> the one from last year.

21:12 <nikunj97> I should change that.

21:12 <hkaiser> so it's one year old code :/

21:13 <hkaiser> yes, please

21:13 <nikunj97> there has been code changes since?

21:13 <hkaiser> Keita asked me to start looking into adapting our code to Kokkos

21:13 <hkaiser> compare the files to see for yourself

21:15 <hkaiser> nikunj97: I'm likely going to change things in HPX - I strongly suggest you work against the HPX code base as well

21:15 <nikunj97> so work as a PR on HPX?

21:15 <hkaiser> that will allow you to rebase etc

21:15 <hkaiser> yes

21:16 <nikunj97> makes sense. Let's kill this separate repo. I'll push changes directly on HPX.

21:16 <hkaiser> thanks!

21:17 <nikunj97> btw, I talked to parsa about his work on load balancing. He said he's hardcoded things and that there is no API. It might be difficult to integrate it directly into the resiliency stuff but let's see. I have a call with him tomorrow where he'll explain everything. If things work out, we'll have an optimal version of distributed resiliency.

21:17 <hkaiser> nikunj97: what's your plan forward? what are the next steps?

21:17 <hkaiser> nikunj97: ok

21:18 <nikunj97> so for now, my plan is to start on async replay using colocated policy

21:18 <hkaiser> so adding the executor argument?

21:18 <hkaiser> or adding actions?

21:18 <nikunj97> in the beginning I plan to forget about load balancing and randomly assign nodes

21:18 <nikunj97> using actions

21:19 <hkaiser> ok

21:19 <nikunj97> actions do take in an execution policy right? that's where I'll do colocated.

21:19 <hkaiser> I'll add the executors, then

21:19 <nikunj97> colocated already works fine with actions btw

21:19 <nikunj97> it's only binpacking that you missed, I believe

21:19 <hkaiser> as said, binpacking doesn't make sense for execution

21:20 <nikunj97> that's true. but it will be really nice if there is an executor which can take a hpx locality as argument and return neighboring localities

21:21 <nikunj97> this way we can do replications with node affinity

21:21 <hkaiser> sure

21:22 <hkaiser> btw, in your changes you renamed the functions

21:22 <nikunj97> yes

21:22 <hkaiser> async_replay has two overloads

21:22 <nikunj97> to have a single entry point

21:22 <hkaiser> looks like those are ambiguous now

21:22 <nikunj97> the 2nd version should work directly

21:22 <hkaiser> hows that?

21:22 <nikunj97> given the first parameter can't be invoked with the arguments

21:23 <hkaiser> where do you do the sfinae?

21:23 <nikunj97> so that's enough information for compiler to know which function to call

21:23 <nikunj97> for async_replay, I don't. It just works directly.

21:23 <hkaiser> I doubt that

21:24 <nikunj97> https://github.com/STEllAR-GROUP/hpx/blob/master/libs/resiliency/include/hpx/resiliency/async_replay.hpp#L171 isn't this good enough already?

21:24 <nikunj97> https://github.com/STEllAR-GROUP/hpx/blob/master/libs/resiliency/include/hpx/resiliency/async_replay.hpp#L173 this clears F as a 3rd argument

21:25 <nikunj97> so it doesn't look ambiguous to me. Will the compiler not pick it up?

21:25 <hkaiser> only if F does not expect any arguments itself

21:25 <nikunj97> it seemed to pass all tests.

21:26 <hkaiser> we had different names for a reason, I believe

21:27 <nikunj97> I don't get it. It should work. https://github.com/STEllAR-GROUP/hpx/blob/master/libs/resiliency/include/hpx/resiliency/async_replay.hpp#L191 it's the 2nd argument to the function and invoke is used with that template parameter.

21:27 <nikunj97> for replay_validate, it's used as the 3rd argument to the function, invalidating the previous function overload

21:27 <nikunj97> the key being https://github.com/STEllAR-GROUP/hpx/blob/master/libs/resiliency/include/hpx/resiliency/async_replay.hpp#L189

21:27 <hkaiser> how does the compiler know which overload to choose if you invoke async_replay with 4 arguments?

21:28 <hkaiser> it could be either (n, Pred, F, arg0) or (n, F, arg0, arg1)

21:28 <nikunj97> yes, and that's done by checking: hpx::util::detail::invoke_deferred_result<F, Ts...>::type

21:29 <nikunj97> Pred is not invocable with Ts...

21:29 <nikunj97> so if the 2nd argument is a predicate, hpx::util::detail::invoke_deferred_result<F, Ts...>::type will not work for the 2nd overload

21:29 <hkaiser> so you're saying the missing ::type in the invoke_deferred triggers sfinae?

21:29 <nikunj97> yes

21:29 <hkaiser> that would be surprising

21:29 <K-ballo> it does

21:30 <K-ballo> same as invoke_result
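
The point above (a missing `::type` in the return type removes an overload instead of causing an error) can be sketched in plain C++17 with `std::invoke_result_t`. This is only an illustration: the name `replay`, the synchronous return value, and the helper functions are assumptions for the example and are not the HPX `async_replay` implementation, which returns futures.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins, not the HPX API: both overloads share one name.

// Overload A: replay(n, f, args...). Retrying is omitted; f is called once.
// If F is not callable with Ts..., the return type is ill-formed and the
// overload silently drops out of overload resolution.
template <typename F, typename... Ts>
std::invoke_result_t<F, Ts...> replay(std::size_t /*n*/, F&& f, Ts&&... ts)
{
    return std::forward<F>(f)(std::forward<Ts>(ts)...);
}

// Overload B: replay(n, pred, f, args...). Calls f up to n times and returns
// the first result that pred accepts.
template <typename Pred, typename F, typename... Ts>
std::invoke_result_t<F, Ts...> replay(
    std::size_t n, Pred&& pred, F&& f, Ts&&... ts)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        auto result = f(ts...);
        if (pred(result))
            return result;
    }
    throw std::runtime_error("all attempts were rejected");
}

inline int add(int a, int b) { return a + b; }
inline int twice(int a) { return 2 * a; }
inline bool is_positive(int r) { return r > 0; }

inline void example_calls()
{
    // (n, F, arg0, arg1): overload B would need to call an int, so it drops out.
    assert(replay(3, add, 1, 2) == 3);
    // (n, Pred, F, arg0): overload A would need is_positive(twice, 21), so it drops out.
    assert(replay(3, is_positive, twice, 21) == 42);
}
```

The disambiguation only works because each callable here rejects the "wrong" argument list. A callable that accepts anything (for example a generic lambda) could make both overloads viable, which is the borderline case raised in the discussion.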

21:30 <hkaiser> ok

21:30 <hkaiser> I wouldn't have expected that

21:30 <nikunj97> that's why I didn't add anything else

21:30 <K-ballo> I seem to recall I didn't like this approach when I first saw it, am I remembering correctly?

21:31 <hkaiser> idk, I was not part of this discussion

21:31 <nikunj97> K-ballo, did we have this conversation?

21:32 <K-ballo> I remember this borderline ambiguous case

21:32 <hkaiser> I'd have used a always_void<> or similar

21:32 <K-ballo> it was a while ago... weeks?

21:33 <nikunj97> iirc, it was with async_replicate. I don't rely on invoke_deferred there. I use enable_if and is_invocable there.

21:33 <K-ballo> yes, june 9

21:33 <nikunj97> much cleaner than the one I implemented last time

21:34 <hkaiser> why did you do it differently, then?

21:34 <K-ballo> I'm not keen on overloads differing by invocable in general

21:35 <nikunj97> because there invoke_deferred_result is not enough. Compiler needs to differentiate Pred and Vote. Both are non invocable with Ts... so that becomes ambiguous.

21:35 <nikunj97> hkaiser ^^

21:36 <hkaiser> ok

21:36 <nikunj97> hkaiser, what do you suggest? Should I add in something more specific to make the choice clearer to the compiler (something I've done with replicate APIs)?

21:38 <hkaiser> couldn't you constrain Pred to bool(invoke_result_t<F, Ts...>)?

21:38 <nikunj97> sure I can

21:38 <hkaiser> same for the voting function, that would get rid of a lot of those enable_ifs

21:38 <nikunj97> and that will make sure that Pred has a bool return type. Something that will help the compiler to disregard functions with non boolean return types.

21:39 <hkaiser> hmm, might not work because of argument ordering, though

21:39 <nikunj97> I found enable_ifs as the cleanest approach. I'm a novice though. If you have a better approach, I'll change it accordingly.

21:40 <hkaiser> but you could add a fake template argument typename FResult = invoke_result_t<F, Ts...> enable_if's (explicit sfinae in general) tends to be very brittle

21:41 <hkaiser> sorry messed it up

21:41 <hkaiser> I meant: enable_if's (explicit sfinae in general) tends to be very brittle

21:41 <nikunj97> that will not help in case of replicate

21:42 <nikunj97> the return type for Vote and Pred can be same (i.e. bool)

21:42 <K-ballo> consider using different names

21:43 <nikunj97> K-ballo, we had different names to begin with. I thought making a single entry point will make it easier for the user.

21:44 <hkaiser> K-ballo: yah, that's what we had initially

21:44 <nikunj97> hkaiser, having different names eases my process exponentially. So I'm always up for it :P

21:44 <hkaiser> :D

21:44 <nikunj97> but I do feel that a single entry point is a much better approach

21:44 <hkaiser> so we're in agreement

21:45 <nikunj97> that's why I tried to make it single entry API

21:45 <nikunj97> hkaiser, what do you suggest then?

21:45 <hkaiser> I'm fine with having different names

21:46 <nikunj97> great :D

21:46 <nikunj97> so let me move the current APIs to another namespace, say local?

21:46 <nikunj97> so we can have distributed APIs directly exposed. And to use on node resilience, use hpxr::local::<api name>

21:47 <hkaiser> nikunj97: I think we settled on having the local stuff in namespace hpx or hpx::experimental, the distributed stuff in hpx::distributed or hpx::distributed::experimental

21:47 <hkaiser> but in this case we can do the same as we've done with async, just overload the name for both

21:48 <nikunj97> overload the name for both as in?

21:49 <hkaiser> async(f, ts...), async<Action>(id, ts...) and async(Action{}, id, Ts...) are non-ambiguous

21:50 <nikunj97> so you're telling me if I keep async_replay(f, ts...) and async_replay(Action{}, id, ts...). They will work as non-ambiguous versions?

21:50 <hkaiser> they should, yes

21:51 <nikunj97> does invoke_deferred_result<Action, Ts...>::type not return the return type?

21:51 <nikunj97> because that is the only differentiating factor as I can see between these 2 versions.

21:51 <nikunj97> s/as/~as~

21:52 <hkaiser> there is traits::is_action<Action>

21:52 <nikunj97> so I'll have to add sfinae on these overloads to differentiate them

21:52 <nikunj97> using traits::is_action<Action>

21:53 <K-ballo> what do the overloads do differently?

21:53 <nikunj97> K-ballo, one overload is meant for local and the other is meant for distributed

21:54 <K-ballo> how do they differ in arguments?

21:54 <nikunj97> I think I can make things work without adding an additional overload for distributed. I can add if (hpx::traits::is_action<Action>) here: https://github.com/STEllAR-GROUP/hpx/blob/master/libs/resiliency/include/hpx/resiliency/async_replay.hpp#L88

21:54 <nikunj97> K-ballo, they don't. That's why I was asking how is it not ambiguous

21:55 <K-ballo> uh? that's impossible

21:55 <K-ballo> what are the expected interfaces for those two functions, from the user standpoint?

21:55 <nikunj97> well one takes in action, id, ts... and the other is f, ts...

21:55 <K-ballo> so they do differ

21:55 <hkaiser> K-ballo: we do it for async, why shouldn't it be possible here?

21:55 <K-ballo> it's impossible that they do not differ in arguments

21:56 <nikunj97> ok wait. It will work. I think I got it wrong.

21:56 <K-ballo> action, id vs f seems rather simple to distinguish on, what's the problem?

21:56 <nikunj97> the arguments differ, and hpx::id_type goes in as a separate argument. So f, ts... won't be invocable with it

21:56 <nikunj97> so it should work.

21:56 <K-ballo> if is action do one, else do the other

21:56 <K-ballo> you don't need to sfinae on anything more than that, do you?

21:56 <nikunj97> right, I got confused for no reason

21:57 <K-ballo> and the less you sfinae, the better the user experience when they get it wrong

21:57 <K-ballo> to the point you'd actually want to avoid introducing sfinae when possible

21:59 <nikunj97> hkaiser, let me start with async replay. Shouldn't take time to add a working prototype. I'll add a PR by tomorrow.

21:59 <K-ballo> (meaning there's no need and ideally no desire either to sfinae on invocability)
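
The "if is action do one, else do the other" suggestion can be sketched with a single trait check and no SFINAE on invocability. Everything below is a hypothetical stand-in written in plain C++17: `action_tag`, `is_action_v`, `locality_id`, `square_action`, and `replay_entry` are invented for the example and only mimic the roles of `traits::is_action` and `hpx::id_type` mentioned above.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins; none of these are HPX types.
struct action_tag {};                  // marks a type as an "action"
struct locality_id { int value; };     // plays the role of hpx::id_type

template <typename T>
inline constexpr bool is_action_v =
    std::is_base_of_v<action_tag, std::decay_t<T>>;

// A pretend action: the first argument names the target locality.
struct square_action : action_tag
{
    int operator()(locality_id, int x) const { return x * x; }
};

inline int add(int a, int b) { return a + b; }

// Single entry point. The only compile-time question asked is whether F is
// an action; a call with wrong arguments fails inside the body with an
// ordinary error message instead of "no matching function".
template <typename F, typename... Ts>
std::string replay_entry(F&& f, Ts&&... ts)
{
    int const result = std::forward<F>(f)(std::forward<Ts>(ts)...);
    if constexpr (is_action_v<F>)
        return "distributed: " + std::to_string(result);
    else
        return "local: " + std::to_string(result);
}

inline void example_calls()
{
    assert(replay_entry(square_action{}, locality_id{1}, 4) == "distributed: 16");
    assert(replay_entry(add, 1, 2) == "local: 3");
}
```

In this sketch the two paths differ only in the label they attach; in a real library the action branch would presumably schedule the work on the locality named by the id argument.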

22:19 nikunj has quit [Read error: Connection reset by peer]

22:20 nikunj has joined #ste||ar

23:27 nikunj97 has quit [Read error: Connection reset by peer]

23:41 nikunj97 has joined #ste||ar