hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
<gnikunj[m]> hkaiser: could you point me to the implementation of sender receiver stuff in HPX pls
<gnikunj[m]> Thanks!
<hkaiser> look at the tests starting with algorithm_ here: https://github.com/STEllAR-GROUP/hpx-local/tree/master/libs/core/execution/tests/unit
<hkaiser> or simply P2300
<gnikunj[m]> Nice, thanks! I was brushing up on sender receiver stuff so thought of checking HPX’s implementation
<hkaiser> nod, makes sense
<hkaiser> there is also libunifex
<gnikunj[m]> Btw where is the actual sender class defined? I’m seeing sender derivatives that store a sender type and build upon it (bulk_sender et al) but not the base sender class
<hkaiser> look at the schedulers
<hkaiser> also, `just` is a sender
<gnikunj[m]> Ohh ok
<hkaiser> it's an old-fashioned HPX executor on top of s/r
<gnikunj[m]> Thanks!!
<hkaiser> a bit difficult to read as we deliberately didn't use the pipe syntax, so you have to read it inside-out
<gnikunj[m]> Nah, that’s fine. I (maybe it’s just me) prefer not to use the pipe syntax in general
<hkaiser> for s/r it's really more convenient and more readable
<hkaiser> compare start_detached(then(schedule(sched), [](...){})) with schedule(sched) | then([](...){}) | start_detached
<gnikunj[m]> Yeah, you’re right. I guess the real bottleneck here is just that I haven’t gotten used to the pipe syntax yet 😅
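For reference, the two spellings being compared look roughly like this as compilable code. This is only a sketch: the `ex` namespace alias, the `thread_pool_scheduler` type, and the header names are assumptions about HPX's P2300-style implementation at the time.

```cpp
#include <hpx/hpx_main.hpp>    // assumed: runs main() as an HPX thread
#include <hpx/execution.hpp>   // assumed header for the P2300-style facilities
#include <cstdio>

namespace ex = hpx::execution::experimental;

int main()
{
    ex::thread_pool_scheduler sched;   // assumed scheduler type

    // nested form: has to be read inside-out (schedule -> then -> start_detached)
    ex::start_detached(ex::then(ex::schedule(sched), [] { std::puts("work"); }));

    // pipe form: the adaptor chain reads left to right in execution order;
    // some revisions also allow piping straight into start_detached
    ex::start_detached(ex::schedule(sched) | ex::then([] { std::puts("work"); }));

    return 0;
}
```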
<hkaiser> gnikunj[m]: btw, Sanjay told me about charm/light and he wants to use it for the task_bench paper
<hkaiser> also, for the stencil benchmark in taskbench we now beat charm ;-)
<gnikunj[m]> Yeah, Simeng came by our little charmlite den. We don’t think that’s possible (we lack plenty of features rn), but she said she can implement it without sections (charm++’s way of handling global data structures)
<hkaiser> nice
<gnikunj[m]> hkaiser: Now that’s incredible!! What led to the increase in performance?
<hkaiser> gnikunj[m]: btw, will you be able to join the hpx call on Thursday?
<hkaiser> gnikunj[m]: careful analysis, mostly
<gnikunj[m]> Yes, I will be joining the call this Thursday
<hkaiser> we are better for the single node case, comm is still bad
<gnikunj[m]> I mean I’ll be in BTR starting Tuesday so I expect Giannis to wake me up :P
<hkaiser> ok, good
<hkaiser> I want him to join as well
<gnikunj[m]> hkaiser: Yeah, we need to figure things out there.
<hkaiser> gnikunj[m]: see pm, pls
hkaiser has quit [Quit: Bye!]
jehelset has quit [Ping timeout: 240 seconds]
jehelset has joined #ste||ar
jehelset has quit [Ping timeout: 240 seconds]
K-ballo has quit [*.net *.split]
diehlpk_work_ has quit [*.net *.split]
akheir has quit [*.net *.split]
wash_ has quit [*.net *.split]
K-ballo has joined #ste||ar
diehlpk_work_ has joined #ste||ar
akheir has joined #ste||ar
wash_ has joined #ste||ar
jehelset has joined #ste||ar
jehelset has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
akheir has quit [Quit: Leaving]
jehelset has joined #ste||ar
hkaiser has quit [Quit: Bye!]
K-ballo has quit [Ping timeout: 256 seconds]
<srinivasyadav227> hi, can anyone please help me with this error https://gist.github.com/srinivasyadav18/917d43b8025c68107bd19a3f5ace7129. I am trying to build hpx on an ARM machine (A64FX); I have also added the `-DHPXLocal_WITH_GENERIC_CONTEXT_COROUTINES=ON` flag to the cmake options
<diehlpk_work_> srinivasyadav227, On Ookami?
<srinivasyadav227> diehlpk_work_: yes
<diehlpk_work_> gdaiss[m] and I could build there this weekend
<diehlpk_work_> Which compiler are you using?
<srinivasyadav227> gcc/11.1
<srinivasyadav227> is there any extra/special flag I need to pass?
<diehlpk_work_> We used armclang++
<diehlpk_work_> armclang
<diehlpk_work_> Are you sure the compiler is built for A64FX?
<diehlpk_work_> For example, if you load cmake, the binary is not built for ARM
<srinivasyadav227> diehlpk_work_: oh, I think I was missing this. I used `module load gcc/11.1.0` from the available modules on Ookami
<diehlpk_work_> srinivasyadav227,
<diehlpk_work_> Those are the official compilers from the company
<diehlpk_work_> and those are the clang compilers
<diehlpk_work_> I used the second option to build hpx and all dependencies
<diehlpk_work_> They provide you with openmpi compiled with these compilers as well
<diehlpk_work_> As far as I understood, you need to use one of these compilers
<srinivasyadav227> i need to use gcc/11.1 or greater for running SIMD benchmarks with hpx
<diehlpk_work_> Ok, I think you can use gcc as well
<diehlpk_work_> Ok, maybe we have some bug in the code or don't check for the correct environment
<diehlpk_work_> At least I can confirm that armclang was working for me
<gnikunj[m]> srinivasyadav227: the errors are correct. Those flags are all for non-Arm architectures: https://github.com/STEllAR-GROUP/hpx-local/blob/master/libs/core/hardware/include/hpx/hardware/cpuid.hpp#L11-L16
<diehlpk_work_> Can you run
<diehlpk_work_> g++ -dM -E -x c++ - < /dev/null
<diehlpk_work_> and check if the defines are exported by the compiler
<diehlpk_work_> This would be a good check to see if we have arm or aarch64 or whatever
<diehlpk_work_> I need to run, but will be back soon
<gnikunj[m]> diehlpk_work_: gcc works for A64FX. The Fujitsu compilers generate faster executables, though.
<gnikunj[m]> srinivasyadav227: could you clear the cache and send me the log of a fresh build (do it on an A64FX node and not on the head node)
<srinivasyadav227> <diehlpk_work_> " g++ -dM -E -x c++ - < /dev/null" <- https://gist.github.com/srinivasyadav18/d0adf32807df69d1958a5443783fb907
<srinivasyadav227> gnikunj[m]: yes, i tried the build on A64FX node
<gnikunj[m]> Yeah sure, but I would like to see what flags it sets and so on during the cmake config
<srinivasyadav227> oh okay, one min :)
<srinivasyadav227> this flag `HPXLocal_WITH_GENERIC_CONTEXT_COROUTINES` is ON, and during cmake it detected the architecture as `arm`
<gnikunj[m]> It’s not an issue relating to that.
<gnikunj[m]> It’s an error coming from cpuid
<gnikunj[m]> Which isn't meant to work on aarch64
<gnikunj[m]> That’s what I’m investigating rn
<gnikunj[m]> Everything other than HPXLocal_WITH_MALLOC looks correct to me
<gnikunj[m]> But that’s taken care of by the build system, so that’s not the underlying issue either
hkaiser has joined #ste||ar
<srinivasyadav227> gnikunj[m]: yeah
<srinivasyadav227> <diehlpk_work_> "This would be a good check to..." <- https://gist.github.com/srinivasyadav18/d0adf32807df69d1958a5443783fb907#file-out-log-L197 its aarch64
<gnikunj[m]> srinivasyadav227: not sure where it's coming from really.
<srinivasyadav227> gnikunj: ok :) 😅
<gnikunj[m]> I mean I know where it's coming from (the fork-join executor) but not sure if it's related to build time
<srinivasyadav227> <gnikunj[m]> "srinivasyadav227: the errors are..." <- why there is no arm/arch64 related flag defined there ?, all are x86 based only ?
<gnikunj[m]> Because we don't support cpuids based on arm
<gnikunj[m]> those are features that are used in writing register specific things (hence the tables that stores the info)
<srinivasyadav227> oh ok, generally cpuid is only for x86 based systems ?
<gnikunj[m]> Yeah, I think so. Although you can identify other cpu related info for Arm as well.
<srinivasyadav227> oh ok cool :)
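For context, the reason the cpuid code trips up the A64FX build is that the cpuid instruction only exists on x86; the usual fix is a compile-time architecture guard along these lines (an illustrative sketch only, not the actual contents of HPX's cpuid.hpp; the macros are the standard compiler-defined ones):

```cpp
// Illustrative architecture guard only; HPX's real header differs.
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
// x86/x86-64: the cpuid instruction exists, so tables keyed on cpuid
// feature bits (SSE, AVX, ...) make sense here.
#    define HAVE_X86_CPUID 1
#elif defined(__aarch64__) || defined(__arm__)
// Arm/AArch64 (e.g. A64FX): there is no cpuid instruction; feature
// detection has to use other mechanisms (e.g. getauxval/HWCAP on Linux).
#    define HAVE_X86_CPUID 0
#else
#    define HAVE_X86_CPUID 0
#endif
```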
<gnikunj[m]> hkaiser: yt?
hkaiser has quit [Quit: Bye!]
FunMiles has joined #ste||ar
<FunMiles> Are C++20 coroutines with co_await still available in HPX? How can I enable them, in particular on macOS?
hkaiser has joined #ste||ar
<FunMiles> @hkaiser: Are C++20 coroutines with co_await as you showed them at CppCon still available in HPX? How can I enable them?
<FunMiles> When doing grep on the latest source, I'm not finding what I thought would have to be there for that feature to be still enabled.
<hkaiser> FunMiles: just configuring with -DHPX_CXX_STANDARD=20 should do the trick
<FunMiles> hkaiser: Trying that. But here's a quick question: How come a recursive `grep -rn promise_type *` only returns a match in `cmake/tests/cxx20_coroutines.cpp` ? Isn't it necessary to define that type with C++20 coroutines?
<gnikunj[m]> hkaiser: I went through the paper yesterday and have a couple questions.
<gnikunj[m]> 1. The connect function should take in a receiver of the same type as the sender, right? (like if you have a then sender, then you want to connect it to the then receiver)
<gnikunj[m]> 2. If the connect function returns an operation state, why do some receivers not have one? (I see that then_receiver doesn't have an operation_state struct defined in it)
<gnikunj[m]> 3. How do I retrieve the end value that was set by the receiver using set_value()?
<FunMiles> hkaiser: Are you making use of asio's C++ coroutine features to provide the coroutine interface?
<hkaiser> FunMiles: no asio is involved, let me give you the link to the integration, give me a sec
<hkaiser> gnikunj[m]: 1) no, sender and receiver are independent and unrelated types
<hkaiser> you can combine any sender with any receiver
<hkaiser> 2) I'm not sure I understand, what receiver do you refer to?
<hkaiser> 3) again, not sure I understand your question
<gnikunj[m]> 2) I was talking about the HPX implementation. For instance, then_receiver doesn't have an operation_state struct with a start function.
<gnikunj[m]> Also, for 1) why would you want a bulk_receiver to take a then_sender?
<hkaiser> then is a receiver, no?
<gnikunj[m]> 3) What I meant was - how do you get the final value? (equivalent to future.get())
<gnikunj[m]> also, do you have time for a quick call? I think I'll be able to better explain them like that.
<hkaiser> it is received by the lambda you pass to then, for instance
<gnikunj[m]> sure, but what about the final lambda which also returns a value. How do I get that?
<hkaiser> for 2) there is a default operation_state that simply connects a sender with a receiver (i.e. simply passing through the values), that is used when no special functionality is needed
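In rough P2300R3 terms, the connect/operation_state protocol described here looks like the sketch below. The namespace alias, header name, and the tag_invoke spellings of the receiver customization points are assumptions; HPX's internal receivers are written differently.

```cpp
#include <hpx/execution.hpp>   // assumed header
#include <cstdio>
#include <exception>
#include <utility>

namespace ex = hpx::execution::experimental;   // assumed namespace

// A minimal receiver that just prints the value it is sent.
struct print_receiver
{
    friend void tag_invoke(ex::set_value_t, print_receiver&&, int v) noexcept
    {
        std::printf("received %d\n", v);
    }
    friend void tag_invoke(ex::set_error_t, print_receiver&&, std::exception_ptr) noexcept {}
    friend void tag_invoke(ex::set_stopped_t, print_receiver&&) noexcept {}
};

int main()
{
    auto sender = ex::just(42);                                  // sender producing 42
    auto op = ex::connect(std::move(sender), print_receiver{});  // yields an operation state
    ex::start(op);   // runs the work; eventually calls set_value(receiver, 42)
    return 0;
}
```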
<gnikunj[m]> are there sender algorithms that return it?
<hkaiser> gnikunj[m]: you can use a make_future to get a future for the overall value
<gnikunj[m]> where is the default operation_state implemented in HPX?
<hkaiser> sync_wait returns the value as well
<gnikunj[m]> hkaiser: make_future isn't in 2300R3.
<gnikunj[m]> wait, sync_wait isn't void?
<hkaiser> no, make_future isn't, but sync_wait is
<gnikunj[m]> ohh crap, right
<gnikunj[m]> sync_wait will return the final value. Now, it makes sense.
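So the end-of-chain pattern looks roughly like this. A sketch only: the namespace alias and scheduler type are assumptions, and the exact return shape of sync_wait (bare value vs. something like optional<tuple<...>>) has varied across P2300 revisions and HPX versions.

```cpp
#include <hpx/hpx_main.hpp>    // assumed: runs main() as an HPX thread
#include <hpx/execution.hpp>   // assumed header
#include <utility>

namespace ex = hpx::execution::experimental;

int main()
{
    ex::thread_pool_scheduler sched;   // assumed scheduler type

    auto snd = ex::then(ex::schedule(sched), [] { return 42; });

    // Blocks until the chain completes and hands back what was passed to
    // set_value; depending on the revision the result may come back wrapped
    // (e.g. in an optional/tuple) rather than as a bare int.
    auto result = ex::sync_wait(std::move(snd));
    (void) result;

    // ex::make_future(...) is the HPX extension mentioned above: it adapts a
    // sender into an hpx::future so the value can be retrieved with .get().
    return 0;
}
```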
<gnikunj[m]> ok, but I still don't understand why you can connect any receiver with any sender
<gnikunj[m]> like why would you connect a bulk receiver with a then sender?
<hkaiser> gnikunj[m]: that's the whole point of this, combining arbitrary senders with arbitrary receivers
<hkaiser> anonymous consumer/producer chains
<hkaiser> give me a sec to locate the default operation_state
<gnikunj[m]> when I thought of arbitrary connections, I thought more of transfer algorithms - like transferring execution from one scheduler context to another
<hkaiser> sure, that's just a special receiver
<gnikunj[m]> but transfer is a sender algorithm that returns a sender
<gnikunj[m]> (I'm honestly a bit confused by the genericity and need clarification so I apologize if I sound absolutely stupid asking these questions)
<hkaiser> well yes, it returns a new sender that sits on the target execution environment
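As an illustration, a transfer-style chain in P2300 terms looks something like the following (a sketch; whether HPX already shipped a `transfer` adaptor at this point, and the scheduler types used here, are assumptions):

```cpp
#include <hpx/hpx_main.hpp>    // assumed: runs main() as an HPX thread
#include <hpx/execution.hpp>   // assumed header
#include <utility>

namespace ex = hpx::execution::experimental;

int main()
{
    ex::thread_pool_scheduler a;   // assumed scheduler types; in practice these
    ex::thread_pool_scheduler b;   // would refer to two different execution contexts

    // The work before transfer runs on a; transfer returns a new sender that
    // completes on b, so the rest of the chain runs there.
    auto snd = ex::schedule(a)
             | ex::then([] { return 1; })
             | ex::transfer(b)
             | ex::then([](int v) { return v + 1; });

    auto result = ex::sync_wait(std::move(snd));
    (void) result;
    return 0;
}
```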
<gnikunj[m]> hkaiser: btw we lack quite a few things in HPX wrt sender-receiver. Most notably the pipeline operator
<hkaiser> we have that, I'm sure
<gnikunj[m]> where is it? I didn't see any operator| in the implementation
<gnikunj[m]> is it in execution_base then?
<gnikunj[m]> ok, we do have it
<gnikunj[m]> hkaiser: we're also missing some sender adaptors - transfer_when_all, on, upon_*
<hkaiser> yes, that's correct
<hkaiser> those are fairly new in p2300
<gnikunj[m]> Btw, I'm traveling tomorrow morning so I won't be able to attend the meeting with Srinivas
<hkaiser> sure, np
<FunMiles> hkaiser: I am a bit confused about the file you sent. It's not in a regular cloned hpx directory. Is it a different branch? The link says it is master but then it's hpx-local
<gnikunj[m]> hkaiser: do you have some time to discuss on sender-receiver stuff on wednesday?
<hkaiser> FunMiles: long story, for now just go with it, I think hpx-local will be back in hpx soon
<gnikunj[m]> I'd like to discuss the implementation details specifically. Meanwhile, I'll do some debug runs to see the call stack of the execution.
<hkaiser> gnikunj[m]: sure
<gnikunj[m]> ok, let me ask Katie to set up a time on wed
<FunMiles> hkaiser: OK :) I'll give it a try.
<hkaiser> FunMiles: sorry for the confusion
<FunMiles> hkaiser: For the flag, is it HPX_CXX_STANDARD or CXX_STANDARD ?
<FunMiles> Cmake says it's unused.
<hkaiser> sec
<hkaiser> FunMiles: for any recent versions of HPX master it should be HPX_CXX_STANDARD=20
<hkaiser> you can try HPX_WITH_CXX20=On, that should do the right thing even if it generates a warning
<FunMiles> -DHPXLocal_WITH_CXX_STANDARD=20
<hkaiser> this will change soon, but for now it may work
<hkaiser> again sorry, things are in a bit of limbo right now
<FunMiles> hkaiser: I was watching the CppCon presentation about nano-coroutines by Gor Nishanov. He mentions that switching fibers takes microseconds, while coroutines take nanoseconds. Have you measured performance benefits from using C++20 coroutines?
<hkaiser> FunMiles: we have not measured this, but using co_await with our futures doesn't have the nanosecond-overhead benefit Gor describes
<hkaiser> this is not really using coroutines anyway
<hkaiser> it's more syntactic sugar, allowing you to turn future-based asynchronous continuation-based code inside out and make it easier to read and write
<hkaiser> it still has the same overhead as using futures without co_await
<FunMiles> OK. Can you imagine that in the future, you could replace fibers with coroutines if all code could use purely co_await approaches?
<FunMiles> Thus benefitting from faster switching?
<hkaiser> FunMiles: nothing prevents you from using co_await based coroutines on top of our threading (fibers)
<hkaiser> those things are orthogonal
<FunMiles> More or less, it would require that I make another futures system, no? One of my conceptual gripes with C++20 coroutines is that what co_await does implicitly depends on the return type.
<hkaiser> FunMiles: no
<hkaiser> you could use light-weight awaitables as suggested by Gor and run them on any HPX thread
<FunMiles> I'm going to have to explore that approach.
<hkaiser> ok
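A minimal sketch of the co_await-on-futures style being discussed, assuming an HPX build configured for C++20; the header names are assumptions given the hpx/hpx-local restructuring mentioned below, and this integration carries the usual future overhead rather than Gor's nanosecond-scale coroutines.

```cpp
#include <hpx/hpx_main.hpp>   // assumed: runs main() as an HPX thread
#include <hpx/future.hpp>     // assumed to provide hpx::future and hpx::async
#include <iostream>

// With coroutine support enabled, hpx::future<T> can serve both as an
// awaitable and as a coroutine return type; co_await suspends the coroutine
// rather than blocking the underlying HPX thread.
hpx::future<int> twice_async(int x)
{
    int v = co_await hpx::async([x] { return x; });
    co_return 2 * v;
}

int main()
{
    std::cout << twice_async(21).get() << '\n';   // prints 42
    return 0;
}
```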
<FunMiles> what are the CMake exported targets? HPXLocal::hpx does not exist it seems
<FunMiles> HPXLocal seems to be OK.
<FunMiles> It does not seem to define the proper include directories.
<hkaiser> HPX::hpx
<FunMiles> target not found.
<hkaiser> FunMiles: do you use hpx or hpx-local?
<hkaiser> which repository?
<FunMiles> hpx-local, since that's the one with the C++20 coroutine capable code.
<hkaiser> ok
<hkaiser> do you need the distributed functionalities of HPX?
<FunMiles> Not at this stage, no.
<hkaiser> ok
<hkaiser> the HPX::hpx_local is the target
<hkaiser> I think ;-)
<FunMiles> It is! :) Thanks
<FunMiles> Though I still have some issues.
<FunMiles> I copied the code for transpose_await.cpp and some of the includes are found while some are not.
<FunMiles> Some filenames have changed? It uses algorithm.hpp but it seems it's algorithms.hpp now with an 's'
<FunMiles> There is no hpx/hpx.hpp ?
jehelset has quit [Ping timeout: 240 seconds]
<hkaiser> FunMiles: you got in the middle of a large restructuring
<hkaiser> could you give us some time to consolidate things?
<hkaiser> for now, I'd suggest you use the hpx repository (not hpx-local)
<FunMiles> hkaiser: OK. I will wait.
<hkaiser> that should automatically pull in hpx-local
<FunMiles> The regular hpx repository does not include the co_await at the moment, right?
<hkaiser> FunMiles: since it pulls hpx-local, it will
<hkaiser> FunMiles: the story is that for various reasons there was a plan to split HPX into two repositories, hpx and hpx-local
<hkaiser> for users of hpx nothing should change (the build system makes sure those are well integrated)
<hkaiser> but for even more various reasons, this split (which happened early December) might be reverted soon
<hkaiser> that's what I meant when I said things are kind of in limbo; we need another 3-4 weeks to get back to normal
jehelset has joined #ste||ar