hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
<gnikunj[m]> hkaiser: could you point me to the implementation of sender receiver stuff in HPX pls
<gnikunj[m]> Thanks!
<hkaiser> look at the tests starting with algorithm_ here: https://github.com/STEllAR-GROUP/hpx-local/tree/master/libs/core/execution/tests/unit
<hkaiser> or simply P2300
<gnikunj[m]> Nice, thanks! I was brushing up on sender receiver stuff so thought of checking HPX’s implementation
<hkaiser> nod, makes sense
<hkaiser> there is also libunifex
<gnikunj[m]> Btw where is the actual sender class defined? I’m seeing sender derivatives that store a sender type and build upon it (bulk_sender et al) but not the base sender class
<hkaiser> look at the schedulers
<hkaiser> also, `just` is a sender
<gnikunj[m]> Ohh ok
<hkaiser> it's an old-fashioned HPX executor on top of s/r
<gnikunj[m]> Thanks!!
<hkaiser> a bit difficult to read as we deliberately didn't use the pipe syntax, so you have to read it inside-out
<gnikunj[m]> Nah, that’s fine. I (maybe it’s just me) prefer not to use the pipe syntax in general
<hkaiser> for s/r it's really more convenient and more readable
<hkaiser> compare start_detached(then(schedule(sched), [](...){})) with schedule(sched) | then([](...){}) | start_detached
<gnikunj[m]> Yeah, you’re right. I guess the real bottleneck here is just that I haven’t gotten used to the pipe syntax yet 😅
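For reference, the two spellings being compared look roughly like this as compilable code. This is only a sketch: the `ex` namespace alias, the `thread_pool_scheduler` type, and the header names are assumptions about HPX's P2300-style implementation at the time.

```cpp
#include <hpx/hpx_main.hpp>    // assumed: runs main() as an HPX thread
#include <hpx/execution.hpp>   // assumed header for the P2300-style facilities
#include <cstdio>

namespace ex = hpx::execution::experimental;

int main()
{
    ex::thread_pool_scheduler sched;   // assumed scheduler type

    // nested form: has to be read inside-out (schedule -> then -> start_detached)
    ex::start_detached(ex::then(ex::schedule(sched), [] { std::puts("work"); }));

    // pipe form: the adaptor chain reads left to right in execution order;
    // some revisions also allow piping straight into start_detached
    ex::start_detached(ex::schedule(sched) | ex::then([] { std::puts("work"); }));

    return 0;
}
```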
<hkaiser> gnikunj[m]: btw, Sanjay told me about charm/light and he wants to use it for the task_bench paper
<hkaiser> also, for the stencil benchmark in taskbench we now beat charm ;-)
<gnikunj[m]> Yeah, Simeng came by our little charmlite den. We don’t think that’s possible (we lack plenty of features rn), but she said she can implement it without sections (charm++’s way of handling global data structures)
<hkaiser> nice
<gnikunj[m]> hkaiser: Now that’s incredible!! What led to the increase in performance?
<hkaiser> gnikunj[m]: btw, will you be able to join the hpx call on Thursday?
<hkaiser> gnikunj[m]: careful analysis, mostly
<gnikunj[m]> Yes, I will be joining the call this Thursday
<hkaiser> we are better for the single node case, comm is still bad
<gnikunj[m]> I mean I’ll be in BTR starting Tuesday so I expect Giannis to wake me up :P
<hkaiser> ok, good
<hkaiser> I want him to join as well
<gnikunj[m]> hkaiser: Yeah, we need to figure things out there.
<hkaiser> gnikunj[m]: see pm, pls
hkaiser has quit [Quit: Bye!]
jehelset has quit [Ping timeout: 240 seconds]
jehelset has joined #ste||ar
jehelset has quit [Ping timeout: 240 seconds]
K-ballo has quit [*.net *.split]
diehlpk_work_ has quit [*.net *.split]
akheir has quit [*.net *.split]
wash_ has quit [*.net *.split]
K-ballo has joined #ste||ar
diehlpk_work_ has joined #ste||ar
akheir has joined #ste||ar
wash_ has joined #ste||ar
jehelset has joined #ste||ar
jehelset has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
akheir has quit [Quit: Leaving]
jehelset has joined #ste||ar
hkaiser has quit [Quit: Bye!]
K-ballo has quit [Ping timeout: 256 seconds]
<srinivasyadav227> hi, can anyone please help me with this error https://gist.github.com/srinivasyadav18/917d43b8025c68107bd19a3f5ace7129. I am trying to build hpx on an ARM machine (A64FX); I have also added the `-DHPXLocal_WITH_GENERIC_CONTEXT_COROUTINES=ON` flag to the cmake options
<diehlpk_work_> srinivasyadav227, On Ookami?
<srinivasyadav227> diehlpk_work_: yes
<diehlpk_work_> gdaiss[m] and I could build there this weekend
<diehlpk_work_> Which compiler are you using?
<srinivasyadav227> gcc/11.1
<srinivasyadav227> is there any extra/special flag I need to pass?
<diehlpk_work_> We used armclang++
<diehlpk_work_> armclang
<diehlpk_work_> Are you sure the compiler is built for A64FX?
<diehlpk_work_> For example, if you load cmake, the binary is not built for ARM
<srinivasyadav227> diehlpk_work_: oh, I think I was missing this. I used `module load gcc/11.1.0` from the available modules on Ookami
<diehlpk_work_> srinivasyadav227,
<diehlpk_work_> Those are the official compilers from the company
<diehlpk_work_> and those are the clang compilers
<diehlpk_work_> I used the second option to build hpx and all dependencies
<diehlpk_work_> They provide you with openmpi compiled with these compilers as well
<diehlpk_work_> As far as I understood, you need to use one of these compilers
<srinivasyadav227> i need to use gcc/11.1 or greater for running SIMD benchmarks with hpx
<diehlpk_work_> Ok, I think you can use gcc as well
<diehlpk_work_> Ok, maybe we have some bug in the code or don't check for the correct environment
<diehlpk_work_> At least I can confirm that armclang was working for me
<gnikunj[m]> srinivasyadav227: the errors are correct. Those flags are all for non-Arm architectures: https://github.com/STEllAR-GROUP/hpx-local/blob/master/libs/core/hardware/include/hpx/hardware/cpuid.hpp#L11-L16
<diehlpk_work_> Can you run
<diehlpk_work_> g++ -dM -E -x c++ - < /dev/null
<diehlpk_work_> and check if the defines are exported by the compiler
<diehlpk_work_> This would be a good check to see if we have arm or aarch64 or whatever
<diehlpk_work_> I need to run, but will be back soon
<gnikunj[m]> diehlpk_work_: gcc works for A64FX. The Fujitsu compilers generate faster executables, though.
<gnikunj[m]> srinivasyadav227: could you clear the cache and send me the log of a fresh build (do it on an A64FX node and not on the head node)
<srinivasyadav227> <diehlpk_work_> " g++ -dM -E -x c++ - < /dev/null" <- https://gist.github.com/srinivasyadav18/d0adf32807df69d1958a5443783fb907
<srinivasyadav227> gnikunj[m]: yes, i tried the build on A64FX node
<gnikunj[m]> Yeah sure, but I would like to see what flags it sets and so on during the cmake config
<srinivasyadav227> oh okay, one min :)
<srinivasyadav227> this flag `HPXLocal_WITH_GENERIC_CONTEXT_COROUTINES` is ON, and during cmake it detected the architecture as `arm`
<gnikunj[m]> It’s not an issue relating to that.
<gnikunj[m]> It’s an error coming from cpuid
<gnikunj[m]> Which isn't meant to work on aarch64
<gnikunj[m]> That’s what I’m investigating rn
<gnikunj[m]> Everything other than HPXLocal_WITH_MALLOC looks correct to me
<gnikunj[m]> But that’s taken care of by the build system, so that’s not the underlying issue either
hkaiser has joined #ste||ar
<srinivasyadav227> gnikunj[m]: yeah
<srinivasyadav227> <diehlpk_work_> "This would be a good check to..." <- https://gist.github.com/srinivasyadav18/d0adf32807df69d1958a5443783fb907#file-out-log-L197 its aarch64
<gnikunj[m]> srinivasyadav227: not sure where it's coming from really.
<srinivasyadav227> gnikunj: ok :) 😅
<gnikunj[m]> I mean I know where it's coming from (the fork-join executor) but not sure if it's related to build time
<srinivasyadav227> <gnikunj[m]> "srinivasyadav227: the errors are..." <- why there is no arm/arch64 related flag defined there ?, all are x86 based only ?
<gnikunj[m]> Because we don't support cpuids based on arm
<gnikunj[m]> those are features that are used in writing register specific things (hence the tables that stores the info)
<srinivasyadav227> oh ok, generally cpuid is only for x86 based systems ?
<gnikunj[m]> Yeah, I think so. Although you can identify other cpu related info for Arm as well.
<srinivasyadav227> oh ok cool :)
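For context, the reason the cpuid code trips up the A64FX build is that the cpuid instruction only exists on x86; the usual fix is a compile-time architecture guard along these lines (an illustrative sketch only, not the actual contents of HPX's cpuid.hpp; the macros are the standard compiler-defined ones):

```cpp
// Illustrative architecture guard only; HPX's real header differs.
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
// x86/x86-64: the cpuid instruction exists, so tables keyed on cpuid
// feature bits (SSE, AVX, ...) make sense here.
#    define HAVE_X86_CPUID 1
#elif defined(__aarch64__) || defined(__arm__)
// Arm/AArch64 (e.g. A64FX): there is no cpuid instruction; feature
// detection has to use other mechanisms (e.g. getauxval/HWCAP on Linux).
#    define HAVE_X86_CPUID 0
#else
#    define HAVE_X86_CPUID 0
#endif
```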
<gnikunj[m]> hkaiser: yt?
hkaiser has quit [Quit: Bye!]
FunMiles has joined #ste||ar
<FunMiles> Are C++20 coroutines with co_await still available in HPX? How can I enable them, in particular on macOS?
hkaiser has joined #ste||ar
<FunMiles> @hkaiser: Are C++20 coroutines with co_await as you showed them at CppCon still available in HPX? How can I enable them?
<FunMiles> When doing grep on the latest source, I'm not finding what I thought would have to be there for that feature to be still enabled.
<hkaiser> FunMiles: just configuring with -DHPX_CXX_STANDARD=20 should do the trick
<FunMiles> hkaiser: Trying that. But here's a quick question: How come a recursive `grep -rn promise_type *` only returns a match in `cmake/tests/cxx20_coroutines.cpp` ? Isn't it necessary to define that type with C++20 coroutines?
<gnikunj[m]> hkaiser: I went through the paper yesterday and have a couple questions.
<gnikunj[m]> 1. The connect function should take in a receiver of the same type as the sender, right? (like if you have a then sender, then you want to connect it to the then receiver)
<gnikunj[m]> 2. If the connect function returns an operation state, why do some receivers not have one? (I see that then_receiver doesn't have an operation_state struct defined in it)
<gnikunj[m]> 3. How do I retrieve the end value that was set by the receiver using set_value()?
<FunMiles> hkaiser: Are you making use of asio's C++ coroutine features to provide the coroutine interface?
<hkaiser> FunMiles: no asio is involved, let me give you the link to the integration, give me a sec
<hkaiser> gnikunj[m]: 1) no, sender and receiver are independent and unrelated types
<hkaiser> you can combine any sender with any receiver
<hkaiser> 2) I'm not sure I understand, what receiver do you refer to?
<hkaiser> 3) again, not sure I understand your question
<gnikunj[m]> 2) I was talking about the HPX implementation. For instance, then_receiver doesn't have an operation_state struct with a start function.
<gnikunj[m]> Also, for 1) why would you want a bulk_receiver to take a then_sender?
<hkaiser> then is a receiver, no?
<gnikunj[m]> 3) What I meant was - how do you get the final value? (equivalent to future.get())
<gnikunj[m]> also, do you have time for a quick call? I think I'll be able to better explain them like that.
<hkaiser> it is received by the lambda you pass to then, for instance
<gnikunj[m]> sure, but what about the final lambda which also returns a value. How do I get that?
<hkaiser> for 2) there is a default operation_state that simply connects a sender with a receiver (i.e. simply passing through the values), that is used when no special functionality is needed
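In rough P2300R3 terms, the connect/operation_state protocol described here looks like the sketch below. The namespace alias, header name, and the tag_invoke spellings of the receiver customization points are assumptions; HPX's internal receivers are written differently.

```cpp
#include <hpx/execution.hpp>   // assumed header
#include <cstdio>
#include <exception>
#include <utility>

namespace ex = hpx::execution::experimental;   // assumed namespace

// A minimal receiver that just prints the value it is sent.
struct print_receiver
{
    friend void tag_invoke(ex::set_value_t, print_receiver&&, int v) noexcept
    {
        std::printf("received %d\n", v);
    }
    friend void tag_invoke(ex::set_error_t, print_receiver&&, std::exception_ptr) noexcept {}
    friend void tag_invoke(ex::set_stopped_t, print_receiver&&) noexcept {}
};

int main()
{
    auto sender = ex::just(42);                                  // sender producing 42
    auto op = ex::connect(std::move(sender), print_receiver{});  // yields an operation state
    ex::start(op);   // runs the work; eventually calls set_value(receiver, 42)
    return 0;
}
```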
<gnikunj[m]> are there sender algorithms that return it?
<hkaiser> gnikunj[m]: you can use a make_future to get a future for the overall value
<gnikunj[m]> where is the default operation_state implemented in HPX?
<hkaiser> sync_wait returns the value as well
<gnikunj[m]> hkaiser: make_future isn't in 2300R3.
<gnikunj[m]> wait, sync_wait isn't void?
<hkaiser> no, make_future isn't, but sync_wait is
<gnikunj[m]> ohh crap, right
<gnikunj[m]> sync_wait will return the final value. Now, it makes sense.
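So the end-of-chain pattern looks roughly like this. A sketch only: the namespace alias and scheduler type are assumptions, and the exact return shape of sync_wait (bare value vs. something like optional<tuple<...>>) has varied across P2300 revisions and HPX versions.

```cpp
#include <hpx/hpx_main.hpp>    // assumed: runs main() as an HPX thread
#include <hpx/execution.hpp>   // assumed header
#include <utility>

namespace ex = hpx::execution::experimental;

int main()
{
    ex::thread_pool_scheduler sched;   // assumed scheduler type

    auto snd = ex::then(ex::schedule(sched), [] { return 42; });

    // Blocks until the chain completes and hands back what was passed to
    // set_value; depending on the revision the result may come back wrapped
    // (e.g. in an optional/tuple) rather than as a bare int.
    auto result = ex::sync_wait(std::move(snd));
    (void) result;

    // ex::make_future(...) is the HPX extension mentioned above: it adapts a
    // sender into an hpx::future so the value can be retrieved with .get().
    return 0;
}
```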
<gnikunj[m]> ok, but I still don't understand why you can connect any receiver with any sender
<gnikunj[m]> like why would you connect a bulk receiver with a then sender?
<hkaiser> gnikunj[m]: that's the whole point of this, combining arbitrary senders with arbitrary receivers
<hkaiser> anonymous consumer/producer chains
<hkaiser> give me a sec to locate the default operation_state
<gnikunj[m]> when I thought of arbitrary connections, I thought more of transfer algorithms - like transferring execution from one scheduler context to another
<hkaiser> sure, that's just a special receiver
<gnikunj[m]> but transfer is a sender algorithm that returns a sender
<gnikunj[m]> (I'm honestly a bit confused by the genericity and need clarification so I apologize if I sound absolutely stupid asking these questions)
<hkaiser> well yes, it returns a new sender that sits on the target execution environment
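As an illustration, a transfer-style chain in P2300 terms looks something like the following (a sketch; whether HPX already shipped a `transfer` adaptor at this point, and the scheduler types used here, are assumptions):

```cpp
#include <hpx/hpx_main.hpp>    // assumed: runs main() as an HPX thread
#include <hpx/execution.hpp>   // assumed header
#include <utility>

namespace ex = hpx::execution::experimental;

int main()
{
    ex::thread_pool_scheduler a;   // assumed scheduler types; in practice these
    ex::thread_pool_scheduler b;   // would refer to two different execution contexts

    // The work before transfer runs on a; transfer returns a new sender that
    // completes on b, so the rest of the chain runs there.
    auto snd = ex::schedule(a)
             | ex::then([] { return 1; })
             | ex::transfer(b)
             | ex::then([](int v) { return v + 1; });

    auto result = ex::sync_wait(std::move(snd));
    (void) result;
    return 0;
}
```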
<gnikunj[m]> hkaiser: btw we lack quite a few things in HPX wrt sender-receiver. Most notably the pipeline operator
<hkaiser> we have that, I'm sure
<gnikunj[m]> where is it? I didn't see any operator| in the implementation
<gnikunj[m]> is it in execution_base then?
<gnikunj[m]> ok, we do have it
<gnikunj[m]> hkaiser: we're also missing some sender adaptors - transfer_when_all, on, upon_*
<hkaiser> yes, that's correct
<hkaiser> those are fairly new in p2300
<gnikunj[m]> Btw, I'm traveling tomorrow morning so I won't be able to attend the meeting with Srinivas
<hkaiser> sure, np
<FunMiles> hkaiser: I am a bit confused about the file you sent. It's not in a regular cloned hpx directory. Is it a different branch? The link says it is master but then it's hpx-local
<gnikunj[m]> hkaiser: do you have some time to discuss on sender-receiver stuff on wednesday?
<hkaiser> FunMiles: long story, for now just go with it, I think hpx-local will be back in hpx soon
<gnikunj[m]> I'd like to discuss the implementation details specifically. Meanwhile, I'll do some debug runs to see the call stack of the execution.
<hkaiser> gnikunj[m]: sure
<gnikunj[m]> ok, let me ask Katie to set up a time on wed
<FunMiles> hkaiser: OK :) I'll give it a try.
<hkaiser> FunMiles: sorry for the confusion
<FunMiles> hkaiser: For the flag, is it HPX_CXX_STANDARD or CXX_STANDARD ?
<FunMiles> Cmake says it's unused.
<hkaiser> sec
<hkaiser> FunMiles: for any recent versions of HPX master it should be HPX_CXX_STANDARD=20
<hkaiser> you can try HPX_WITH_CXX20=On, that should do the right thing even if it generates a warning
<FunMiles> -DHPXLocal_WITH_CXX_STANDARD=20
<hkaiser> this will change soon, but for now it may work
<hkaiser> again sorry, things are in a bit of limbo right now
<FunMiles> hkaiser: I was watching the CppCon presentation about nano-coroutines by Gor Nishanov. He mentions that switching fibers takes microseconds, while coroutines take nanoseconds. Have you measured performance benefits from using C++20 coroutines?
<hkaiser> FunMiles: we have not measured this, but using co_await with our futures doesn't have the nanosecond-overhead benefit Gor describes
<hkaiser> this is not really using coroutines anyway
<hkaiser> it's more syntactic sugar, allowing you to turn future-based asynchronous continuation-based code inside out and make it easier to read and write
<hkaiser> it still has the same overhead as using futures without co_await
<FunMiles> OK. Can you imagine that in the future, you could replace fibers with coroutines if all code could use purely co_await approaches?
<FunMiles> Thus benefitting from faster switching?
<hkaiser> FunMiles: nothing prevents you from using co_await based coroutines on top of our threading (fibers)
<hkaiser> those things are orthogonal
<FunMiles> More or less, it would require that I make another futures system, no? One of my conceptual gripes with C++20 coroutines is that what co_await does implicitly depends on the return type.
<hkaiser> FunMiles: no
<hkaiser> you could use light-weight awaitables as suggested by Gor and run them on any HPX thread
<FunMiles> I'm going to have to explore that approach.
<hkaiser> ok
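A minimal sketch of the co_await-on-futures style being discussed, assuming an HPX build configured for C++20; the header names are assumptions given the hpx/hpx-local restructuring mentioned below, and this integration carries the usual future overhead rather than Gor's nanosecond-scale coroutines.

```cpp
#include <hpx/hpx_main.hpp>   // assumed: runs main() as an HPX thread
#include <hpx/future.hpp>     // assumed to provide hpx::future and hpx::async
#include <iostream>

// With coroutine support enabled, hpx::future<T> can serve both as an
// awaitable and as a coroutine return type; co_await suspends the coroutine
// rather than blocking the underlying HPX thread.
hpx::future<int> twice_async(int x)
{
    int v = co_await hpx::async([x] { return x; });
    co_return 2 * v;
}

int main()
{
    std::cout << twice_async(21).get() << '\n';   // prints 42
    return 0;
}
```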
<FunMiles> what are the CMake exported targets? HPXLocal::hpx does not exist it seems
<FunMiles> HPXLocal seems to be OK.
<FunMiles> It does not seem to define the proper include directories.
<hkaiser> HPX::hpx
<FunMiles> target not found.
<hkaiser> FunMiles: do you use hpx or hpx-local?
<hkaiser> which repository?
<FunMiles> hpx-local, since that's the one with the C++20 coroutine capable code.
<hkaiser> ok
<hkaiser> do you need the distributed functionalities of HPX?
<FunMiles> Not at this stage, no.
<hkaiser> ok
<hkaiser> the HPX::hpx_local is the target
<hkaiser> I think ;-)
<FunMiles> It is! :) Thanks
<FunMiles> Though I still have some issues.
<FunMiles> I copied the code for transpose_await.cpp and some of the includes are found while some are not.
<FunMiles> Some filenames have changed? It uses algorithm.hpp but it seems it's algorithms.hpp now with an 's'
<FunMiles> There is no hpx/hpx.hpp ?
jehelset has quit [Ping timeout: 240 seconds]
<hkaiser> FunMiles: you got in the middle of a large restructuring
<hkaiser> could you give us some time to consolidate things?
<hkaiser> for now, I'd suggest you use the hpx repository (not hpx-local)
<FunMiles> hkaiser: OK. I will wait.
<hkaiser> that should automatically pull in hpx-local
<FunMiles> The regular hpx repository does not include the co_await at the moment, right?
<hkaiser> FunMiles: since it pulls hpx-local, it will
<hkaiser> FunMiles: the story is that for various reasons there was a plan to split HPX into two repositories, hpx and hpx-local
<hkaiser> for users of hpx nothing should change (the build system makes sure those are well integrated)
<hkaiser> but for even more various reasons, this split (which happened early December) might be reverted soon
<hkaiser> that's what I meant when I said things are kind of in limbo; we need another 3-4 weeks to get back to normal
jehelset has joined #ste||ar