hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
Yorlik has quit [Ping timeout: 240 seconds]
sayef_ has joined #ste||ar
sayefsakin has quit [Read error: Connection reset by peer]
Yorlik has joined #ste||ar
kale[m] has quit [Ping timeout: 244 seconds]
kale[m] has joined #ste||ar
hkaiser has quit [Quit: bye]
akheir has quit [Quit: Leaving]
Yorlik has quit [Ping timeout: 264 seconds]
nanmiao99 has quit [Remote host closed the connection]
nikunj97 has joined #ste||ar
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 258 seconds]
bita__ has quit [Ping timeout: 260 seconds]
nikunj97 has joined #ste||ar
Nikunj__ has quit [Ping timeout: 240 seconds]
nikunj97 has quit [Read error: Connection reset by peer]
Nikunj__ has joined #ste||ar
nikunj97 has joined #ste||ar
Nikunj__ has quit [Ping timeout: 240 seconds]
sayef_ has quit [Quit: Leaving]
ms[m] has quit [*.net *.split]
neill[m] has quit [*.net *.split]
ralph[m] has quit [*.net *.split]
neill[m] has joined #ste||ar
ralph[m] has joined #ste||ar
ms[m] has joined #ste||ar
kale[m] has quit [Read error: Connection reset by peer]
kale[m] has joined #ste||ar
kale[m] has quit [Ping timeout: 240 seconds]
kale[m] has joined #ste||ar
Yorlik has joined #ste||ar
Nikunj__ has joined #ste||ar
nikunj has quit [Ping timeout: 244 seconds]
nikunj97 has quit [Ping timeout: 265 seconds]
nikunj has joined #ste||ar
nikunj97 has joined #ste||ar
nikunj has quit [Ping timeout: 260 seconds]
Nikunj__ has quit [Ping timeout: 256 seconds]
nikunj has joined #ste||ar
Nikunj__ has joined #ste||ar
Nikunj__ has quit [Read error: Connection reset by peer]
nikunj97 has quit [Ping timeout: 240 seconds]
Nikunj__ has joined #ste||ar
Nikunj__ has quit [Read error: Connection reset by peer]
nikunj97 has joined #ste||ar
kale[m] has quit [Ping timeout: 256 seconds]
kale[m] has joined #ste||ar
nikunj97 has quit [Quit: Leaving]
kale[m] has quit [Ping timeout: 260 seconds]
LiliumAtratum has joined #ste||ar
<LiliumAtratum> Hello! I have a question about `hpx::parallel::transform_inclusive_scan`. The function takes a sequence of Ts, the unary transformation operator `conv` T->R, and a binary reduction operator `op` (R,R)->R. And yet, the documentation requires that T is *implicitly* convertible to R. And in fact it does! It doesn't call `conv` before calling `op`.
<LiliumAtratum> Why is that?
<LiliumAtratum> In my code I cannot make `T` implicitly convertible to `R`, but I have my `conv` function.
<LiliumAtratum> Note that `std::transform_inclusive_scan` does not have this requirement and I can use it with my code without a problem
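For reference, a minimal sketch of the situation described above, using the standard sequential algorithm. The types `Celsius` and `Total` and the helper `running_totals` are hypothetical names invented for illustration; they are not taken from the original code. The input type is deliberately not implicitly convertible to the result type, so the scan only compiles if `conv` is applied before `op`.

```cpp
#include <numeric>
#include <vector>

// Hypothetical types for illustration: Celsius (T) is NOT implicitly
// convertible to Total (R).
struct Celsius { int degrees; };
struct Total { long sum; };

// Unary transformation T -> R.
inline Total conv(Celsius const& c) { return Total{c.degrees}; }

// Binary reduction (R, R) -> R.
inline Total op(Total const& a, Total const& b) { return Total{a.sum + b.sum}; }

// Runs the standard sequential algorithm with an explicit init of type R
// and returns the running sums as plain numbers.
inline std::vector<long> running_totals(std::vector<Celsius> const& in)
{
    std::vector<Total> out(in.size(), Total{0});
    std::transform_inclusive_scan(
        in.begin(), in.end(), out.begin(), op, conv, Total{0});

    std::vector<long> sums;
    for (Total const& t : out)
        sums.push_back(t.sum);
    return sums;
}
```

Calling `running_totals` on the values 1, 2, 3 should yield 1, 3, 6. Whether the HPX overload accepts the same call depends on the HPX version in use.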
<K-ballo> LiliumAtratum: sounds like a bug.. do you have the full error message output?
<LiliumAtratum> I can prepare something. But I found a mistake(?) on my side by not including the 'init' parameter.
<K-ballo> sounds like another bug, `init` is optional, unless you needed it for type reasons?
<LiliumAtratum> what I am concerned about is that the arguments to op seem to be given in the opposite order, which may break non-commutative operators.
<LiliumAtratum> i'll prepare a running example, give me a minute
hkaiser has joined #ste||ar
<K-ballo> there is no order
<LiliumAtratum> well, there is a difference if you call op(a,b) and op(b,a) for a noncommutative operator
<K-ballo> sure, but the types of a and b are the same already
<K-ballo> or are we discussing values now?
<LiliumAtratum> Yes, I am talking about my second issue at the moment
<LiliumAtratum> preparing an example...
<LiliumAtratum> Now, no matter the order in which the summation is performed, the left argument should always be smaller than the right. So I expect the code to just perform a prefix sum. But I get -10000 everywhere.
<K-ballo> this doesn't look related to the type conversions from earlier
<LiliumAtratum> No, that's a second issue
<LiliumAtratum> I'll prepare the first issue too
<K-ballo> ok
<LiliumAtratum> Here is the first thing: https://ideone.com/plain/g4TTh7
<LiliumAtratum> uncomment `Integer{0}` and it works.
<K-ballo> LiliumAtratum: that last one is definitely a bug, hpx is using a default constructed iterator's value type as init when one isn't provided, which is just wrong
<K-ballo> please report it on github
<LiliumAtratum> ok, I'll do
<hkaiser> HPX PMC meeting today at 9am CDT/16:00 CEDT here: https://lsu.zoom.us/j/335605227
<LiliumAtratum> And how does the former issue look for you?
<K-ballo> I don't understand what the first example is supposed to illustrate
<K-ballo> what are the actual and expected outputs?
<hkaiser> ms[m]: yt?
<LiliumAtratum> In the first example I am performing { 1 op 10 op 100 op 1000 } with that fancy operator `op`. Since the left argument is always lower than the right (no matter how you place the parentheses in the expression) I expect it to produce the output: 1, 11, 111, 1111.
<LiliumAtratum> But I get -10000, -10000, -10000, -10000 instead
<K-ballo> ok, sounds wrong too
<LiliumAtratum> Out of curiosity I also put `hpx::cout` where the operator is invoked, and to my surprise it seems to be invoked twice, once with the expected parameters, and once with flipped parameters (which should not happen since op may be non-commutative)
<K-ballo> I have a vague recollection of already discussing something about the scans and non-commutative operations, specifically I remember crafting tests for string concatenation
<K-ballo> uh... so what's your actual actual output then?
<LiliumAtratum> Currently: `1+0 / 0+1 / 10+1 / 1+10 / 100+11 / 11+100 / 1000+111 / 111+1000 / -10000 / -10000 / -10000/ -10000` (where `/` stands for a newline)
<LiliumAtratum> the last 4 items are the contents of the `output` vector. The first 8 items are debug prints from within the `op` invocation.
<K-ballo> that sounds like something else entirely
<LiliumAtratum> While I understand the amount of `op` invocations may vary because of scheduling etc, I would still expect the contents of `output` to be correct. Which I believe should be: `1 / 11 / 111/ 1111`
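The expected result can be checked against the standard sequential algorithm. The sketch below is a reconstruction under assumptions: the operator `ordered_add` is a stand-in for the operator described in the log (the original source is only linked, not quoted), and `as_is` and `sequential_scan` are hypothetical helper names.

```cpp
#include <numeric>
#include <vector>

// Assumed stand-in for the operator described in the log: it adds only when
// the left argument is smaller than the right one and otherwise returns a
// sentinel, so a flipped call such as op(10, 1) becomes visible in the output.
inline int ordered_add(int left, int right)
{
    return left < right ? left + right : -10000;
}

// Trivial unary transformation; the values are passed through unchanged.
inline int as_is(int x) { return x; }

// Sequential std::transform_inclusive_scan with an explicit init of 0.
inline std::vector<int> sequential_scan(std::vector<int> const& in)
{
    std::vector<int> out(in.size());
    std::transform_inclusive_scan(
        in.begin(), in.end(), out.begin(), ordered_add, as_is, 0);
    return out;
}
```

For the input 1, 10, 100, 1000 every grouping of this operator keeps the smaller operand on the left, so the scan should produce 1, 11, 111, 1111. A result of -10000 can only come from a call with swapped arguments.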
Nanmiao11 has joined #ste||ar
<hkaiser> heller1: yt?
karame_ has joined #ste||ar
<K-ballo> seems we need those for the transform_ cases as well
<LiliumAtratum> seems so :(
<LiliumAtratum> should I file a bug at the github for this too?
<K-ballo> sure
<LiliumAtratum> done. Thank you!
<hkaiser> LiliumAtratum: I'm not sure the scan algorithms should support non-commutative operators
<K-ballo> they do, they are required to
<hkaiser> ok
<hkaiser> wasn't sure anymore
<hkaiser> K-ballo: the spec says GENERALIZED_NONCOMMUTATIVE_SUM
<hkaiser> so I'd assume commutativity is not required
<K-ballo> indeed
<hkaiser> the algorithms require associativity, though
<K-ballo> all the parallel ones do, afair
<K-ballo> note we fixed this for the non transform_ scans a few years ago
<hkaiser> did we? ok.
<K-ballo> see above for the related regression tests
<hkaiser> so it requires commutativity after all
<hkaiser> or not :/
<hkaiser> standardese is obtuse, always
<hkaiser> so generalized_sum allows for the elements to be permuted, generalized_noncommutative_sum does not allow for any permutation
<hkaiser> so we can't assume commutativity of op as the sequence has to be observed
<K-ballo> noncommutative means we can't assume commutativity, yes
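String concatenation, which K-ballo mentions above as the basis of the earlier regression tests, is a convenient illustration: it is associative but not commutative. A minimal sketch with the standard sequential `std::inclusive_scan` follows; the helper name `prefix_concat` is hypothetical.

```cpp
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// Inclusive scan over strings with operator+ as the reduction.
// Concatenation is associative: ("a" + "b") + "c" == "a" + ("b" + "c"),
// but not commutative: "a" + "b" != "b" + "a".
inline std::vector<std::string> prefix_concat(std::vector<std::string> const& in)
{
    std::vector<std::string> out(in.size());
    std::inclusive_scan(in.begin(), in.end(), out.begin(), std::plus<>{});
    return out;
}
```

For the input "a", "b", "c" the result should be "a", "ab", "abc". An implementation that regroups partial results is still correct, while one that swaps operands would produce something like "ba" and fail the comparison.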
<diehlpk_work_> Caliban is a tool that helps researchers launch and track their numerical experiments in an isolated, reproducible computing environment. It was developed by machine learning researchers and engineers, and makes it easy to go from a simple prototype running on a workstation to thousands of experimental jobs running on Cloud.
nikunj97 has joined #ste||ar
<jbjnr> hkaiser: I am grumpy. The function arguments are being called - even for print<false> invocations. I need to find a way to stop that that doesn't require an extra if wrapper around the call
<jbjnr> fix - wrap arguments in a lambda - that solves it, but it's not ideal.
<hkaiser> jbjnr: using lambdas everywhere bloats code generation
<hkaiser> we have had that for assertions, but decided to stay away from it in the end
<jbjnr> and it doesn't work when print<true> is enabled
<jbjnr> I only need it inside my debug print statements, so I will add a delayed:: invoke wrap for arguments that are function calls, then specialize for that
<jbjnr> or something ....
<jbjnr> must go out in a moment, so will look into it later.
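The lambda-wrapping idea mentioned above could look roughly like the sketch below. This is an assumption-laden illustration, not HPX's debug facility: `debug_print`, `format`, `expensive` and `calls` are hypothetical names, and the sketch requires C++17 for `if constexpr` and fold expressions.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Hypothetical debug printer; NOT the actual HPX implementation.
template <bool Enabled>
struct debug_print
{
    // Each argument is a callable (thunk) that produces the value to print.
    template <typename... Thunks>
    static std::string format(Thunks&&... thunks)
    {
        if constexpr (Enabled)
        {
            std::ostringstream os;
            // Invoke every thunk in order and stream its result.
            ((os << std::forward<Thunks>(thunks)()), ...);
            return os.str();
        }
        else
        {
            // Disabled: the thunks are never invoked.
            ((void) thunks, ...);
            return {};
        }
    }
};

// Counts how often the "expensive" argument is actually evaluated.
inline int calls = 0;
inline int expensive() { ++calls; return 42; }
```

With `debug_print<false>::format([] { return expensive(); })` the counter should stay at 0, while the `<true>` variant evaluates the argument once and returns "42". As hkaiser notes, wrapping every argument in a lambda may increase the amount of generated code.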
LiliumAtratum has quit [Remote host closed the connection]
<diehlpk_work_> jbjnr, What is the status of the libfabric parcelport? We intend to submit a proposal for Piz Daint in October.
<diehlpk_work_> So we would like to use libfabric for this proposal. Do you think you can get time to work on it if we submit the proposal?
Nanmiao11 has quit [Remote host closed the connection]
nikunj97 has quit [Read error: Connection reset by peer]
bita__ has joined #ste||ar
karame_ has quit [Remote host closed the connection]
weilewei has quit [Remote host closed the connection]
Nanmiao11 has joined #ste||ar
nikunj97 has joined #ste||ar
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 240 seconds]
nikunj97 has joined #ste||ar
Nikunj__ has quit [Ping timeout: 260 seconds]
weilewei has joined #ste||ar
weilewei has quit [Ping timeout: 245 seconds]
karame_ has joined #ste||ar
weilewei has joined #ste||ar
<jbjnr> diehlpk_work_: I have been working on the libfabric PP quite a lot recently, but it is still not tested on anything major like octotiger. It is functioning on my test cases, but some recent system changes on daint broke the gni layer and I need to do more testing there to see if it is working properly. It should be working well enough for you to get scaling runs for a proposal if you need them. The bug that causes octotiger to
<jbjnr> fail has not been fixed, but if I can reproduce it on daint with the latest builds of the LF PP, then I would hope to be able to fix it quite quickly. I need to get everything working soon anyway, so it would be better to get things fixed in the next weeks or so. I am working on a libfabric backend for another project and so this is a good time to get things going. Ideally I'd like to abstract most of the code into a smaller
<jbjnr> library that can be used by both projects and I'm working towards that, though they have very different use cases...
<diehlpk_work_> jbjnr, Thanks for the update
<diehlpk_work_> We would have to submit the proposal in October
<diehlpk_work_> I am working to get Octo-Tiger with Kokkos running on Summit
nikunj97 has quit [Ping timeout: 264 seconds]
kale[m] has joined #ste||ar
<gonidelis[m]> any idea why the foreach tests fail?
<gonidelis[m]> actually, all algorithms tests fail
<gonidelis[m]> I run the on rostam
<gonidelis[m]> them ^^
<gonidelis[m]> That's a sample verbose output
<gonidelis[m]> "48: /home/giannis/build/foreach_adaptation/bin/foreachn_projection_test: error while loading shared libraries: libc++.so.1: cannot open shared object file: No such file or directory"
<K-ballo> missing libc++ ...?