hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
<hkaiser> john98zakaria[m]: how would you change the docker file (https://github.com/STEllAR-GROUP/docker_build_env/blob/master/hpx_build_env/Dockerfile)?
Yorlik_ has joined #ste||ar
Yorlik__ has quit [Ping timeout: 255 seconds]
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: Bye!]
Yorlik_ is now known as Yorlik
hkaiser has joined #ste||ar
K-ballo has joined #ste||ar
<hkaiser> john98zakaria[m]: using a requirements.txt file is an option, but we could also pin the versions in the dockerfile itself
<hkaiser> john98zakaria[m]: but I agree, let's build the image locally and once that works, update the one in the repository
<john98zakaria[m]> hkaiser: Yes but no
<john98zakaria[m]> Because the pinned versions may themselves have constraints like numpy < 1.22
<john98zakaria[m]> The safest way is to pin everything
<john98zakaria[m]> So they may pull a non-deterministic version of numpy based on the interdependencies
<hkaiser> ok, fair point
<john98zakaria[m]> I work as a python dev, I have been bitten numerous times
<hkaiser> john98zakaria[m]: I appreciate your experience, for sure
<hkaiser> any help you can give is highly appreciated as well
<srinivasyadav227> hkaiser: I am working on integrating SVE into HPX. There is a problem with using (SVE) targets from cmake FetchContent directly with INSTALL rules in HPX cmake
<hkaiser> srinivasyadav227: I have seen those
<hkaiser> usually a fresh configure fixes that, otherwise use -DHPX_WITH_PKGCONFIG=OFF
<srinivasyadav227> okay, I am trying them now
<srinivasyadav227> https://github.com/STEllAR-GROUP/hpx/pull/5888 is ready for review. I have made the changes. ;-)
<hkaiser> ok, thanks!
<sestro[m]> Should using `hpx::compute::host::block_executor<>` and `hpx::compute::host::block_allocator<value_type>` be enough to make algorithms such as `hpx::transform` NUMA-aware, or are any additional modifications required for that?
<srinivasyadav227> yes, those are enough
<srinivasyadav227> use executor with the policy like `policy.on(executor)`
<sestro[m]> srinivasyadav227: Okay, thanks. I am under the impression that the tasks are not scheduled properly in many cases, but maybe something else is interfering.
<sestro[m]> srinivasyadav227: Thanks for the link, so what I'm using right now should be fine.
<srinivasyadav227> cool!
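[editor's note] An unverified sketch of the combination suggested above (block_allocator to place the data per NUMA domain, block_executor to place the tasks to match, attached via `policy.on(executor)`). Header names and the exact constructor signatures are assumptions and may differ between HPX versions:

```cpp
// Unverified sketch: NUMA-aware hpx::transform, assuming
// hpx::compute::host::numa_domains() returns the host targets and that
// block_allocator/block_executor accept them directly.
#include <hpx/hpx_main.hpp>
#include <hpx/algorithm.hpp>
#include <hpx/include/compute.hpp>

int main() {
    using value_type = double;

    auto targets = hpx::compute::host::numa_domains();
    hpx::compute::host::block_allocator<value_type> alloc(targets);
    hpx::compute::host::block_executor<> exec(targets);

    // data is first-touch placed close to the cores of each NUMA domain
    hpx::compute::vector<value_type, decltype(alloc)> v(1000000, 0.0, alloc);

    // the executor schedules the chunks onto the matching domains
    hpx::transform(hpx::execution::par.on(exec), v.begin(), v.end(),
                   v.begin(), [](value_type x) { return x + 1.0; });
}
```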
<srinivasyadav227> hkaiser: Fresh configure didn't work, it gave me the same error. But after passing `-DHPX_WITH_PKGCONFIG=OFF` everything worked perfectly
<srinivasyadav227> hkaiser: could you tell me why it's happening this way? thanks!
<sestro[m]> I'm seeing a large imbalance between the threads (almost a factor of 2 in some cases) and I'm running out of ideas what could cause this.
<srinivasyadav227> Imbalance ? you mean time taken for each thread ?
<srinivasyadav227> sestro: I mean time spent by each thread on doing the work ?
<sestro[m]> srinivasyadav227: Yes. The chunks should be the same size, so I'd expect the runtime per task to be similar. But this is not the case.
<srinivasyadav227> sestro: can you try passing `--hpx:numa-sensitive` to the application (on the command line)?
<sestro[m]> srinivasyadav227: Tried that before, did not have any measurable impact.
<srinivasyadav227> this might be due to work stealing from other numa domains, not sure though!
<hkaiser> srinivasyadav227: our support for pkg-config is somehow broken
<sestro[m]> <srinivasyadav227> "this might be due to work..." <- Do you know of any combination of scheduler setup/options that would avoid that?
<hkaiser> I have not had the time to investigate, however ms[m] has tried in the past and said it was an issue with cmake
<hkaiser> sestro[m]: yah, --hpx:numa-sensitive doesn't do much
<srinivasyadav227> hkaiser: okay. I will try looking into it.
<hkaiser> sestro[m]: I'd start with looking at the idle-rate counter to see how much parallelism is available/utilized
<sestro[m]> hkaiser: Thanks, I'll try that.
<sestro[m]> sestro[m]: As I have to recompile anyway: any options other than `HPX_WITH_THREAD_IDLE_RATES` that might be useful to enable here?
<srinivasyadav227> that flag is enough for cmake, but you need to pass an option to the application (executable)
<srinivasyadav227> this one `--hpx:print-counter=/threads/idle-rate`
aalekhn has joined #ste||ar
<sestro[m]> <srinivasyadav227> "this one `--hpx:print-counter=/..." <- Doesn't look too bad:
<sestro[m]> > /threads{locality#0/total}/idle-rate,1,13.517548,[s],2362,[0.01%]
<hkaiser> 23% idle rate, not too bad
<sestro[m]> sestro[m]: Just to illustrate the effects I'm talking about.
<hkaiser> sestro[m]: nice picture
<sestro[m]> <hkaiser> "sestro: nice picture" <- Wish it was a bit more uniform 😁
<srinivasyadav227> hkaiser: Is there any way to run a piece of code before the hpx runtime starts?
<hkaiser> sestro[m]: we have two things you need to use, the block_allocator and the block_executor
<hkaiser> the allocator makes sure that the arrays are placed close to the cores and the executor will place the tasks accordingly
<hkaiser> srinivasyadav227: sure
<hkaiser> run it before calling hpx::init
<hkaiser> or did you mean to run something as an hpx thread before hpx_main runs?
<gonidelis[m]> ` void (X::* ptfptr) (int) = &X::f;` what is the type of `ptfptr`?
<gonidelis[m]> return type *
<gonidelis[m]> ?
<K-ballo> the type of ptfptr is `void (X::* ) (int)`
<gonidelis[m]> and how the hell does ptfptr bind to the xobject that is being initialized later on
<K-ballo> which is a member pointer of class X of type `void (int)`
<gonidelis[m]> class and type is the same thing, no
<gonidelis[m]> ?
<K-ballo> what are you trying to say?
<K-ballo> class and type are most definitely not the same thing, you likely mean something else
<gonidelis[m]> the words "class" and "type" can be used interchangeably
<gonidelis[m]> ?
<K-ballo> no
<K-ballo> class is a subset of type
<gonidelis[m]> ok
<K-ballo> all class types are types, but not all types are class types: e.g. int
<gonidelis[m]> gosh
<gonidelis[m]> ok
<gonidelis[m]> what is `void (X::* )` supposed to mean in `void (X::* ) (int)`
<K-ballo> consequently you can't have a class pointer to int `T (int::*)`
<satacker[m]> gonidelis[m]: The address of the member function is stored in it right? So all the X objects can call ptfptr?
<K-ballo> nothing, you can't cut it like that
<K-ballo> "spiral" rule, `X::* + void (int) => void (X::*)(int)`
<gonidelis[m]> satacker: bro! yes! hella explanation
<K-ballo> kinda, member functions don't have addresses
<satacker[m]> K-ballo: Ohh
<gonidelis[m]> K-ballo: so how does it bind?
<satacker[m]> K-ballo: Ohh, right so it would have been void (*)(int) otherwise
<K-ballo> what do you mean by that, how is it internally implemented by the usual compilers?
<K-ballo> there's variability in implementation across compilers, plus it varies for class data member pointer and class function member pointer, and for the latter it also matters virtual vs non-virtual, and the kind of inheritance hierarchy (single, multiple, diamond)
<K-ballo> the important thing is that when you bind it later on, the resulting execution is as if you had written `xobject.f(20)`
<gonidelis[m]> K-ballo: I am asking how is xobject aware of `*ptfptr`
<K-ballo> what does "aware" mean??
<K-ballo> xobject doesn't need to know anything about ptfptr
<K-ballo> and it's `.*`, it's its own token, not separable
<gonidelis[m]> lol true
<gonidelis[m]> just seems weird that pointer to mem function was declared before the very object itself
<K-ballo> it's a pointer to *class*, not a pointer to *object*
<gonidelis[m]> sheesh!
<gonidelis[m]> no i get it
<K-ballo> it doesn't point into xobject, it points into X
<gonidelis[m]> and what is the use case of having a pointer to a class? why would you want sth like that?
<gonidelis[m]> now*
<satacker[m]> Is it even a pointer if we cannot get its address?
<K-ballo> pointer to a class member
<K-ballo> satacker[m]: a pointer to member type is not a pointer type, C++ is fun like that
<gonidelis[m]> ahhhh it's pointer to class member. not object member!!!!!!!!!
<K-ballo> no
<K-ballo> it's a pointer to class object or function member
<K-ballo> sorry, data
<gonidelis[m]> make a decision already
<K-ballo> pointer to class data member or function member
<gonidelis[m]> ok ok ok yes
<gonidelis[m]> yes
<gonidelis[m]> yes
<K-ballo> a data member is a subobject, which is also an object
<gonidelis[m]> yes
<gonidelis[m]> so again: why would you need that ?
<K-ballo> the data implementation is actually fairly simple, it's just an offset
<K-ballo> have you ever seen projections..?
<K-ballo> ranges::sort(employees, &Employee::name)
<gonidelis[m]> lol hkaiser apologies
<gonidelis[m]> he just mentioned that like an hour ago
<gonidelis[m]> i forgot immediately
<K-ballo> that's the more mundane use case I can think of
<gonidelis[m]> SAME EXAMPLE HE GAVE!
<K-ballo> of course, that's the canonical projection example
<gonidelis[m]> wait
<satacker[m]> It's used in pybind11 based bindings pretty heavily
<gonidelis[m]> ?
<gonidelis[m]> the projection needs to be a pointer to a class mem fun cause it operates on the very object that the (external) algo (e.g. sort) is operating on
<K-ballo> the projection needs to be a callable
<gonidelis[m]> ok
<gonidelis[m]> ?
<K-ballo> a pointer to class member is a callable
<K-ballo> callable = function object or pointer to member
<gonidelis[m]> so?
<gonidelis[m]> oh ok!
<K-ballo> it doesn't NEED to be a pointer to class mem fun
<K-ballo> plus the example I just gave isn't, it's a pointer to class mem data
<K-ballo> but it doesn't need to be a pointer to class at all
<gonidelis[m]> ok ok.... what's the benefit of pointer to memfun compared to FOs?
<gonidelis[m]> (yes ^^)
<K-ballo> those are different things, how would we compare them?
<K-ballo> you mean by creating the equivalent function object for each member of a class?
<gonidelis[m]> they are both callables. in what case would you use the memfun over an FO?
<K-ballo> I can't make sense of the question
<gonidelis[m]> why would you need a pointer to a mem fun
<gonidelis[m]> that's the question
<gonidelis[m]> you said projections
<gonidelis[m]> the question remains the same
<gonidelis[m]> why?
<K-ballo> are you asking why I say `&Employee::name` instead of something like `[](Employee const& e) -> std::string const& { return e.name; }`?
<K-ballo> or in a 98 style, `struct employee_name_fo { std::string const& operator()(Employee const& e) const noexcept { return e.name; } };`
<K-ballo> and in the fo case I'd also have to add all the different cv-ref qualification overloads (or deduce-this)
<gonidelis[m]> the employee example is not exactly what i want cause it's a mem data
<K-ballo> fine, make it `&Employee::get_name`
<gonidelis[m]> 'ight
<gonidelis[m]> could you implement it as a lambda instead?
<gonidelis[m]> with captures and all?
<K-ballo> sure, and as a function object too
<K-ballo> there are subtle differences between the different implementations, but for this particular use case it doesn't matter
<K-ballo> so I choose the pmf because it's short, way shorter
<gonidelis[m]> yes
<gonidelis[m]> ok
<gonidelis[m]> so pmf just points to sth that already exists and already has access to the data members
<K-ballo> the only pmf case that can't be replaced by a lambda or function object is switching names
<gonidelis[m]> so instead of declaring a whole FO from scratch you would use pmf
<gonidelis[m]> because ... ?
<K-ballo> ptfptr could point to one function now, and a different function on the next line
<K-ballo> the lambdas and function object equivalents have the name hard coded, different names require different types
<gonidelis[m]> oh you meant function names
<gonidelis[m]> thought you meant employee names
<gonidelis[m]> ok
<gonidelis[m]> yes
<gonidelis[m]> yes
<gonidelis[m]> agility is nice
<gonidelis[m]> https://godbolt.org/z/P9saaMajv (any way I could skip passing `obj` here?)
<K-ballo> "agility"?
<gonidelis[m]> switch my pointer's pointing
<K-ballo> what's agile about that?
<gonidelis[m]> not being bound to a name
<gonidelis[m]> "hard-coded" as you said
jbalint has joined #ste||ar
akcube[m] has quit [Ping timeout: 248 seconds]
jbalint_ has quit [Ping timeout: 248 seconds]
akcube[m] has joined #ste||ar
<gonidelis[m]> K-ballo: is `int(n)` a prvalue?
<gonidelis[m]> ultimately I want to know if `type(f)(args)` is a temporary or not and what is the benefit of invoking it that way instead of `type f; f(args);` given that `type` overloads the `operator()` of course
<hkaiser> gonidelis[m]: type(f) creates a temporary instance of type
<gonidelis[m]> But u can still use f autonomously afterwards ?
<hkaiser> you could certainly do `type f1 = f; f1(args);` which would be equivalent (almost)
Yorlik has quit [Ping timeout: 244 seconds]
<gonidelis[m]> hkaiser: i am missing sth here
<gonidelis[m]> sorry
<gonidelis[m]> does not print "foo"
<K-ballo> `int(n)` is a prvalue
<gonidelis[m]> thanks!
<gonidelis[m]> pansysk75: ^^ check
<gonidelis[m]> what about my godbolt? the more i play with it the less i make sense out of it
<K-ballo> it makes no sense, what's f?
<K-ballo> the warning is telling you how it's interpreting it, though
<gonidelis[m]> f is the counterpart of n in `int(n)`
<gonidelis[m]> yes for some reason i got the warning later on
<K-ballo> I assumed `n` existed
<K-ballo> `f` does not
<gonidelis[m]> ah.... n doesn't have to exist
<K-ballo> then the question is meaningless
<gonidelis[m]> for god's sake is it initialization or typecasting?
<K-ballo> ?
<K-ballo> `int(n)` will not compile if `n` does not exist
<gonidelis[m]> `int(n)` is it typecasting or variable declaration?
<K-ballo> depends on the context
<gonidelis[m]> what?
<K-ballo> yeah, in that context you are declaring `n` of type `int`
<pansysk75[m]> in the specific case that sparked this conversation, "n" existed I think
<gonidelis[m]> yes.... goddamit it's the context
<gonidelis[m]> so pansysk75 it's type casting
<gonidelis[m]> K-ballo: we talk about your code btw
<gonidelis[m]> and Panos is right
<K-ballo> I don't think I've ever heard of Panos before, have I?
<gonidelis[m]> i just pinged him... new guy on the block
<K-ballo> haven't met him
<pansysk75[m]> yes, came here from GSoC and working with gonidelis , I will be around and more active in this chat once I catch up with the introductory stuff
<pansysk75[m]> nice to meet you
<K-ballo> hi pansysk75[m], so you are Panos
<K-ballo> if one assumes `type` to be a class type then yes, there's a difference in that `type(f)` is an rvalue and `f` is an lvalue
<K-ballo> but then it would not be unrelated to the invoke macro implementation
<K-ballo> dah, it would be unrelated :)
<gonidelis[m]> the question is can we initialize (or declare i don't know what the proper wording) f like `type(f)` and use the `operator()` immediately? hence, `type(f)()` ?
<gonidelis[m]> seems like no
<K-ballo> a declaration is a statement, an invocation is an expression
<gonidelis[m]> cannot mingle them i guess?
<K-ballo> you can, but not in that particular way
<K-ballo> what would be the point anyway?
<gonidelis[m]> ok ok yes
<gonidelis[m]> the bottom question was: is ` (::hpx::util::detail::invoke<decltype((F))>(F)(__VA_ARGS__))` a temporary object here?
<K-ballo> that entire expression is equivalent to the result of the invocation, so that would depend on the return type of the selected operator
<K-ballo> you probably mean to ask something else
<gonidelis[m]> and if yes: 1. what's the point of handling as a temp instead of unrolling it and using it as an lvalue (primal question)?
<gonidelis[m]> 2. Could we do F(args) later on after that expression? (secondary question just for the sake of C++ knowledge)
<K-ballo> unrolling and using it as an lvalue??
<gonidelis[m]> Hartmut's suggestion
<K-ballo> uh?
<gonidelis[m]> <hkaiser> "you could certainly to type f1 =..." <- i meant this
<K-ballo> that'd be almost equivalent, the almost would matter in a tool like invoke
<K-ballo> you can't lvalue-ize the target in invoke, or you may end up selecting lvalue overloads for an rvalue
<K-ballo> you must perfectly forward the target, which is what the cast is doing
<K-ballo> you also wouldn't be able to have variables declared in a macro (not in the way you'd want them to work)
<gonidelis[m]> yes yes of course
<gonidelis[m]> ok perfect explanation
<gonidelis[m]> it's all about std::invoke then
<K-ballo> no, not really
<gonidelis[m]> SORRY I MEANT `HPX_INVOKE` WHICH IS SUPER MORE EFFICIENT THAN std::invoke
<K-ballo> no, not really either
<gonidelis[m]> hahahahahahha
<K-ballo> if you have a target callable which doesn't overload on value category, then it doesn't matter which value category you use, there's always one candidate possible
<K-ballo> if you have a target callable which overloads on value category, invoking it on the wrong value category will give you a compilation error if you are lucky, or select a bad candidate if not
<gonidelis[m]> rumor has it you provided us with HPX_INVOKE for better perf
<K-ballo> no, just for less bloat
<K-ballo> there's no performance difference whatsoever in an optimized build
<gonidelis[m]> K-ballo: guh this would be awful
<K-ballo> note: `(F)` is an expression, `decltype((F))` is thus always a reference type,
<K-ballo> `static_cast<decltype((F))&&>(F)` is effectively std::forward,
<K-ballo> `T(F)` is a C-style cast, which is equivalent to `static_cast<T>(F)` if that's well formed, so in this case it is also effectively forward
<K-ballo> the macro basically says `std::forward<F>(f)(std::forward<Args>(args)...)`
<K-ballo> I believe we have a forward macro now? so `HPX_FWD(F)(HPX_FWD(args)...)`
<gonidelis[m]> yes we do
<sestro[m]> <hkaiser> "the allocator makes sure that..." <- Yes, I'm using both components. The allocation seems to work just fine, at least according to some crude testing via `get_mempolicy()` on different segments of the allocated memory.
<gonidelis[m]> hkaiser: K-ballo: is the exec_policy argument on C++17 parallel algorithms a restriction or a suggestion on the execution?
<gonidelis[m]> does std::execution::par suggest parallel execution if possible or does it require it?
<hkaiser> it's a restriction on the lambda's execution
<hkaiser> it says that the lambda is allowed to be executed out of order
<hkaiser> and possibly concurrently
<hkaiser> also, it says that the lambda is allowed to synchronize between its invocations
<K-ballo> std::execution::par tells the algorithm that it can call the predicate in parallel, it requires nothing
<hkaiser> seq says that the lambda needs in-order, non-concurrent execution
<gonidelis[m]> calling concurrent execution more restrictive is sth i wouldn't think of
<hkaiser> it's less restrictive compared to seq
<gonidelis[m]> ok got it
<gonidelis[m]> but the algo may as well opt to execute the thing sequentially if say you provide fwd iterators
<gonidelis[m]> even though you provided par, as the first arg
<hkaiser> yep
<hkaiser> par does not require parallel execution, it merely allows it
<gonidelis[m]> awesome
<john98zakaria[m]> This should unblock the docs
<john98zakaria[m]> s/pulls/pull/48/