hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
Yorlik has quit [Ping timeout: 260 seconds]
kale[m] has quit [Ping timeout: 260 seconds]
<weilewei> hkaiser I see, yes, I can leave the original hpx thread data alone, not an issue. What container would you suggest I use, if vector is not suitable?
<weilewei> hmm maybe a struct?
<hkaiser> weilewei: do you need a container at all?
<weilewei> well, not really, I just need space for three size_t thread data values
<hkaiser> so no container
<weilewei> Ok, then a struct
<hkaiser> right
<weilewei> ok.. should be an easy fix
<weilewei> so after this fix, maybe the next step is urcu types in libcds, which I haven't read about yet
<weilewei> Let's talk more tmr.. : )
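A minimal sketch of the struct approach being discussed; the type and member names are made up for illustration and are not part of hpx or libcds:

    #include <cstddef>

    // Hypothetical plain aggregate holding the three size_t values of
    // per-thread libcds data; unlike a std::vector it needs no dynamic
    // allocation and no indirection.
    struct libcds_thread_data
    {
        std::size_t slot0 = 0;
        std::size_t slot1 = 0;
        std::size_t slot2 = 0;
    };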
kale[m] has joined #ste||ar
kale[m] has quit [Ping timeout: 260 seconds]
hkaiser has quit [Quit: bye]
bita__ has joined #ste||ar
bita_ has quit [Ping timeout: 260 seconds]
bita__ has quit [Ping timeout: 260 seconds]
kale[m] has joined #ste||ar
kale[m] has quit [Ping timeout: 256 seconds]
nikunj has joined #ste||ar
<ms[m]> gonidelis: hkaiser others, please disregard all the gcc results from pycicle
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
kale[m] has joined #ste||ar
nikunj97 has joined #ste||ar
<jbjnr> nikunj97:
<jbjnr> nikunj97: Thanks - I look forward to reading the blog
<jbjnr> was away last week. sorry just skimmed through comments here
<nikunj97> jbjnr, no worries. I also wrote a scalable 1d stencil for benchmarking purposes btw. https://github.com/STEllAR-GROUP/hpx/pull/4769
<jbjnr> thanks. will look more closely later
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 260 seconds]
Yorlik has joined #ste||ar
kale[m] has quit [Ping timeout: 260 seconds]
kale[m] has joined #ste||ar
parsa has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
parsa has joined #ste||ar
hkaiser has joined #ste||ar
Nikunj__ has quit [Ping timeout: 256 seconds]
nikunj97 has joined #ste||ar
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 260 seconds]
Nikunj__ has quit [Ping timeout: 244 seconds]
nikunj has quit [Ping timeout: 260 seconds]
LiliumAtratum has joined #ste||ar
<LiliumAtratum> Hello! Small question.... It was suggested to me that i/o operations should be performed on the os thread (in fact it sometimes crashes when I do otherwise). I was pointed towards `hpx::threads::run_on_os_thread`, but I fail to find such a function (or a similar one) in the documentation. Is it there? What should I include?
<ms[m]> LiliumAtratum: `run_as_os_thread` and `hpx/include/run_as.hpp`
<LiliumAtratum> OK. Found it! Thank you!
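A minimal usage sketch of the function mentioned above, assuming the header ms[m] pointed to; the file I/O in the lambda is only an illustration:

    #include <hpx/include/run_as.hpp>

    #include <fstream>
    #include <string>

    // Hand a blocking I/O operation off to an OS thread instead of
    // running it on an HPX worker thread, then wait for the returned
    // future from HPX-thread context.
    void append_line(std::string const& text)
    {
        auto f = hpx::threads::run_as_os_thread([&text]() {
            std::ofstream out("log.txt", std::ios::app);
            out << text << '\n';
        });
        f.get();
    }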
LiliumAtratum has quit [Remote host closed the connection]
<weilewei> ms[m] meeting in 3 minutes
<gonidelis[m]> hkaiser: I can't log in to ROSTAM. I may have written down the wrong password... should I contact alireza?
<hkaiser> gonidelis[m]: yes, please
akheir has joined #ste||ar
<gonidelis[m]> hkaiser: When I ssh with my account it sometimes asks me for a password and then for the first and second factors... while other times it asks immediately for the first and second factor, without asking for a password first. Any idea what I should enter as the "password"? (I reckon that's the first factor)
<hkaiser> talk to Alireza
<gonidelis[m]> ok sure
<gonidelis[m]> Trying to build hpx on my rostam account with these modules Currently Loaded Modules:
<gonidelis[m]> 1) gcc/9.3.0 2) boost/1.72.0-gcc9-release 3) papi/5.7.0 4) git/2.25.1 5) python/3.8.2 6) cmake/3.16.4 7) ucx/1.7.0 8) pmix/3.1.5 9) hwloc/2.1.0 10) Rostam2
<gonidelis[m]> when I `make -j` I get `/opt/apps/gcc/9.3.0/lib/gcc/x86_64-redhat-linux/9.3.0/include-fixed/bits/statx.h:38:25: error: missing binary operator before token "("
<gonidelis[m]> 38 | #if __glibc_has_include ("__linux__/stat.h")
<gonidelis[m]> `
<gonidelis[m]> Any ideas?
<hkaiser> gonidelis[m]: somebody had the very same problem the other day
<hkaiser> I think it was nikunj
<gonidelis[m]> nikunj97: any ideas?
<gonidelis[m]> hkaiser: how many cores do I have available ?
<gonidelis[m]> hkaiser: can't really find an issue on that
<zao> gonidelis[m]: Ooh, that error is _awesome_.
<zao> We ran into it in EasyBuild when upgrading the base CentOS version and not the compiler modules.
<zao> They are hard-dependent on the OS headers and they copy & mangle some of them at build time. Your sysadmin may need to rebuild the compiler if they've upgraded the OS.
nan11 has joined #ste||ar
<zao> We had a huge discussion on Slack too, not sure what people ended up doing in the end.
<gonidelis[m]> well gcc/9.3.0 is loaded by default
<K-ballo> "easy"
<gonidelis[m]> but I can see gcc/10.1.0 being available
<gonidelis[m]> should I load that instead?
<gonidelis[m]> ahh does not seem to work... I guess I will wait for nikunj to respond :/
rtohid has joined #ste||ar
nikunj has joined #ste||ar
<ms[m]> hkaiser: yt? I'm wondering about the `executor_parameters_type` typedef in executors...
<ms[m]> it looks to me like `executor_parameters_type` from custom executors is not actually used if one does e.g. `par.on(exec)`
<ms[m]> https://github.com/STEllAR-GROUP/hpx/blob/5b9de48ab18cee58e8f4e799584ecba87b6a4aed/libs/executors/include/hpx/executors/execution_policy.hpp#L1009-L1025 makes no attempt to get the executor parameters type from the new executor, neither does `rebind_executor`
<ms[m]> on one hand this is sane because then `par.on(exec).with(params)` results in the same as `par.with(params).on(exec)` but on the other hand it means only `parallel_executor`'s `executor_parameters_type` is useful
<ms[m]> is this expected behaviour?
nikunj97 has joined #ste||ar
<weilewei> I am thinking of returning a reference when calling hpx get_libcds_data to avoid a copy, but how do I deal with the case where a null thread id is found: https://gist.github.com/weilewei/cce1b276f328100b1dc78f4b29eff7e2 - it complains that I cannot return a non-const lvalue reference to an rvalue type
nikunj has quit [Ping timeout: 244 seconds]
<weilewei> or shall I just return a copy? I guess copying std::array<size_t, 3> isn't an expensive operation?
<parsa> weilewei: std::array reserves its memory at compile time; no malloc is called when your code reaches it. also you only want 3 `size_t`s. it's probably cheap enough not to be concerned about
<weilewei> parsa aha! Got it, thanks
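A hedged sketch of parsa's suggestion; the function name and the zero-filled result for a null thread id are assumptions for illustration, not the actual hpx interface:

    #include <array>
    #include <cstddef>

    // Returning std::array<std::size_t, 3> by value copies only three
    // words, so there is no need to hand out a reference, and the null
    // thread id case can simply yield a zero-initialized array.
    std::array<std::size_t, 3> get_libcds_data_copy(bool valid_thread)
    {
        if (!valid_thread)
            return {};    // all three entries are zero

        return {1, 2, 3};    // placeholder for the real thread data
    }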
<hkaiser> ms[m]: hmmm
<hkaiser> .on() should reuse the original executor parameters
<hkaiser> .with() will use the ones supplied as arguments
<hkaiser> sorry, if I misunderstand
<ms[m]> hkaiser: might not have explained it very well
<ms[m]> "original executor parameters" = which executor?
<hkaiser> execpolicy.on(...) will return a new execution policy with the same parameters as execpolicy
<ms[m]> I think what you're saying means this is a bug
<ms[m]> ah, then no bug...
<ms[m]> but then the executor_parameters_type from exec in on(exec) doesn't make a difference... hrm
<hkaiser> right
<hkaiser> if you need that you need to write p.on(exec).with(exec.parameters())
<ms[m]> if that's intended behaviour I can live with it, I just (mistakenly) assumed that executor_parameters_type would be used from the executor
<hkaiser> ...which would probably be a correct option as well
<hkaiser> decisions, decisions, decisions...
<hkaiser> the whole split between executors and parameters was wrong to begin with
<ms[m]> yes, decisions... but let's not change that behaviour now for the current setup
<hkaiser> ok
<ms[m]> either one is as right or wrong as the other, so better stick with what we have
<hkaiser> ok
<ms[m]> thanks for clarifying!
<hkaiser> ms[m]: what I have done recently is to allow all executors to expose the parameters interface as well, and it will be used
<hkaiser> we would have to check which parameters implementation would be used in your case if the executor exposes some of the interface APIs
<hkaiser> but that might resolve the issue
<hkaiser> if no parameters are associated with the policy and the executor exposes parameter APIs, then p.on(exec) will most likely use those
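A small sketch of the behaviour being clarified here, assuming the 1.4/1.5-era namespace and header layout (names may differ in other versions): `.on(exec)` rebinds the executor but keeps the policy's existing parameters, while `.with(...)` is the explicit way to attach different ones.

    #include <hpx/include/parallel_execution_policy.hpp>
    #include <hpx/include/parallel_executors.hpp>

    void example()
    {
        namespace ex = hpx::parallel::execution;

        ex::parallel_executor exec;

        // Rebinds the executor; the policy keeps its current parameters,
        // so exec's executor_parameters_type is not consulted.
        auto p1 = ex::par.on(exec);

        // Explicitly attaches parameters; this is the
        // p.on(exec).with(...) spelling mentioned above.
        auto p2 = ex::par.on(exec).with(ex::static_chunk_size(100));
    }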
gdaiss[m] has joined #ste||ar
<ms[m]> hkaiser: right, forgot about those
<ms[m]> but since parallel_executor and thus static_chunk_size is the default for parallel_policy, there would always be a get_chunk_size for par already
<hkaiser> iow, the default parameters implementation will prefer calling into the executor rather than calling its own 'fallback'
<ms[m]> hmm, I'll try that out, thanks
<hkaiser> ms[m]: we should check what's used if the executor exposes get_chunk_size
<ms[m]> yep, will do
<hkaiser> thanks
<ms[m]> btw, for context this is of course for the kokkos executors
<ms[m]> I wanted to have single chunk for those, and let kokkos do its internal chunking
<ms[m]> but noticed that the default in the executor had no effect
<ms[m]> I'll let you know what it does with a custom member function on the executor itself
<parsa> parsa: just checking if i've got the time wrong or if i've opened the wrong zoom meeting
<parsa> hkaiser: ^
<hkaiser> parsa: sorry, missed the time
<hkaiser> parsa: here now
nikunj97 has quit [Ping timeout: 256 seconds]
nikunj has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]
bita__ has joined #ste||ar
nikunj has joined #ste||ar
nikunj97 has joined #ste||ar
nikunj has quit [Ping timeout: 260 seconds]
<ms[m]> hkaiser: works perfectly btw
<ms[m]> and this is even better because it can't be overridden by with(params) either
<ms[m]> thanks!
<hkaiser> ms[m]: uhh - can't?
<ms[m]> hkaiser: hmm, admittedly I didn't try overriding them
<ms[m]> I'll check...
<ms[m]> yeah, looks that way
<ms[m]> as in if I have get_chunk_size as a member function in the executor, that takes precedence over any other executor parameters
<hkaiser> ms[m]: yah, I think that's what we would expect
<hkaiser> I think we want to phase out the executor parameters are those shouldn't have been separate to begin with
<hkaiser> as those*
<ms[m]> hkaiser: yeah, fair enough, I'm just looking for a solution for the moment, and this does the job as I would expect
<ms[m]> I don't mind at all if we end up changing those
<hkaiser> ms[m]: we might have to change things and extract the default parameters from the executor and not the policy, but that would change the way .on() works
<hkaiser> at some point I would like to deprecate .with(), at which point this will have to be done anyways
<hkaiser> or change what .with() does
<hkaiser> it currently modifies the parameters associated with the policy, but it should change the parameters associated with the executor
<hkaiser> ms[m]: that's also what you had expected
<ms[m]> hkaiser: if with were to be deprecated, what would the alternative be? with on the executor itself? executor constructor takes various parameters?
<hkaiser> ms[m]: using the executor properties from p0443
<ms[m]> ah, of course
<ms[m]> yeah, that sounds reasonable
nikunj97 has quit [Read error: Connection reset by peer]
weilewei has quit [Remote host closed the connection]
<gonidelis[m]> hkaiser: spoke with alireza and he provided me with a fix
<gonidelis[m]> The thing is that clang should be used instead of gcc
<gonidelis[m]> Should I post an issue or sth?
<gonidelis[m]> I mean if anyone encounters the same problem (or is it too specific?)
<hkaiser> gonidelis[m]: is it a rostam problem?
<hkaiser> or an hpx problem?
<hkaiser> if it's the latter please create a ticket
<gonidelis[m]> He told me there is a bug with gcc 9
<hkaiser> ahh, then we can't do anything about it anyways
<gonidelis[m]> ok...
<gonidelis[m]> I am waiting for the rostam power to be unleashed! ;p
nan11 has quit [Remote host closed the connection]
nan11 has joined #ste||ar
weilewei has joined #ste||ar
weilewei has quit [Remote host closed the connection]
karame_ has quit [Remote host closed the connection]
<zao> gonidelis[m]: I guess you didn't mention any of the things I said about it?
kale[m] has quit [Ping timeout: 246 seconds]
kale[m] has joined #ste||ar
rtohid has quit [Ping timeout: 245 seconds]
<jbjnr> hkaiser: yt?
<gonidelis[m]> zao: no why ;p ?
<zao> Apart from it strongly resembling an existing problem? *shrug*
<zao> Not my cluster, not my problem to push to solve.
rtohid has joined #ste||ar
rtohid has left #ste||ar [#ste||ar]
nikunj has joined #ste||ar
<akheir> gonidelis[m]: I found the root cause of the problem on Rostam. glibc was updated when I updated to CentOS 8.2 and gcc ran into some issues
kale[m] has quit [Ping timeout: 240 seconds]
<akheir> I am recompiling the gcc. Seems to solve the problem.
dd has joined #ste||ar
kale[m] has joined #ste||ar
nikunj has quit [Ping timeout: 260 seconds]
bita__ has quit [Read error: Connection reset by peer]
bita__ has joined #ste||ar
<hkaiser> akheir: thanks for investigating!
<hkaiser> jbjnr: here now
<dd> apologies ahead of time for the lengthy, somewhat incoherent message. I am having a difficult time debugging an issue with an implicit DAG. I verified that I am correctly computing the DAG using the futurization technique but my program crashes because a shared_future is being accessed before it is in a valid state (I am constructing a
<dd> vector<shared_future<T>> with default constructed futures). The ith entry in this solution vector will depend on two other entries that may or may not be known immediately, and using make_ready_future in the cases where they are unknown breaks the DAG. I do not explicitly call get in my application as I am using dataflow.
<dd> Also I've tried using sanitizers and gdb to no avail
<dd> gdb hangs when loading symbols and running with address sanitizer caused the program to crash immediately
<K-ballo> "before" it's in a valid state? how do you even manage that?
<K-ballo> there must be no state at all, I suppose?
nikunj has joined #ste||ar
<dd> so you can do the following: hpx::shared_future<T> f; and you are ok so long as you don't call f.get();
<dd> in my application I need to assign back into a vector a result that depends on other futures from that same vector that, as I pointed out, are not yet known (this is a sweeping algorithm, or wavefront)
<dd> so we compute one and use it to compute the next two and then four and so on
<hkaiser> if one of the futures is not in a valid state it most probably has not been initialized (was default constructed)
<dd> right, but if you call is_ready on the default constructed futures they return false
<hkaiser> yes
<dd> initially
<dd> it is at some later point in the application that they are accessed before being ready, and since I am using dataflow (i.e. not calling get) it is difficult for me to pinpoint where this occurs
<dd> I assume that dataflow is checking dependencies for me
<hkaiser> if a future was passed to dataflow, then you can be sure it's ready if the function is being called
<hkaiser> I'd assume dataflow would throw if called with an invalid future
<dd> yeah I get {what}: this future has no valid shared state: HPX(no_state)
<hkaiser> so it has not been initialized yet
<K-ballo> something about value semantics...
<K-ballo> I imagine the expectation is that this particular instance will "become valid" in the future as it is assigned over by some other actually valid future
<dd> yes precisely
<hkaiser> nod, but before it has become 'valid' it can't be passed to dataflow
<K-ballo> yeap, nope, that's not how values work
<K-ballo> you give dataflow with a future value, not a future instance
kale[m] has quit [Ping timeout: 240 seconds]
<K-ballo> int x = 3; foo(x); x = 4; // foo still got 3, no matter what you do to x afterwards
<dd> hmm not sure how to express that relationship
kale[m] has joined #ste||ar
<K-ballo> if you have a dependency graph, then there must be a valid order in which they can be initialized, unless you have cycles in it?
<dd> I suppose I had the wrong mental model of shared_future
<dd> |---|---|
<hkaiser> dd: it's very similar to shared_ptr - if you pass a default constructed shared_ptr by value to a function, then it will be invalid inside the function even if you initialize the original shared_ptr afterwards
<K-ballo> maybe go back to the beginning, consider why you need default constructed futures at all
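A tiny sketch of the value-semantics point being made, using assumed hpx convenience headers: dataflow captures the value the future has at the call, so assigning a valid future into the vector slot afterwards cannot repair a task that already captured the empty one.

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/lcos.hpp>
    #include <hpx/include/util.hpp>

    #include <vector>

    int main()
    {
        // v[0] is default constructed and has no shared state yet.
        std::vector<hpx::shared_future<int>> v(1);

        // 'bad' captured the current (empty) value of v[0]; this is the
        // kind of use that surfaces as HPX(no_state).
        auto bad = hpx::dataflow(
            hpx::util::unwrapping([](int x) { return x; }), v[0]);

        // Too late for 'bad': only later uses of v[0] see the new value.
        v[0] = hpx::make_ready_future(42);
        return 0;
    }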
<dd> it would help if I could do some ascii art
<K-ballo> go for it, paste it somewhere and drop a link here
<dd> how do I type a multiline message
<dd> alt+enter didn't work
<dd> or can I do markdown?
<hkaiser> dd: lengthy things or markdown are better put somewhere online (a gist?) and linked here
<dd> so take a look at the commented section
<dd> in our application we need to sweep from left to right starting in cell 0
<dd> after computing cell 0 we can compute 1 and 2
<dd> and then with 1 and 2 known we can compute 3
<dd> in a much larger mesh you can think of this as being like a wave front
<dd> along which information is propagated
<dd> so if I have a solution vector (vector<future<T>>) how do I avoid default initialization
<K-ballo> do you keep references into those futures? if so, why?
<K-ballo> or even if you do default initialize them all, as long as you pass them to dataflow after they've been associated to some task everything will be fine
<K-ballo> f[0] = async(compute, 0)
<K-ballo> f[2] = dataflow(compute, 2, f[0])
<K-ballo> f[1] = dataflow(compute, 1, f[0])
<K-ballo> f[3] = dataflow(compute, 3, f[1], f[2])
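A self-contained sketch along the lines of the ordering above; the compute function, the ready futures standing in for missing neighbours, and the convenience headers are assumptions for illustration:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>
    #include <hpx/include/util.hpp>

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Placeholder cell update: a real application would propagate the
    // wavefront data here.
    double compute(std::size_t cell, double left, double right)
    {
        return static_cast<double>(cell) + left + right;
    }

    int main()
    {
        using hpx::dataflow;
        using hpx::util::unwrapping;

        std::vector<hpx::shared_future<double>> f(4);

        // Each slot is assigned its task before any later task names it
        // as a dependency, so no default-constructed future is ever
        // handed to dataflow.
        f[0] = hpx::async(compute, 0, 0.0, 0.0);
        f[1] = dataflow(unwrapping(compute), 1, f[0], hpx::make_ready_future(0.0));
        f[2] = dataflow(unwrapping(compute), 2, f[0], hpx::make_ready_future(0.0));
        f[3] = dataflow(unwrapping(compute), 3, f[1], f[2]);

        std::cout << f[3].get() << '\n';
        return 0;
    }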