<weilewei>
hkaiser I see, yes, I can leave original hpx thread data alone, not an issue. What container would you suggest to use, if vector is not suitable?
<weilewei>
hmm maybe a struct?
<hkaiser>
weilewei: do you need a container at all?
<weilewei>
well, not really, I just need a space for three size_t thread data
<hkaiser>
so no container
<weilewei>
Ok, then a struct
<hkaiser>
right
<weilewei>
ok.. should be an easy fix
<weilewei>
so after this fix, maybe next step is urcu types in libcds, which I haven't read yet
<weilewei>
Let's talk more tmr.. : )
kale[m] has joined #ste||ar
kale[m] has quit [Ping timeout: 260 seconds]
hkaiser has quit [Quit: bye]
bita__ has joined #ste||ar
bita_ has quit [Ping timeout: 260 seconds]
bita__ has quit [Ping timeout: 260 seconds]
kale[m] has joined #ste||ar
kale[m] has quit [Ping timeout: 256 seconds]
nikunj has joined #ste||ar
<ms[m]>
gonidelis: hkaiser others, please disregard all the gcc results from pycicle
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
kale[m] has joined #ste||ar
nikunj97 has joined #ste||ar
<jbjnr>
nikunj97:
<jbjnr>
nikunj97: Thanks _ I look forward to reading blog
<jbjnr>
was away last week. sorry just skimmed through comments here
<LiliumAtratum>
Hello! Small question.... It was suggested to me, that i/o operations should be performed on the os thread (in fact it sometimes crashes when I do otherwise). I was pointed towards `hpx::threads::run_on_os_thread`, but I fail to find such a function (or similar) in the documentation. Is it there? What should I include?
<ms[m]>
LiliumAtratum: `run_as_os_thread` and `hpx/include/run_as.hpp`
<LiliumAtratum>
OK. Found it! Thank you!
LiliumAtratum has quit [Remote host closed the connection]
<weilewei>
ms[m] meeting in 3 minutes
<gonidelis[m]>
hkaiser: I can't log in to ROSTAM. Possibly I have written down some mistaken password... should I contact alireza?
<hkaiser>
gonidelis[m]: yes, please
akheir has joined #ste||ar
<gonidelis[m]>
hkaiser: When I ssh with my account sometimes it asks me for password and then for the first and second factors... while other times when I log in it asks immediately for the first and second factor, without asking for password first. Any idea what should I fill as "password"? (I reckon that's the first factor)
<hkaiser>
talk to Alireza
<gonidelis[m]>
ok sure
<gonidelis[m]>
Trying to build hpx on my rostam account with these modules Currently Loaded Modules:
<gonidelis[m]>
when I `make -j` I get `/opt/apps/gcc/9.3.0/lib/gcc/x86_64-redhat-linux/9.3.0/include-fixed/bits/statx.h:38:25: error: missing binary operator before token "("`
<hkaiser>
gonidelis[m]: somebody had the very same problem the other day
<hkaiser>
I think it was nikunj
<gonidelis[m]>
nikunj97: any ideas?
<gonidelis[m]>
hkaiser: how many cores do I have available ?
<gonidelis[m]>
hkaiser: can't really find an issue on that
<zao>
gonidelis[m]: Ooh, that error is _awesome_.
<zao>
We ran into it in EasyBuild when upgrading the base CentOS version and not the compiler modules.
<zao>
They are hard-dependent on the OS headers and they copy&mangle some of them at build time. Your sysape may need to rebuild the compiler if they've upgraded the OS.
<ms[m]>
on one hand this is sane because then `par.on(exec).with(params)` results in the same as `par.with(params).on(exec)` but on the other hand it means only `parallel_executor`'s `executor_parameters_type` is useful
<ms[m]>
is this expected behaviour?
nikunj97 has joined #ste||ar
<weilewei>
I am thinking of returning a reference when calling hpx get_libcds_data to avoid a copy, but how do I deal with the case where a null thread id is found: https://gist.github.com/weilewei/cce1b276f328100b1dc78f4b29eff7e2 as it complains that I cannot return a non-const lvalue to a rvalue type
nikunj has quit [Ping timeout: 244 seconds]
<weilewei>
or shall I just return a copy? I guess copying std::array<size_t, 3> isn't an expensive operation?
<parsa>
weilewei: std::array reserves its memory at compile time, no malloc is called where your code reaches it. also you want 3 `size_t`s. it's probably cheap enough not to be concerned about
<weilewei>
parsa aha! Got it, thanks
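A minimal sketch of the return-by-value option discussed above, using only the standard library. The names `thread_record` and the free function `get_libcds_data` are hypothetical stand-ins, not the actual HPX thread data or API; the point is only that returning a copy of three `size_t` values removes the null-id problem, because no reference has to outlive a temporary.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a per-thread record; not the real HPX thread data.
struct thread_record
{
    std::array<std::size_t, 3> libcds_data{{0, 0, 0}};
};

// Returns a copy of the three values. A null record yields an all-zero
// array, so no reference ever has to bind to a temporary.
inline std::array<std::size_t, 3> get_libcds_data(thread_record const* rec)
{
    if (rec == nullptr)
        return {{0, 0, 0}};
    return rec->libcds_data;
}
```

The array lives inside the object itself, so the copy involves no heap allocation.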
<hkaiser>
ms[m]: hmmm
<hkaiser>
.on() should reuse the original executor parameters
<hkaiser>
.with() will use the ones supplied as arguments
<hkaiser>
sorry, if I misunderstand
<ms[m]>
hkaiser: might not have explained it very well
<ms[m]>
"original executor parameters" = which executor?
<hkaiser>
execpolicy.on(...) will return a new execution policy with the same parameters as execpolicy
<ms[m]>
I think what you're saying means this is a bug
<ms[m]>
ah, then no bug...
<ms[m]>
but then the executor_parameters_type from exec in on(exec) don't make a difference... hrm
<hkaiser>
right
<hkaiser>
if you need that you need to write p.on(exec).with(exec.parameters())
<ms[m]>
if that's intended behaviour I can live with it, I just (mistakenly) assumed that executor_parameters_type would be used from the executor
<hkaiser>
...which would probably be a correct option as well
<hkaiser>
decisions, decisions, decisions...
<hkaiser>
the whole split between executors and parameters was wrong to begin with
<ms[m]>
yes, decisions... but let's not change that behaviour now for the current setup
<hkaiser>
ok
<ms[m]>
either one is as right or wrong as the other, so better stick with what we have
<hkaiser>
ok
<ms[m]>
thanks for clarifying!
<hkaiser>
ms[m]: what I have done recently is that all executors can expose the parameters interface as well and it will be used
<hkaiser>
we would have to check which parameters implementation would be used in your case if the executor exposes some of the interface APIs
<hkaiser>
but that might resolve the issue
<hkaiser>
if no parameters are associated with the policy and the executor exposes parameter APIs, then p.on(exec) will most likely use those
gdaiss[m] has joined #ste||ar
<ms[m]>
hkaiser: right, forgot about those
<ms[m]>
but since parallel_executor and thus static_chunk_size is the default for parallel_policy, there would always be a get_chunk_size for par already
<hkaiser>
iow, the default parameters implementation will prefer calling into the executor than to call its own 'fallback'
<ms[m]>
hmm, I'll try that out, thanks
<hkaiser>
ms[m]: we should check what's used if the executor exposes get_chunk_size
<ms[m]>
yep, will do
<hkaiser>
thanks
<ms[m]>
btw, for context this is of course for the kokkos executors
<ms[m]>
I wanted to have single chunk for those, and let kokkos do its internal chunking
<ms[m]>
but noticed that the default in the executor had no effect
<ms[m]>
I'll let you know what it does with a custom member function on the executor itself
<parsa>
parsa: just checking if i've got the time wrong or if i've opened the wrong zoom meeting
<parsa>
hkaiser: ^
<hkaiser>
parsa: sorry, missed the time
<hkaiser>
parsa: here now
nikunj97 has quit [Ping timeout: 256 seconds]
nikunj has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]
bita__ has joined #ste||ar
nikunj has joined #ste||ar
nikunj97 has joined #ste||ar
nikunj has quit [Ping timeout: 260 seconds]
<ms[m]>
hkaiser: works perfectly btw
<ms[m]>
and this is even better because it can't be overridden by with(params) either
<ms[m]>
thanks!
<hkaiser>
ms[m]: uhh - can't?
<ms[m]>
hkaiser: hmm, admittedly I didn't try overriding them
<ms[m]>
I'll check...
<ms[m]>
yeah, looks that way
<ms[m]>
as in if I have get_chunk_size as a member function in the executor, that takes precedence over any other executor parameters
<hkaiser>
ms[m]: yah, I think that's what we would expect
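The precedence described above (a `get_chunk_size` member on the executor wins over any separately supplied parameters) can be sketched with a plain detection idiom. This is not HPX's implementation; `fallback_params`, `single_chunk_executor`, `plain_executor`, and `select_chunk_size` are hypothetical names, and the member signature is simplified for illustration. It assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical parameters object, standing in for something like a static chunk size.
struct fallback_params
{
    std::size_t chunk;
    std::size_t get_chunk_size(std::size_t /*num_items*/) const { return chunk; }
};

// Hypothetical executor that asks for one chunk covering all items.
struct single_chunk_executor
{
    std::size_t get_chunk_size(std::size_t num_items) const { return num_items; }
};

// Hypothetical executor that exposes no parameter interface.
struct plain_executor {};

// Detects whether Executor has a callable get_chunk_size(std::size_t) member.
template <typename Executor, typename = void>
struct has_get_chunk_size : std::false_type {};

template <typename Executor>
struct has_get_chunk_size<Executor,
    std::void_t<decltype(std::declval<Executor const&>().get_chunk_size(std::size_t{}))>>
  : std::true_type {};

// The executor's member is used when present; otherwise the parameters are.
template <typename Executor, typename Params>
std::size_t select_chunk_size(
    Executor const& exec, Params const& params, std::size_t num_items)
{
    (void) exec;    // unused when the executor has no get_chunk_size
    (void) params;  // unused when the executor provides get_chunk_size
    if constexpr (has_get_chunk_size<Executor>::value)
        return exec.get_chunk_size(num_items);
    else
        return params.get_chunk_size(num_items);
}
```

With this shape, passing different parameters has no effect as long as the executor supplies the member itself, which matches the behaviour ms[m] observed.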
<hkaiser>
I think we want to phase out the executor parameters are those shouldn't have been separate to begin with
<hkaiser>
as those*
<ms[m]>
hkaiser: yeah, fair enough, I'm just looking for a solution for the moment, and this does the job as I would expect
<ms[m]>
I don't mind at all if we end up changing those
<hkaiser>
ms[m]: we might have to change things and extract the default parameters from the executor and not the policy, but that would change the way .on() works
<hkaiser>
at some point I would like to deprecate .with(), at which point this will have to be done anyways
<hkaiser>
or change what .with() does
<hkaiser>
it currently modifies the parameters associated with the policy, but it should change the parameters associated with the executor
<hkaiser>
ms[m]: that's also what you had expected
<ms[m]>
hkaiser: if with were to be deprecated, what would the alternative be? with on the executor itself? executor constructor takes various parameters?
<hkaiser>
ms[m]: using the executor properties from p0443
<ms[m]>
ah, of course
<ms[m]>
yeah, that sounds reasonable
nikunj97 has quit [Read error: Connection reset by peer]
weilewei has quit [Remote host closed the connection]
<gonidelis[m]>
hkaiser: spoke with alireza and he provided me with a fix
<gonidelis[m]>
The thing is that clang should be used instead of gcc
<gonidelis[m]>
Should I post an issue or sth?
<gonidelis[m]>
I mean if anyone encounters the same problem (or is it too specific?)
<hkaiser>
gonidelis[m]: is it a rostam problem?
<hkaiser>
or an hpx problem?
<hkaiser>
if it's the latter please create a ticket
<gonidelis[m]>
He told me there is a bug with gcc 9
<hkaiser>
ahh, then we can't do anything about it anyways
<gonidelis[m]>
ok...
<gonidelis[m]>
I am waiting for the rostam power to be unleashed! ;p
nan11 has quit [Remote host closed the connection]
nan11 has joined #ste||ar
weilewei has joined #ste||ar
weilewei has quit [Remote host closed the connection]
karame_ has quit [Remote host closed the connection]
<zao>
gonidelis[m]: I guess you didn't mention anything of the things I said about it?
kale[m] has quit [Ping timeout: 246 seconds]
kale[m] has joined #ste||ar
rtohid has quit [Ping timeout: 245 seconds]
<jbjnr>
hkaiser: yt?
<gonidelis[m]>
zao: no why ;p ?
<zao>
Apart from it strongly resembling an existing problem? *shrug*
<zao>
Not my cluster, not my problem to push to solve.
rtohid has joined #ste||ar
rtohid has left #ste||ar [#ste||ar]
nikunj has joined #ste||ar
<akheir>
gonidelis[m]: I found the root cause of the problem on Rostam. glibc was updated when I updated to CentOS 8.2 and gcc ran into some issue
kale[m] has quit [Ping timeout: 240 seconds]
<akheir>
I am recompiling the gcc. Seems to solve the problem.
dd has joined #ste||ar
kale[m] has joined #ste||ar
nikunj has quit [Ping timeout: 260 seconds]
bita__ has quit [Read error: Connection reset by peer]
bita__ has joined #ste||ar
<hkaiser>
akheir: thanks for investigating!
<hkaiser>
jbjnr: here now
<dd>
apologies ahead of time for the lengthy, somewhat incoherent message. I am having a difficult time debugging an issue with an implicit DAG. I verified that I am correctly computing the DAG using the futurization technique but my program crashes because a shared_future is being accessed before it is in a valid state (I am constructing a
<dd>
vector<shared_future<T>> with default constructed futures). The ith entry in this solution vector will depend on two other entries that may or may not be known immediately, and using make_ready_future in the cases where it is unknown breaks the DAG. I do not explicitly call get in my application as I am using dataflow.
<dd>
Also I've tried using sanitizers and gdb to no avail
<dd>
gdb hangs when loading symbols and running with address sanitizer caused the program to crash immediately
<K-ballo>
"before" it's in a valid state? how do you even manage that?
<K-ballo>
there must be no state at all, I suppose?
nikunj has joined #ste||ar
<dd>
so you can do the following: hpx::shared_future<T> f; and you are ok so long as you don't call f.get();
<dd>
in my application I need to assign back into a vector a result that depends on other futures from that same vector that, as I pointed out, are not known (this is a sweeping algorithm or wavefront)
<dd>
so we compute one and use it to compute the next two and then four and so on
<hkaiser>
if one of the futures is not in a valid state it most probably has not been initialized (was default constructed)
<dd>
right but if you call is_ready on the default constructed futures they evaluate to false
<hkaiser>
yes
<dd>
initially
<dd>
it is at some later point in the application where they are accessed before being ready and since I am using dataflow (i.e. not calling get) it is difficult for me to pinpoint where this occurs
<dd>
I assume that dataflow is checking dependencies for me
<hkaiser>
if a future was passed to dataflow, then you can be sure it's ready if the function is being called
<hkaiser>
I'd assume dataflow would throw if called with an invalid future
<dd>
yeah I get {what}: this future has no valid shared state: HPX(no_state)
<hkaiser>
so it has not been initialized yet
<K-ballo>
something about value semantics...
<K-ballo>
I imagine the expectation is that this particular instance will "become valid" in the future as it is assigned over by some other actually valid future
<dd>
yes precisely
<hkaiser>
nod, but before it has become 'valid' it can't be passed to dataflow
<K-ballo>
yeap, nope, that's not how values work
<K-ballo>
you give dataflow a future value, not a future instance
kale[m] has quit [Ping timeout: 240 seconds]
<K-ballo>
int x = 3; foo(x); x = 4; // foo still got 3, no matter what you do to x afterwards
<dd>
hmm not sure how to express that relationship
kale[m] has joined #ste||ar
<K-ballo>
if you have a dependency graph, then there must be a valid order in which they can be initialized, unless you have cycles in it?
<dd>
I suppose I had the wrong mental model of shared_future
<dd>
|---|---|
<hkaiser>
dd: it's very similar to shared_ptr - if you pass a default constructed shared_ptr by value to a function, then it will be invalid inside the function even if you initialize the original shared_ptr afterwards
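The shared_ptr analogy can be written out with the standard library alone. In this small sketch, `has_value`, `observation`, and `observe` are illustrative names only; a copy taken from a default constructed handle stays empty even after the original is assigned.

```cpp
#include <cassert>
#include <memory>

// Takes the pointer by value, so it sees the caller's object as it was at
// the moment of the call.
inline bool has_value(std::shared_ptr<int> p)
{
    return p != nullptr;
}

struct observation
{
    bool before;   // checked on a copy made while the slot was still empty
    bool after;    // checked on the slot after it was assigned
};

inline observation observe()
{
    std::shared_ptr<int> slot;            // default constructed: empty
    std::shared_ptr<int> early = slot;    // copy taken too early, also empty

    slot = std::make_shared<int>(42);     // only the original is updated

    return {has_value(early), has_value(slot)};
}
```

The same reasoning applies to a default constructed future handed to a function: the function receives the empty copy, and later assignment to the original does not reach it.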
<K-ballo>
maybe go back to the beginning, consider why you need default constructed futures at all
<dd>
it would help if I could do some ASCII art
<K-ballo>
go for it, paste it somewhere and drop a link here
<dd>
how do I type a multiline message
<dd>
alt+enter did work
<dd>
or can I do markdown?
<hkaiser>
dd: lengthy things or markdown are better put somewhere online (gist?) and linked here
<dd>
in our application we need to sweep from left to right starting in cell 0
<dd>
after computing cell 0 we can compute 1 and 2
<dd>
and then with 1 and 2 known we can compute 3
<dd>
in a much larger mesh you can think of this as being like a wave front
<dd>
along which information is propagated
<dd>
so if I have a solution vector (vector<future<T>>) how do I avoid default initialization
<K-ballo>
do you keep references into those futures? if so, why?
<K-ballo>
or even if you do default initialize them all, as long as you pass them to dataflow after they've been associated to some task everything will be fine
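One way to picture that last point is the four-cell sweep described above, filled in dependency order. This sketch uses `std::shared_future` and `std::async` with `std::launch::deferred` purely as a standard-library stand-in for `hpx::shared_future` and `hpx::dataflow`; it is not HPX code, and the cell values and the `build_sweep` name are made up for illustration.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical 4-cell sweep: cell 0 has no inputs, cells 1 and 2 depend on
// cell 0, and cell 3 depends on cells 1 and 2.
inline std::vector<std::shared_future<int>> build_sweep()
{
    // All slots start default constructed, i.e. without shared state.
    std::vector<std::shared_future<int>> cell(4);

    // Fill the slots in dependency order. Each lambda captures copies of
    // futures that are already associated with a task when it is created.
    cell[0] = std::async(std::launch::deferred, [] { return 1; }).share();

    cell[1] = std::async(std::launch::deferred,
        [up = cell[0]] { return up.get() + 10; }).share();

    cell[2] = std::async(std::launch::deferred,
        [up = cell[0]] { return up.get() + 20; }).share();

    cell[3] = std::async(std::launch::deferred,
        [a = cell[1], b = cell[2]] { return a.get() + b.get(); }).share();

    return cell;
}
```

Capturing `cell[3]`'s inputs before `cell[1]` and `cell[2]` were assigned would copy empty futures, which is the situation that produces the no_state error in the discussion.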