Yorlik has quit [Read error: Connection reset by peer]
Yorlik has joined #ste||ar
hkaiser has quit [Quit: bye]
nan11 has quit [Remote host closed the connection]
Yorlik has quit [Ping timeout: 258 seconds]
bita_ has quit [Read error: Connection reset by peer]
bita_ has joined #ste||ar
weilewei has quit [Remote host closed the connection]
bita_ has quit [Ping timeout: 256 seconds]
karame_ has quit [Remote host closed the connection]
kale[m] has joined #ste||ar
kale[m] has quit [Ping timeout: 258 seconds]
kale[m] has joined #ste||ar
<zao>
Ah, piece of #@$@# test.
<zao>
pkg-config test defaults to "Unix Makefiles", even when HPX is built with -G Ninja.
<zao>
I don't _have_ `make` on the system.
<zao>
After installing `make`, `examples` and `tests` build on Rawhide.
kale[m] has quit [Ping timeout: 260 seconds]
kale[m] has joined #ste||ar
kale[m] has quit [Ping timeout: 260 seconds]
kale[m] has joined #ste||ar
Yorlik has joined #ste||ar
hkaiser has joined #ste||ar
kale[m] has quit [Ping timeout: 260 seconds]
kale[m] has joined #ste||ar
<hkaiser>
Yorlik: yt?
<Yorlik>
Yes
<Yorlik>
Howdy!
<hkaiser>
Yorlik: trying to come up with a proper API for the thread_mapper
<hkaiser>
what functionality do you need?
<Yorlik>
Quick voice?
<hkaiser>
not now, sorry
<Yorlik>
For me the main purpose is setup of the worker threads.
<Yorlik>
I had horrible complications and races not doing it ahead of use.
<hkaiser>
ok, what do you need then?
<hkaiser>
enumerate the threads?
<Yorlik>
Yes.
<hkaiser>
what info do you need about them?
<Yorlik>
Fundamentally I need to set up my collection of per-thread objects
<hkaiser>
label, type, std::thread::id?
<Yorlik>
That would be great.
<hkaiser>
setting up thread-locals is nothing I can help with
<hkaiser>
what other information do you need?
<Yorlik>
Basically I need to do a for(auto thread_id : workers)
<hkaiser>
the threads' native handles as well?
<Yorlik>
Yes - because these are used in the tasks.
<hkaiser>
ok
<Yorlik>
If you can provide a faster way than std::thread::get_id, of course I'd use that
<hkaiser>
std::thread::id does not give you access to the native handle
<Yorlik>
I just need to guarantee, that a task is never using an object that does not belong to the thread.
<Yorlik>
So that association is vital - not the implementation.
<hkaiser>
ok, I'll expose the hpx-label, the hpx-type, std::thread::id, and the native handle
<hkaiser>
anything else?
<Yorlik>
If you can give me something that works just as well - the better.
<Yorlik>
A direct fast access to thread data is the gist of it.
<Yorlik>
Like object = pool[thread_id].acquire();
<hkaiser>
what does that mean?
<hkaiser>
ahh
<hkaiser>
std::thread::id is fine for this
<Yorlik>
I would call that inside a task / hpx thread
<Yorlik>
If everything is set up ahead of time I can use thread_local to cache these objects
<hkaiser>
hmmm
<Yorlik>
Not sure if it is faster, since I just learned thread_local translates to a function call too
<hkaiser>
so you would enumerate through all threads while setting up the thread_local?
<hkaiser>
thread_local is not necessarily a function call
<Yorlik>
Before starting the server I would set up all the pools and then set up a thread_local in the tasks or retrieve the corresponding pool directly, since then I have pointer stability
<hkaiser>
Yorlik: wouldn't the worker-thread number be sufficient for pool[thread_id].acquire()?
<Yorlik>
If I can retrieve it efficiently inside a task - of course.
<hkaiser>
sure you can
<Yorlik>
Like object = pool[this_worker].acquire();
<Yorlik>
pool[this_worker] would or would not become a thread_local reference
<Yorlik>
So I'd save a call to a hash function.
<Yorlik>
If I have sequential numbers I could just use an array or vector
<hkaiser>
this_worker is a thread_local in hpx already
<hkaiser>
ms[m]: do we expose this publicly? ^^
<Yorlik>
Is it public API?
<Yorlik>
if it is a sequential number I would save nothing with a thread_local and just directly call into the array
<hkaiser>
Yorlik: let's wait for Mikael to answer that, he just recently redid all of that
<hkaiser>
Yorlik: yes, it's a sequential number
<Yorlik>
Great. I'm really happy to see how fast you are reacting when user needs come up.
<ms[m]>
hkaiser: get_worker_thread_num?
<ms[m]>
I changed the internal implementation a bit but the api should be the same
<hkaiser>
right
<ms[m]>
hpx::get_worker_thread_num is definitely public api, but I'm not sure it works on service pools
<ms[m]>
not sure if I'm following this discussion correctly but that is a sequential id and should be less than hpx::get_os_thread_count
<Yorlik>
Here we go :)
<ms[m]>
don't include thread_num_tss.hpp directly though ;)
<Yorlik>
The only thing I need now is to enumerate these ids ahead of use in the tasks.
<ms[m]>
hpx/include/runtime.hpp should be enough I think
<hkaiser>
what do you need these ids for ahead of time?
<ms[m]>
and how much ahead of time...?
<Yorlik>
It would be nice if they were sequential and starting from 0, but that is not an absolute requirement - I could easily skip numbers and just leave some empty slots in the array.
<hkaiser>
they do
<Yorlik>
I do it pretty much at the beginning of mpx_main.
<Yorlik>
hpx_main
<hkaiser>
still no idea what you need them for
<Yorlik>
And after that setup I start the server and run jobs / frames
<Yorlik>
I need to retrieve a lua state per task. I do not want to do this with a global locking structure.
<Yorlik>
But in a thread local, non locking pool.
<hkaiser>
Yorlik: why do you need them ahead of time?
<Yorlik>
If the task migrates, the object is given back to a different pool than the one it came from
<Yorlik>
I have a working setup now, but it was a royal pain to set it up correctly and get this thread safe.
<hkaiser>
Yorlik: why do you need them ahead of time?
<Yorlik>
I do not want to do a check if the pool already exists every time I need to retrieve an object. I solved the pointer stability problem by querying the worker thread count ahead of time and reserving the size of my vector of pairs of id/pool
<Yorlik>
So - now I'm using std::find to retrieve a pool from the vector
<hkaiser>
Yorlik: ok, that does not require to have the concrete ids ahead of time, just their number
<Yorlik>
It's easier and cleaner to set up in the beginning when I'm actually doing the setup. Having it inside the object retrieval code made it somewhat messy.
<Yorlik>
As I said - I could solve it - after some messy convolutions and races
<hkaiser>
Yorlik: please listen
<hkaiser>
there is hpx::get_os_thread_count() that gives you the number of worker threads ahead of time
<Yorlik>
Yes, that is the function I am currently using to reserve sufficient space in my vector
<hkaiser>
and there is get_worker_thread_num() that gives you the sequence number, starting with zero and ending one below the overall count you retrieved
<hkaiser>
that's all you need
<Yorlik>
That would make retrieval easier - I didn't know about get_worker_thread_num()
<hkaiser>
the first is used during setup (sizing arrays etc.), the second is a thread_local you can use as an index into your array
<Yorlik>
Yes. Absolutely.
<hkaiser>
nothing else is needed
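The setup hkaiser describes can be sketched roughly as follows. This is only an illustration in plain C++, not HPX code: `state_pool`, `per_worker_pools`, and `for_worker` are hypothetical names, and the comments mark where `hpx::get_os_thread_count()` and `hpx::get_worker_thread_num()` (both named in the chat) would supply the values. Since ms[m] is unsure the worker number works on service pools, the lookup is bounds-checked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-worker pool; in the chat it would hand out Lua states.
struct state_pool {
    int next_id = 0;
    int acquire() { return next_id++; }
};

// One pool per worker thread, sized once during setup and never resized,
// so references to individual pools stay valid (pointer stability).
class per_worker_pools {
    std::vector<state_pool> pools_;

public:
    // worker_count: in HPX this would be hpx::get_os_thread_count(),
    // queried once at the beginning of hpx_main.
    explicit per_worker_pools(std::size_t worker_count)
      : pools_(worker_count) {}

    // worker_num: in HPX this would be hpx::get_worker_thread_num(),
    // called inside the task. Plain array indexing, no hash, no std::find.
    state_pool& for_worker(std::size_t worker_num) {
        assert(worker_num < pools_.size());
        return pools_[worker_num];
    }

    std::size_t size() const { return pools_.size(); }
};
```

Inside a task the lookup would then read something like `pools.for_worker(hpx::get_worker_thread_num()).acquire()`, assuming the task runs on a regular worker thread.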
<Yorlik>
Is it guaranteed, that these numbers are sequentially starting with 0?
<hkaiser>
yes
<Yorlik>
That makes stuff easy.
<Yorlik>
No more hashes and stuff
<Yorlik>
So this works right now, yes?
<hkaiser>
so you don't need the thread_mapper anymore?
<Yorlik>
Not now actually.
<hkaiser>
yes, works since forever
<Yorlik>
These two functions together give me all I need.
<hkaiser>
great
<ms[m]>
hooray for simple solutions! :)
<Yorlik>
This type of problem should be mentioned in the manual somewhere
<Yorlik>
because setting up a worker thread is an important thing
<hkaiser>
ms[m]: you might want to add your kokkos work there
bita_ has joined #ste||ar
<ms[m]>
hkaiser: good point, will do (I might have to check with the kokkos people first, but I would think they'd be fine with it...)
<Yorlik>
Efficiency at 1000000 objects: /threads{locality#0/total/total}/idle-rate,7,42.400993,[s],67,[0.01%] :)
<Yorlik>
The problem is, I cannot keep up this efficiency for lower object counts.
<Yorlik>
And the framerates do more than double, when I double the number of objects
<Yorlik>
Even if the reported efficiency gets better.
<Yorlik>
Err: Not the framerate .. the frametime does more than double.
<Yorlik>
I would expect that somehow, because of more cache evictions.
<Yorlik>
But I find it odd that the efficiency gets better, but my time used per object goes up.
<Yorlik>
I wonder if I'm shooting myself in the foot somewhere, where I'm not thinking of it.
<heller1>
can you plot a graph with the time per object vs. number of objects?
<heller1>
and keep the number of threads fixed
<heller1>
preferably one or so
<Yorlik>
I can do that
<Yorlik>
One thread? OK!
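A measurement along the lines heller1 suggests might look like the sketch below: a single thread, a varying object count, and the time per object reported for each count. Everything here is an assumption for illustration: the 64-byte `object`, the trivial per-object update, and the names `sample` and `time_per_object` are hypothetical and do not come from Yorlik's code.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 64-byte object standing in for the real game objects.
struct object {
    std::uint64_t payload[8];
};

struct sample {
    std::size_t count;       // number of objects touched per frame
    double ns_per_object;    // average wall-clock time per object update
    std::uint64_t checksum;  // keeps the loop from being optimized away
};

// Runs `frames` passes over `count` objects on the calling thread only
// and returns the average time spent per object update.
sample time_per_object(std::size_t count, int frames = 10) {
    if (count == 0 || frames <= 0)
        return {count, 0.0, 0};

    std::vector<object> objects(count);  // value-initialized to zero
    std::uint64_t checksum = 0;

    auto const start = std::chrono::steady_clock::now();
    for (int f = 0; f < frames; ++f) {
        for (auto& o : objects) {
            o.payload[0] += 1;
            checksum += o.payload[0];
        }
    }
    auto const stop = std::chrono::steady_clock::now();

    double const ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return {count, ns / (static_cast<double>(count) * frames), checksum};
}
```

Calling `time_per_object` for a series of counts (for example 1000, 10000, 100000, 1000000) and plotting `ns_per_object` against `count` would show whether the cost per object jumps once the working set no longer fits in a cache level.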
<ms[m]>
hkaiser, heller, K-ballo was talking to cscs people who are trying to explicitly instantiate the futures types they use to reduce compile times
<ms[m]>
well, the question is, does that really require a copy constructor (afaict yes, because of the const& that shared_future::get returns)?
<ms[m]>
and would sfinae-ing out those constructors for non-copyable types be feasible? or do we break other things then?
<hkaiser>
ms[m]: shared_future is copyable, so should be the type stored in it
<ms[m]>
that's what I thought first as well, but then why does it return a const&? the shared state does not need to be copied, right? in any case, for the plain future that shouldn't be required?
<ms[m]>
brb
<hkaiser>
ms[m]: it needs to return a const& as you are allowed to call .get() more than once
<heller1>
returning a const& doesn't require copyable though
<heller1>
the actual error message, and the code that triggers the error would be interesting
<heller1>
then we could see what's wrong
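heller1's point that returning a `const&` does not by itself require a copyable type can be checked with a small stand-in. The sketch below is not HPX's `shared_future`; `shared_result` is a hypothetical class that only mimics the relevant shape: a copyable handle to shared state whose `get()` can be called repeatedly and returns a `const&`.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a shared_future-like type: the handle is
// copyable, the stored value is only ever handed out by const reference.
template <typename T>
class shared_result {
    std::shared_ptr<T> state_;

public:
    explicit shared_result(T&& v)
      : state_(std::make_shared<T>(std::move(v))) {}

    // May be called any number of times; never copies T.
    T const& get() const { return *state_; }
};

using handle = std::unique_ptr<int>;  // a non-copyable value type

static_assert(!std::is_copy_constructible<handle>::value,
    "the stored type is not copyable");
static_assert(std::is_copy_constructible<shared_result<handle>>::value,
    "the handle to the shared state still is");

int read_twice() {
    shared_result<handle> a(std::make_unique<int>(21));
    shared_result<handle> b = a;  // copies the handle, not the unique_ptr
    return *a.get() + *b.get();   // both refer to the same stored value
}
```

Under these assumptions the copy requirement has to come from somewhere other than `get()` itself, which matches where the discussion ends up: a constructor that gets instantiated eagerly.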
<Yorlik>
heller1 - I have the weird feeling it's indeed the cache, but I need more measuring to understand better.
<Yorlik>
The sum of objects was at ~4 MB and I have a 6 MB level2 cache
<Yorlik>
And there's other data evicting it too
<Yorlik>
At some point the frametimes get very inconsistent, lots of change in the times.
<Yorlik>
Expect this behavior and a bump in time when I hit the level3 cache limit
<Yorlik>
It's kinda shocking to see how the microseconds per object go up from 0.02 to 55.0 - it's all about the cache
<Yorlik>
(100 vs 100000 objects in the example above)
<Yorlik>
err 1000 vs ...
<Yorlik>
single threaded
<ms[m]>
hkaiser: that wasn't quite brb, but back now
<ms[m]>
yeah, I might be drawing false conclusions
<ms[m]>
it's on 1.4.1 but I don't think anything has changed
<ms[m]>
for the record, without those constructors that take a future<shared_future> eti works fine with a noncopyable type, so we don't require it elsewhere
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 256 seconds]
K-ballo1 is now known as K-ballo
<hkaiser>
ms[m]: I would have to see the code where it's actually used
<ms[m]>
ah, and I don't have the file that instantiates the future, but I can get it tomorrow
<ms[m]>
crossed thoughts...
<ms[m]>
it's just an explicit instantiation of a non-copyable type
<ms[m]>
*a future of a non-copyable type
<ms[m]>
and shared_future
<hkaiser>
the constructor future<future<>> takes the rhs by rvalue
<K-ballo>
yeah, that's expected
<hkaiser>
no copy operation should happen
<K-ballo>
explicit instantiation is eager
<hkaiser>
fair point
<hkaiser>
ms[m]: I think your guys are trying to over-optimize things
<K-ballo>
I think the way we do things will defeat your fair attempts to optimize build times
<ms[m]>
yeah, well, it's a type that they use very often so explicitly instantiating is not unreasonable
<ms[m]>
but you may be right
<ms[m]>
I'm just going with the idea at least :)
<K-ballo>
it's not just that... that one is fairly easy to solve, just make that constructor fake-dependent
<ms[m]>
since that was the only thing requiring the copy constructor I'd just disable it for non-copyable types, but I wasn't sure about other consequences from that
<ms[m]>
I can certainly tell them not to bother with it right now, I don't think it's such a big deal
<K-ballo>
any attempt to disable it will make it fake-dependent, so that'll work
<ms[m]>
fake-dependent because the constructor doesn't directly require a copy-constructor? or what does fake-dependent mean in this context?
<K-ballo>
depending on a template parameter
<K-ballo>
it will require turning the constructor into a template
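What K-ballo calls making the constructor "fake-dependent" can be sketched with a toy type. This is not HPX's `future`; `box` and `shared_box` are hypothetical stand-ins. The idea shown: an explicit instantiation of a class template instantiates its ordinary member functions eagerly, but not its member templates, so turning the copying constructor into a template with a defaulted parameter `U = T` both defers its instantiation and lets it be disabled for non-copyable types.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for shared_future: hands out a const reference.
template <typename T>
struct shared_box {
    std::shared_ptr<T> state;
    T const& get() const { return *state; }
};

// Hypothetical stand-in for future.
template <typename T>
struct box {
    T value;

    explicit box(T&& v) : value(std::move(v)) {}

    // Copies out of the shared state. As a plain (non-template) member,
    // `template struct box<X>;` would instantiate it even for non-copyable X
    // and fail to compile. The defaulted U makes it depend on a template
    // parameter, so it is only instantiated when actually used.
    template <typename U = T,
        typename = std::enable_if_t<std::is_copy_constructible<U>::value>>
    explicit box(shared_box<T> const& s) : value(s.get()) {}
};

// Explicit instantiations now work for copyable and non-copyable types.
template struct box<int>;
template struct box<std::unique_ptr<int>>;

static_assert(
    std::is_constructible<box<int>, shared_box<int> const&>::value,
    "copyable type: constructor participates");
static_assert(!std::is_constructible<box<std::unique_ptr<int>>,
                  shared_box<std::unique_ptr<int>> const&>::value,
    "non-copyable type: constructor is removed from overload resolution");

int demo_copy() {
    shared_box<int> s{std::make_shared<int>(7)};
    box<int> b(s);  // uses the templated constructor with U = int
    return b.value;
}
```

Whether the same change is safe for the real future types depends on how those constructors are used elsewhere, which is the open question ms[m] raises above.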
<nikunj97>
kale[m], parsa: just a reminder. We have a call now :)
<jbjnr>
sorry. thank you K-ballo
<jbjnr>
ms[m]: almost finished the cuda polling. Thought it was the same as the mpi one, but actually, rewrote it to simplify it as it turns out we don't need to store as much crap
<ms[m]>
nice, looking forward to seeing the results :)
<ms[m]>
I also hope it's on a separate branch from the other cuda pr... ;)