#ste||ar on 2020-06-03 — irc logs at irclog.cct.lsu.edu

2020-02-24 20:46 hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020

00:25 Yorlik has quit [Read error: Connection reset by peer]

00:30 Yorlik has joined #ste||ar

01:52 hkaiser has quit [Quit: bye]

02:42 nan11 has quit [Remote host closed the connection]

03:42 Yorlik has quit [Ping timeout: 258 seconds]

04:19 bita_ has quit [Read error: Connection reset by peer]

04:19 bita_ has joined #ste||ar

04:36 weilewei has quit [Remote host closed the connection]

05:10 bita_ has quit [Ping timeout: 256 seconds]

05:11 karame_ has quit [Remote host closed the connection]

06:49 kale[m] has joined #ste||ar

07:40 kale[m] has quit [Ping timeout: 258 seconds]

07:41 kale[m] has joined #ste||ar

07:54 <zao> Ah, piece of #@$@# test.

07:54 <zao> pkg-config test defaults to "Unix Makefiles", even when HPX is built with -G Ninja.

07:54 <zao> I don't _have_ `make` on the system.

08:05 <zao> After installing `make`, `examples` and `tests` build on Rawhide.

09:19 kale[m] has quit [Ping timeout: 260 seconds]

09:32 kale[m] has joined #ste||ar

09:41 kale[m] has quit [Ping timeout: 260 seconds]

09:42 kale[m] has joined #ste||ar

11:51 Yorlik has joined #ste||ar

12:39 hkaiser has joined #ste||ar

12:51 kale[m] has quit [Ping timeout: 260 seconds]

12:51 kale[m] has joined #ste||ar

13:08 <hkaiser> Yorlik: yt?

13:09 <Yorlik> Yes

13:09 <Yorlik> Howdy!

13:09 <hkaiser> Yorlik: trying to come up with a proper API for the thread_mapper

13:09 <hkaiser> what functionality do you need?

13:09 <Yorlik> Quick voice?

13:09 <hkaiser> not now, sorry

13:10 <Yorlik> For me the main purpose is setup of the worker threads.

13:10 <Yorlik> I had horrible complications and races not doing it ahead of use.

13:10 <hkaiser> ok, what do you need then?

13:10 <hkaiser> enumerate the threads?

13:10 <Yorlik> Yes.

13:11 <hkaiser> what info do you need about them?

13:11 <Yorlik> Fundamentally i need to set up my collection of per thread objects

13:11 <hkaiser> label, type, std::thread::id?

13:11 <Yorlik> That would be great.

13:11 <hkaiser> setting up thread-locals is nothing I can help with

13:11 <hkaiser> what other information do you need?

13:11 <Yorlik> Basically I need to do a for(auto thread_id : workers)

13:12 <hkaiser> the threads native handles as well?

13:12 <Yorlik> Yes - because these are used in the tasks.

13:12 <hkaiser> ok

13:12 <Yorlik> If you can provide a faster way than std::thread::get_id ofc I'd use that

13:12 <hkaiser> std::thread::id does not give you access to the native handle

13:13 <Yorlik> I just need to guarantee, that a task is never using an object that does not belong to the thread.

13:13 <Yorlik> So that association is vital - not the implementation.

13:13 <hkaiser> ok, I'll expose the hpx-label, the hpx-type, std::thread::id, and the native handle

13:13 <hkaiser> anything else?

13:13 <Yorlik> If you can give me something that works just as well - the better.

13:14 <Yorlik> A direct fast access to thread data is the gist of it.

13:14 <Yorlik> Like object = pool[thread_id].acquire();

13:14 <hkaiser> what does that mean?

13:14 <hkaiser> ahh

13:14 <hkaiser> std::thread::id is fine for this

13:14 <Yorlik> I would call that inside a task / hpx thread

13:15 <Yorlik> If everything is set up ahead of time I can use thread_local to cache these objects

13:15 <hkaiser> hmmm

13:15 <Yorlik> Not sure if it is faster, since I just learned thread_local translates to a function call as well

13:15 <hkaiser> so you would enumerate through all threads while setting up the thread_local?

13:16 <hkaiser> thread_local is not necessarily a function call

13:16 <Yorlik> Before starting the server I would set up all the pools, and then either set up a thread_local in the tasks or retrieve the corresponding pool directly, since then I have pointer stability

13:17 <hkaiser> Yorlik: wouldn't the worker-thread number be sufficient for pool[thread_id].acquire()?

13:17 <Yorlik> If I can retrieve it efficiently inside a task - of course.

13:17 <hkaiser> sure you can

13:17 <Yorlik> Like object = pool[this_worker].acquire();

13:18 <Yorlik> pool[this_worker] would or would not become a thread_local reference

13:18 <Yorlik> So I'd save a call to a hash function.

13:19 <Yorlik> If I have sequential numbers I could just use an array or vector

13:19 <hkaiser> this_worker is a thread_local in hpx already

13:19 <hkaiser> ms[m]: do we expose this publicly? ^^

13:19 <Yorlik> Is it public API?

13:20 <Yorlik> if it is a sequential number I would save nothing with a thread_local and just directly call into the array

13:20 <hkaiser> Yorlik: let's wait for Mikael to answer that, he just recently redid all of that

13:20 <hkaiser> Yorlik: yes, it's a sequential number

13:20 <Yorlik> Great. I'm really happy to see how fast you are reacting when user needs come up.

13:21 <ms[m]> hkaiser: get_worker_thread_num?

13:21 <ms[m]> I changed the internal implementation a bit but the api should be the same

13:22 <hkaiser> right

13:22 <ms[m]> hpx::get_worker_thread_num is definitely public api, but I'm not sure it works on service pools

13:22 <ms[m]> it might...

13:23 <hkaiser> Yorlik: here: https://github.com/STEllAR-GROUP/hpx/blob/master/libs/runtime_local/include/hpx/runtime_local/get_worker_thread_num.hpp

13:24 <Yorlik> Allright.

13:24 <hkaiser> actually here: https://github.com/STEllAR-GROUP/hpx/blob/master/libs/threading_base/include/hpx/threading_base/thread_num_tss.hpp#L70-L82

13:24 <ms[m]> not sure if I'm following this discussion correctly but that is a sequential id and should be less than hpx::get_os_thread_count

13:24 <Yorlik> Here we go :)

13:24 <ms[m]> don't include thread_num_tss.hpp directly though ;)

13:24 <Yorlik> The only thing I need now is to enumerate these ids ahead of use in the tasks.

13:24 <ms[m]> hpx/include/runtime.hpp should be enough I think

13:25 <hkaiser> what do you need these ids for ahead of time?

13:25 <ms[m]> and how much ahead of time...?

13:25 <Yorlik> It would be nice if they were sequential and starting from 0, but that is not an absolute requirement - I could easily skip numbers and just leave some empty slots in the array.

13:26 <hkaiser> they do

13:26 <Yorlik> I do it pretty much at the beginning of mpx_main.

13:26 <Yorlik> hpx_main

13:26 <hkaiser> still no idea what you need them for

13:26 <Yorlik> And after that setup I start the server and run jobs / frames

13:27 <Yorlik> I need to retrieve a lua state per task. I do not want to do this with a global locking structure.

13:27 <Yorlik> But in a thread local, non locking pool.

13:28 <hkaiser> Yorlik: why do you need them ahead of time?

13:28 <Yorlik> If the task migrates, the object is given back to a different pool than the one it came from

13:28 <Yorlik> I have a working setup now, but it was a royal pain to set it up correctly and get this thread safe.

13:28 <hkaiser> Yorlik: why do you need them ahead of time?

13:30 <Yorlik> I do not want to do a check if the pool already exists every time I need to retrieve an object. I solved the pointer stability problem by querying the worker thread count ahead of time and reserving the size of my vector of pairs of id/pool

13:30 <Yorlik> So - now I'm using std::find to retrieve a pool from the vector

13:30 <hkaiser> Yorlik: ok, that does not require to have the concrete ids ahead of time, just their number

13:31 <Yorlik> It's easier and cleaner to set up in the beginning when I'm actually doing the setup. Having it inside the object retrieval code made it somewhat messy.

13:31 <Yorlik> As I said - I could solve it - after some messy convolutions and races

13:32 <hkaiser> Yorlik: please listen

13:32 <hkaiser> there is hpx::get_os_thread_count() that gives you the number of worker threads ahead of time

13:32 <Yorlik> Yes, that is the function I am currently using to reserve sufficient space in my vector

13:32 <hkaiser> and there is get_worker_thread_num() that gives you the sequence number starting with zero and ending with one less than the overall count you retrieved

13:32 <hkaiser> that's all you need

13:33 <Yorlik> That would make retrieval easier - i didn't know about get_worker_thread_num()

13:33 <hkaiser> the first is used during setup (sizing arrays etc.), the second is a thread_local you can use as an index into your array

13:33 <Yorlik> Yes. Absolutely.

13:33 <hkaiser> nothing else is needed

13:33 <Yorlik> Is it guaranteed, that these numbers are sequentially starting with 0?

13:34 <hkaiser> yes

13:34 <Yorlik> That makes stuff easy.

13:34 <Yorlik> No more hashes and stuff

13:34 <Yorlik> So this works right now, yes?

13:34 <hkaiser> so you don't need the thread_mapper anymore?

13:34 <Yorlik> Not now actually.

13:34 <hkaiser> yes, works since forever

13:35 <Yorlik> These two functions together give me all I need.

13:35 <hkaiser> great

13:35 <ms[m]> hooray for simple solutions! :)
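
A minimal sketch of the pattern agreed on above: size one pool per worker during setup, then use the worker number as a plain array index inside a task. It builds without HPX, so `sketch::get_worker_thread_num()`, `sketch::setup_pools()`, `sketch::run_workers()` and `engine_pool` are hypothetical stand-ins, and plain `std::thread` workers replace the HPX scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

namespace sketch {
    // Stand-in for the sequential worker number HPX keeps per worker thread.
    thread_local std::size_t worker_num = 0;
    std::size_t get_worker_thread_num() { return worker_num; }

    // Hypothetical per-worker pool; a real one might hand out Lua states.
    struct engine_pool {
        int acquired = 0;
        int acquire() { return ++acquired; }
    };

    // Filled once during setup and never resized afterwards,
    // so references into it stay valid while tasks run.
    std::vector<std::unique_ptr<engine_pool>> pools;

    // Setup step: one pool per worker, created before any task runs.
    void setup_pools(std::size_t worker_count) {
        pools.clear();
        pools.reserve(worker_count);
        for (std::size_t i = 0; i != worker_count; ++i)
            pools.push_back(std::make_unique<engine_pool>());
    }

    // Lookup inside a task: no lock, no hash, no existence check.
    engine_pool& get_pool() {
        std::size_t const n = get_worker_thread_num();
        assert(n < pools.size());
        return *pools[n];
    }

    // Starts worker_count threads; each acquires per_worker objects
    // from the pool that belongs to its own worker number.
    void run_workers(std::size_t worker_count, int per_worker) {
        std::vector<std::thread> threads;
        for (std::size_t i = 0; i != worker_count; ++i) {
            threads.emplace_back([i, per_worker] {
                worker_num = i;    // assumption: numbers run 0..count-1
                for (int k = 0; k != per_worker; ++k)
                    get_pool().acquire();
            });
        }
        for (auto& t : threads)
            t.join();
    }
}
```

With HPX, the size passed to the setup step would come from hpx::get_os_thread_count() and the index from hpx::get_worker_thread_num(), as described in the discussion; whether the index is meaningful on service pools is left open there.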

13:35 <Yorlik> This type of problem should be mentioned in the manual somewhere

13:35 <Yorlik> because setting up a worker thread is an important thing

13:35 <ms[m]> hpx users keep surprising me: https://github.com/STEllAR-GROUP/hpx/issues/4690#issuecomment-638198692

13:36 <hkaiser> yah, nice

13:36 <ms[m]> Yorlik: you should also come up with a name for your game dev company (/group of people) and comment

13:36 <Yorlik> ms[m] I'm thinking about what to write actually.

13:36 <ms[m]> 👍️

13:37 <Yorlik> I really need to overhaul our website - it is hopelessly crappy right now

13:37 <Yorlik> I'll do some nice simple static setup at some point - maybe after I have delivered this milestone to our scripter

13:37 <Yorlik> I'll definitely post something.

13:38 <ms[m]> thanks, appreciated!

13:39 <Yorlik> But please allow me to wait until I have done a basic website overhaul - it will not take long once I have started.

13:39 <hkaiser> take your time

13:39 <Yorlik> Thanks for your understanding :)

13:41 <ms[m]> we're not in a rush...

13:43 <zao> Just register a handler for when the progress future is ready :P

13:45 <Yorlik> :)

14:08 <Yorlik> Things just got so much easier and safer using these two functions.

14:09 <Yorlik> I could toss out a bunch of code full of locks and crap which gave me headaches before.

14:10 <Yorlik> One of the most convoluted functions I had now is a one liner:

14:10 <Yorlik> static inline engine_pool& get_pool( ) {

14:10 <Yorlik> return *(pools[ hpx::get_worker_thread_num( ) ]);

14:10 <Yorlik> }

14:10 <Yorlik> :)

14:11 <Yorlik> Before I had to check if the pool already exists, conditionally create and return it , take a lock, yadda yadda

14:11 <Yorlik> And all in a structure using map or ventor<pair<thread id, unique pool ptr>>

14:11 <Yorlik> vector.

14:13 <Yorlik> Does yield put a yielded task onto the end of the queue, or do yielded tasks have a special queue they live in?

14:14 <Yorlik> I mean - when yielded - do all other tasks get scheduled before the yielded task on that worker?

14:18 <hkaiser> Yorlik: yielding puts a thread at the end of the queue

14:19 <Yorlik> Thanks.

14:37 weilewei has joined #ste||ar

14:39 <hkaiser> https://www.reddit.com/r/cpp/comments/gvv4fn/hpx_applications_survey_the_stear_group/

14:54 nan11 has joined #ste||ar

15:02 <ms[m]> thanks hkaiser, upvoted

15:04 <hkaiser> ms[m]: you might want to add your kokkos work there

15:16 bita_ has joined #ste||ar

15:18 <ms[m]> hkaiser: good point, will do (I might have to check with the kokkos people first, but I would think they'd be fine with it...)

15:48 <Yorlik> Efficiency at 1000000 objects: /threads{locality#0/total/total}/idle-rate,7,42.400993,[s],67,[0.01%] :)

15:49 <Yorlik> The problem is, I cannot keep up this efficiency for lower object counts.

15:51 <Yorlik> And the framerates do more than double, when I double the number of objects

15:51 <Yorlik> Even if the reported efficiency gets better.

15:52 <Yorlik> Err: Not the framerate .. the framtime does more than double.

15:52 <Yorlik> I would expect that somehow, because of more cache evictions.

15:53 <Yorlik> But I find it odd, that the efficiency gets better, but my time used per object goes up.

15:54 <Yorlik> I wonder if I'm shooting myself in the foot somewhere, where I'm not thinking of it.

15:56 <heller1> can you plot a graph with the time per object vs. number of objects?

15:56 <heller1> and keep the number of threads fixed

15:56 <heller1> preferably one or so

15:56 <Yorlik> I can do that

15:56 <Yorlik> One thread? OK!

16:03 <ms[m]> hkaiser, heller, K-ballo was talking to cscs people who are trying to explicitly instantiate the futures types they use to reduce compile times

16:04 <hkaiser> hrmm

16:04 <heller1> doesn't help much, IIRC

16:04 <ms[m]> they have a move-only type which was causing problems with this constructor: https://github.com/STEllAR-GROUP/hpx/blob/64f36d57a421bdde88eeb1677e06a23c25174c4a/libs/futures/include/hpx/futures/future.hpp#L854-L858 (and the same for shared_future)

16:05 <ms[m]> well, the question is, does that really require a copy constructor (afaict yes, because of the const& that shared_future::get returns)?

16:05 <ms[m]> and would sfinae-ing out those constructors for non-copyable types be feasible? or do we break other things then?

16:14 <hkaiser> ms[m]: shared_future is copyable, so should be the type stored in it

16:15 <ms[m]> that's what I thought first as well, but then why does it return a const&? the shared state does not need to be copied, right? in any case, for the plain future that shouldn't be required?

16:15 <ms[m]> brb

16:17 <hkaiser> ms[m]: it needs to return a const& as you are allowed to call .get() more than once

16:44 <heller1> returning a const& doesn't require copyable though

16:45 <heller1> the actual error message, and the code that triggers the error would be interesting

16:45 <heller1> then we could see what's wrong

17:13 <Yorlik> heller1 - I have the weird feeling it's indeed the cache, but I need more measuring to understand better.

17:13 <Yorlik> The sum of objects was at ~4 MB and I have a 6 MB level2 cache

17:14 <Yorlik> And there's other data evicting it too

17:14 <Yorlik> At some point the frametimes get very inconsistent, lots of change in the times.

17:16 <Yorlik> I expect this behavior and a bump in time when I hit the level3 cache limit

17:23 <Yorlik> It's kinda shocking to see how the microseconds per object go up from 0.02 to 55.0 - it's all about the cache

17:24 <Yorlik> (100 vs 100000 objects in the example above)

17:24 <Yorlik> err 1000 vs ...

17:24 <Yorlik> single threaded
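
A rough sketch of the measurement heller1 asked for: average time per object as a function of object count, on a single thread. The 64-byte `object` struct and the helpers `touch_all` and `ns_per_object` are hypothetical; the real objects and update work will differ, so only the shape of the curve is comparable.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical object: 8 x 8 bytes, the size of a typical cache line.
struct object {
    std::uint64_t payload[8];
};

// Updates every object once per pass and returns a checksum,
// so the compiler cannot optimize the loops away.
std::uint64_t touch_all(std::vector<object>& objs, int passes) {
    std::uint64_t sum = 0;
    for (int p = 0; p != passes; ++p) {
        for (auto& o : objs) {
            o.payload[0] += 1;
            sum += o.payload[0];
        }
    }
    return sum;
}

// Returns the average number of nanoseconds spent per object update
// for a working set of `count` objects.
double ns_per_object(std::size_t count, int passes, std::uint64_t& checksum) {
    std::vector<object> objs(count);    // value-initialized to zero
    auto const start = std::chrono::steady_clock::now();
    checksum = touch_all(objs, passes);
    auto const stop = std::chrono::steady_clock::now();
    double const ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return ns / (static_cast<double>(count) * passes);
}
```

Calling `ns_per_object` for a range of counts (for example 1000, 100000 and 1000000) and plotting the results against the count should show steps wherever the working set outgrows a cache level, if the cache is indeed the cause.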

17:42 <ms[m]> hkaiser: that wasn't quite brb, but back now

17:43 <ms[m]> yeah, I might be drawing false conclusions

17:43 <ms[m]> I'll dig up the full error message

17:44 <ms[m]> hkaiser: here: https://gist.github.com/msimberg/24826795705a767d9f772b6eecdd6d72

17:44 <ms[m]> it's on 1.4.1 but I don't think anything has changed

17:45 <ms[m]> for the record, without those constructors that take a future<shared_future> eti works fine with a noncopyable type, so we don't require it elsewhere

17:48 K-ballo1 has joined #ste||ar

17:48 K-ballo has quit [Ping timeout: 256 seconds]

17:48 K-ballo1 is now known as K-ballo

17:54 <hkaiser> ms[m]: I would have to see the code where it's actually used

17:54 <ms[m]> ah, and I don't have the file that instantiates the future, but I can get it tomorrow

17:54 <ms[m]> crossed thoughts...

17:54 <ms[m]> it's just an explicit instantiation of a non-copyable type

17:55 <ms[m]> *a future of a non-copyable type

17:55 <ms[m]> and shared_future

17:55 <hkaiser> the constructor future<future<>> takes the rhs by rvalue

17:55 <K-ballo> yeah, that's expected

17:55 <hkaiser> no copy operation should happen

17:55 <K-ballo> explicit instantiation is eager

17:55 <hkaiser> fair point

17:55 <hkaiser> ms[m]: I think your guys try to over-optimize things

17:56 <K-ballo> I think the way we do things will defeat your fair attempts to optimize build times

17:56 <ms[m]> yeah, well, it's a type that they use very often so explicitly instantiating is not unreasonable

17:56 <ms[m]> but you may be right

17:57 <ms[m]> I'm just going with the idea at least :)

17:57 <K-ballo> it's not just that... that one is fairly easy to solve, just make that constructor fake-dependent

17:57 <ms[m]> since that was the only thing requiring the copy constructor I'd just disable it for non-copyable types, but I wasn't sure about other consequences from that

17:58 <ms[m]> I can certainly tell them to not bother with it right now, I don't think it's such a big deal

17:58 <K-ballo> any attempt to disable it will make it fake-dependent, so that'll work

18:00 <ms[m]> fake-dependent because the constructor doesn't directly require a copy-constructor? or what does fake-dependent mean in this context?

18:00 <K-ballo> depending on a template parameter

18:00 <K-ballo> it will require turning the constructor into a template

18:01 <ms[m]> right

18:01 <K-ballo> template <typename DependentR = R, typename ... enable if...>

18:01 <ms[m]> which would be the same as T for the template class

18:01 <ms[m]> or R

18:01 <K-ballo> only it'd be a dependent R

18:02 <K-ballo> so explicit instantiation will leave it alone

18:02 <ms[m]> mmh, thanks
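
A minimal sketch of the fake-dependent constructor K-ballo describes. `holder` is a hypothetical class template, not HPX's future; it only shows why turning the copying constructor into a member template lets explicit instantiation succeed for a move-only type.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a future-like class template.
template <typename R>
class holder {
public:
    explicit holder(R&& value) : value_(std::move(value)) {}

    // Fake-dependent: DependentR defaults to R, but because it is a
    // template parameter of the constructor itself, this constructor is a
    // member template. Explicit instantiation of holder<R> instantiates
    // only non-template members, so the copy in the body is never
    // compiled for a move-only R, and enable_if removes the overload.
    template <typename DependentR = R,
        typename = typename std::enable_if<
            std::is_copy_constructible<DependentR>::value>::type>
    explicit holder(R const& value) : value_(value) {}

    R const& get() const { return value_; }

private:
    R value_;
};

// Explicit instantiation with a move-only type compiles. With a plain,
// non-template holder(R const&) it would fail, because the eagerly
// instantiated body would try to copy a std::unique_ptr.
template class holder<std::unique_ptr<int>>;
template class holder<int>;

static_assert(!std::is_constructible<holder<std::unique_ptr<int>>,
                  std::unique_ptr<int> const&>::value,
    "copying constructor is disabled for move-only types");
static_assert(std::is_constructible<holder<int>, int const&>::value,
    "copying constructor is available for copyable types");
```

The constructor parameter stays written in terms of `R` so that `DependentR` is never deduced from the argument and always takes its default.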

18:04 <ms[m]> I'll play around with it and see how badly they need it

18:04 <ms[m]> at least they can try out how much eti would save them

18:06 <K-ballo> fwiw this is not something they'd be able to get away with for std:: stuff

18:06 <K-ballo> maybe one day, with concepts and fully constrained interfaces

18:09 <ms[m]> ok

18:09 <ms[m]> well, even more lock-in to hpx for them then ;) not that we were risking losing them to anything else

18:10 karame_ has joined #ste||ar

18:22 <jbjnr> ms[m]: What is the type of future they are instantiating? is it part of DLAF?

18:23 <ms[m]> yeah, something internal

18:23 <ms[m]> I didn't ask what it is exactly

18:24 <jbjnr> does anyone know how to convert an exception (that hasn't been thrown yet) to an exception ptr?

18:24 <jbjnr> std::exception_ptr

18:24 <ms[m]> I think you'll have to throw it first

18:25 <K-ballo> std::make_exception_ptr

18:25 <hkaiser> ms[m]: throw it and extract it using std::current_exception()

18:25 <ms[m]> std::current_exception I think

18:26 <hkaiser> kale[m]: ahh, that works as well, but probably does the throw/catch game internally as well

18:26 <jbjnr> I see hpx::detail::get_exception

18:26 <hkaiser> K-ballo: ^^

18:26 <jbjnr> I considered throwing, catching and then using the exception ptr, but that seems a bit retarded

18:27 <ms[m]> make_exception_ptr seems much nicer

18:27 <K-ballo> hkaiser: as if

18:27 <hkaiser> nod

18:28 nikunj97 has joined #ste||ar

18:29 <jbjnr> ms[m]: thanks - make_exception_ptr = perfect

18:29 <ms[m]> well, thank you K-ballo :)
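
A short sketch of the std::make_exception_ptr approach, which wraps an exception object that was never thrown by user code. The helper `message_of` is hypothetical and only exists to inspect the result.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Recovers the message stored in an exception_ptr by rethrowing it.
std::string message_of(std::exception_ptr p) {
    if (!p)
        return "";    // rethrowing a null exception_ptr is undefined
    try {
        std::rethrow_exception(p);
    } catch (std::exception const& e) {
        return e.what();
    } catch (...) {
        return "unknown";
    }
}

// Wraps a std::runtime_error without a user-visible throw/catch.
std::exception_ptr make_error(std::string const& what) {
    return std::make_exception_ptr(std::runtime_error(what));
}
```

std::make_exception_ptr copies the object by its static type, so passing a reference to a base class stores a sliced copy rather than the derived exception.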

18:29 <nikunj97> kale[m], parsa: just a reminder. We have a call now :)

18:32 <jbjnr> sorry. thank you K-ballo

18:33 <jbjnr> ms[m]: almost finished the cuda polling. Thought it was the same as the mpi one, but actually, rewrote it to simplify it as it turns out we don't need to store as much crap

18:37 <ms[m]> nice, looking forward to seeing the results :)

18:38 <ms[m]> I also hope it's on a separate branch from the other cuda pr... ;)

18:39 <jbjnr> no. same one

18:40 <jbjnr> it replaces everything

19:20 <K-ballo> fwiw msvc's make_exception_ptr won't throw

19:31 sayefsakin has joined #ste||ar

19:32 <nikunj97> ms[m], yt?

19:44 nikunj97 has quit [Read error: Connection reset by peer]

20:03 nikunj97 has joined #ste||ar

20:36 Nikunj__ has joined #ste||ar

20:39 nan11 has quit [Remote host closed the connection]

20:40 nikunj97 has quit [Ping timeout: 252 seconds]

20:41 nan11 has joined #ste||ar

20:43 Nikunj__ has quit [Read error: Connection reset by peer]

21:01 Nikunj__ has joined #ste||ar

21:08 Nikunj__ is now known as nikunj97

21:15 weilewei has quit [Remote host closed the connection]

21:15 weilewei has joined #ste||ar

21:39 Nikunj__ has joined #ste||ar

21:43 nikunj97 has quit [Ping timeout: 260 seconds]

21:49 karame_ has quit [Remote host closed the connection]

22:17 Nikunj__ has quit [Quit: Leaving]

22:50 weilewei has quit [Remote host closed the connection]

22:51 weilewei has joined #ste||ar

22:57 kale[m] has quit [Ping timeout: 265 seconds]

23:08 kale[m] has joined #ste||ar

23:15 kale[m] has quit [Ping timeout: 258 seconds]

23:16 kale[m] has joined #ste||ar

23:30 hkaiser has quit [Quit: bye]

23:49 kale[m] has quit [Ping timeout: 260 seconds]