<Yorlik>
OK - so that's not samples, but the nth call.
<hkaiser>
yes
<Yorlik>
33% idle seems a bit much to me
<hkaiser>
yah, it's not too brilliant
<hkaiser>
I have seen worse, however
<Yorlik>
I had to fix something which added a slight overhead
<Yorlik>
task time is a bit short right now
<Yorlik>
And I'm having issues with the autochunker - is there still this problem we had a while ago?
<Yorlik>
With the executor
<Yorlik>
task times are < 200 µs right now - I should fix this
<hkaiser>
Yorlik: what problem?
<Yorlik>
It looked like tasks dying
<Yorlik>
I'm still checking what exactly is going on.
<Yorlik>
The unbounded updates didn't show up anymore
diehlpk_work_ has quit [Remote host closed the connection]
diehlpk_work has joined #ste||ar
mcopik has joined #ste||ar
<hkaiser>
Yorlik: sorry, I don't remember any problems like this
<Yorlik>
I think it was the work stealing breaking
<hkaiser>
ok
<hkaiser>
did we discuss this?
<Yorlik>
I'll check back with you once I'm more clear about what exactly is going on.
<Yorlik>
Yes - we actually live debugged it
<hkaiser>
ahh, that one - I thought we found a workaround for you
<Yorlik>
But I just realized a problem caused by a new debugger extension - so it's probably that - I can now track exceptions into Lua scripts
<Yorlik>
So in the call stack I see Lua script lines causing exceptions
<Yorlik>
But it slowed down everything horribly
<Yorlik>
But I wanted to talk to you about something else. In the next milestone I want to do more instrumentation and add custom performance counters and make these visible in ELK stack or Grafana.
<hkaiser>
ok
<hkaiser>
however, I have no idea what ELK stack or Grafana are
<Yorlik>
I wanted to explore the possibility to make this easily accessible for anyone using HPX.
<Yorlik>
They are visualization tools and logging helpers
<Yorlik>
Basically you can see parameters changing over time
<Yorlik>
So you could have a dashboard where you see the idle rates changing
<Yorlik>
It's good for monitoring a system
<hkaiser>
ok, cool
<Yorlik>
So I wanted to talk to you about how a user-friendly, generic interface for this should look.
<hkaiser>
creating perf counter can be very easy or more complicated, depending on what you want to expose (and how)
<Yorlik>
Because that's something I could easily factor out and make available, if it's possible or makes sense at all.
<hkaiser>
ignore the winperf counter stuff, it's probably disabled anyways
<Yorlik>
OK
<Yorlik>
Hmm - I'm having task times of ~600µs and an idle rate of ~31% - what could I do to improve this?
<hkaiser>
heartbeat.cpp is the connecting app that queries the counter
<hkaiser>
heartbeat_console.cpp is the 'server' that exposes the counter
<hkaiser>
Yorlik: add more work
<hkaiser>
or run on less cores
<Yorlik>
But there is infinite work already.
<Yorlik>
I'll check that
<Yorlik>
woops
<hkaiser>
Yorlik: no idea what's going on, then
<Yorlik>
Maybe it's the stuff I'm doing between the frames.
<Yorlik>
Running finalizers on a single thread and such
<Yorlik>
Maybe it's Mr. Amdahl saying hello here.
<heller1>
Rather Beta than alpha
<Yorlik>
Hey heller1!
<heller1>
Hey
<Yorlik>
How's it going?
<heller1>
Good
<heller1>
Can't complain ;)
<Yorlik>
Nice !
<Yorlik>
I spent the last couple of days learning that references into a map can be a really bad idea. Still fighting so many newbie issues.
mcopik has quit [Remote host closed the connection]
<Yorlik>
Got some nice Lua state corruption from it and other oddities.
<heller1>
He ;)
<Yorlik>
Also, thread_local references can be a really nasty thing ...
<Yorlik>
So much fun with debugging ..
<heller1>
You never stop learning. That what's keeping it interesting ;)
<Yorlik>
True. Beginner and wondering forever :D
<Yorlik>
Still polishing my current Milestone 1 - but I start seeing the light.
<heller1>
Great
<heller1>
Can't wait to play a first version ;)
<heller1>
How about a game of life thingy?
<Yorlik>
That would be a great example for a distributed app.
<Yorlik>
Your stencil algorithms would totally work with that.
<heller1>
Absolutely
<heller1>
And an interactive viewer being able to drop new life
<heller1>
Zooming to see how many tiles a player can see in parallel before it breaks down
<Yorlik>
I need to get my space representation done - it's part of the next milestone
<Yorlik>
That's also where the tiling system will come in
<Yorlik>
And distributed computing.
<Yorlik>
At the moment I'm still all single node
<heller1>
Ah, ok
<heller1>
Still...
<heller1>
It should always work, shouldn't it?
<Yorlik>
You know that a large portion of my work simply is learning and not implementing.
<Yorlik>
Once implemented it should just work, right.
<Yorlik>
But still I need the tile system.
<heller1>
You can learn which patterns live forever ;)
<Yorlik>
Yes - pattern recognition and evolutionary algorithms
<Yorlik>
Mutation and selection
<Yorlik>
Lots of it happens automagically in GoL anyways.
<Yorlik>
I think I did a GoL ~30 years ago in Turbo Pascal.
<Yorlik>
It was very basic.
<Yorlik>
Is there a way to access all worker threads and query some information from them, like get all worker thread IDs for setup purposes?
<hkaiser>
yes, you can iterate over the existing threads
<Yorlik>
I had a really nasty bug from dynamically creating thread_local references into a thread-specific data structure, where the map was rehashing and everything got busted.
<Yorlik>
I really want to do my setup work that is thread specific ahead of use.
<Yorlik>
The result was exceptions deep in Lua with no relation to the bug. We had to go through the app step by step to look for anything that might not be thread safe
<hkaiser>
Yorlik: also, I told you not to try accessing thread_locals from other threads
<Yorlik>
I am not intentionally trying it - quite the opposite
<Yorlik>
It just fudging happened.
<hkaiser>
well, if you store references to thread_locals in a map ...
<Yorlik>
No
<Yorlik>
References to the map in thread locals
<hkaiser>
ok
<hkaiser>
but why do you make it thread_local then and not just global?
<Yorlik>
Each call to the function potentially changes the map and invalidates all refs
<hkaiser>
if it's a std::map no references are invalidated
<Yorlik>
I should just use a vector and preallocate
<Yorlik>
it's just 12 entries
<Yorlik>
map is nonsense here - just convenient
<Yorlik>
It's a little bit performance critical since it's used on every task start
<Yorlik>
To retrieve a Lua State
<Yorlik>
From the thread_local pool.
<Yorlik>
Also my deleter was broken when the task migrated, but it worked, because the pools themselves are not thread local
<Yorlik>
just the refs to them.
<Yorlik>
But it was basically UB and exploded at times.
<Yorlik>
the invalidated refs just kept working
<hkaiser>
Yorlik: sounds like you're overdesigning here, but I don't have all the details
<Yorlik>
The point was to cache the refs and save time on each task start
<Yorlik>
I have it now dynamically.
<Yorlik>
But I'll replace it with a setup at start
<Yorlik>
Sometimes I'm overengineering things just because I want to learn them.
<Yorlik>
Like efficient and correct use of thread_locals
<hkaiser>
well, not sure you actually need thread_locals in your case
<Yorlik>
hkaiser: the autochunker is working - it was that dreaded Lua Debugger and me overlooking how stuff slowed down
<hkaiser>
you use them to access a global data structure and all thread_locals store the same value - so why not access the global directly
<Yorlik>
I need to retrieve a lua state from a pool each task start
<Yorlik>
So I use one pool per thread
<hkaiser>
the global is a map/vector of pools anyways, isn't it?
<Yorlik>
Getting a Lua State just uses a thread_local ref to that pool.
<hkaiser>
ok
<Yorlik>
So the function is get_lua_engine and it uses these thread_local refs
<hkaiser>
so you save a single lookup by having the thread_local?
<Yorlik>
Yes
<hkaiser>
is it worth the hassle?
<Yorlik>
That's how I do things - lol
<Yorlik>
it's more about learning optimization techniques - it's not always about a dire problem.
<Yorlik>
It evolved out of me using thread_local pools first
<hkaiser>
I'd wager that accessing the thread-local is slower than accessing the pool from a global vector using a fixed index
<Yorlik>
But then I wanted to access the pools to purge them
<Yorlik>
So I needed to make them un-thread_local
<Yorlik>
And that led to that situation with the refs
<Yorlik>
I need the pools to purge lua states held by them when reloading all scripts
<hkaiser>
sure
<Yorlik>
So each state gets the new scripts
<hkaiser>
so you need to lock the vector on each access anyways
<hkaiser>
the thread_local ref buys you nothing in the end, just adds complexity
<Yorlik>
A thread local reference directly to the pool can do it if it is stable.
<hkaiser>
*sure*
<Yorlik>
The container doesn't change after setup.
<hkaiser>
but since you don't know whether another thread is currently purging your pool you need to lock it anyways, thread_local ref or not
<Yorlik>
Purging is always single threaded
<Yorlik>
And after all is stopped
<hkaiser>
ok
<Yorlik>
The pools remain - just their content gets purged
<hkaiser>
whatever - I'd remove the thread_local, it doesn't give you anything
<Yorlik>
I'll think about it.
akheir has joined #ste||ar
<hkaiser>
Yorlik: I think it would be more important to cache-line align your elements in the global vector
<hkaiser>
to avoid false sharing
<Yorlik>
Makes sense.
<Yorlik>
Since I'm not writing to it after initialization I think cache alignment won't matter. The values will just be cached, since they're used all the time.
<Yorlik>
hkaiser: How can I filter out all the worker threads from that function? I'm getting a ton of threads if I don't choose a specific state
<Yorlik>
Can the lambda find out if it is running on a worker?
<Yorlik>
Or rather: if the thread_id is a workers id?
<hkaiser>
Yorlik: what do you mean by worker-threads?
<Yorlik>
The threads running tasks
<Yorlik>
like the parloop chunks
<Yorlik>
Because these need to get a pool each
<Yorlik>
So something like hpx::is_worker_thread(id) or so?
<Yorlik>
Unfiltered I'm getting like 150+ threads reported
<Yorlik>
I used that function call from examples and just removed the filter for suspended threads
<hkaiser>
Yorlik: what is your definition of a worker thread?
<Yorlik>
If I run an async or a parallel loop certain OS threads are used, others not.
<Yorlik>
These are workers.
<Yorlik>
They are actually named 'worker#<something>' in the MSVC debugger
<hkaiser>
that enumerate_threads gives you all HPX threads, not the kernel threads
<Yorlik>
Argh
<Yorlik>
I need the kernel workers IDs
<hkaiser>
ahh
<Yorlik>
Because this allows me to set up the vector in my init phase
<Yorlik>
So - the threads counted here: hpx::get_num_worker_threads();
<hkaiser>
yes
<hkaiser>
there is a data structure that stores all registered threads
<hkaiser>
that will give you access to all threads that are registered with the runtime
<hkaiser>
not sure if worker-threads are specially marked except for their associated name
<Yorlik>
I'll figure something out.
<Yorlik>
Thanks a lot !
<hkaiser>
we could certainly extend this with additional flags or somesuch
<hkaiser>
Yorlik: let us know
<hkaiser>
so this mapper knows about more threads than just the worker threads
<Yorlik>
Having access to these threads can help a lot setting up infrastructure to avoid data sharing and locking.
<Yorlik>
What I would like to have is the ability to run a task that cannot be stolen on a designated OS thread to set up thread local stuff, like pools, allocators, whatever..
<Yorlik>
Second best solution is a way to retrieve all IDs, so I can at least map these things on initialization
<hkaiser>
Yorlik: the mapper should give you that, if you need more than it has, let us know
<Yorlik>
I'm on it.
<Yorlik>
get_thread_mapper is a non-static member - what would be the instance I have to use?
<Yorlik>
hkaiser: I need to get read access to / a copy of the mapper's internal maps, label_map_ and thread_map_, or a function giving me the IDs of the workers.
<Yorlik>
Maybe I can hack around with get_thread_label, and then get_threadid
<Yorlik>
OK - I can get the workers indices, by just mindlessly querying thread indices from 0 to 100 and hoping for the best.
<Yorlik>
Might just enumerate in a while loop to get all mappings.
<Yorlik>
Thread count - OK - I have a friend here.
<hkaiser>
Yorlik: the mapper exposes the number of threads it knows about
<Yorlik>
Yeah - just figured it out
<Yorlik>
I think I can live with what's there.
<hkaiser>
the names are the only indication of what type a thread is
<Yorlik>
regex ftw :)
<Yorlik>
hkaiser: The thread id types are problematic I think.
<Yorlik>
It is not guaranteed that a thread id type is a long int, or that it may never change.
<Yorlik>
BTW: get_system_thread_id is protected.
<Yorlik>
I assume get_thread_id is the boost thread id?
<hkaiser>
Yorlik: I think currently the mapper gives you the underlying system handle - this is probably something we want to change and let it return a std::thread::id instead
<hkaiser>
this is an old data structure that was never meant for public consumption, it might need some adaptation
<Yorlik>
There is a problem with all of this, because the standard is pretty agnostic concerning the thread id type
<hkaiser>
sure, but it's unique ;-)
<Yorlik>
You cannot even easily stringify it
<Yorlik>
You need to go through stringstream and operator<< if you want something comparable across platforms
<Yorlik>
Like make a string of your long and of the system id and compare
<K-ballo>
the thread mapper one is responsible for 23 out of 26 seconds
<K-ballo>
may just be the first one of the two to be instantiated
akheir1 has joined #ste||ar
akheir has quit [Ping timeout: 265 seconds]
<K-ballo>
nope, it's just that one getting included all over the place
<K-ballo>
(I simply removed it to compare)
<hkaiser>
K-ballo: we should be able to move it to one cpp file
kale[m] has quit [Ping timeout: 240 seconds]
nikunj has quit [Read error: Connection reset by peer]
<K-ballo>
~27 MB of debug bloat
nikunj has joined #ste||ar
<K-ballo>
heavily MPL-based
<nan11>
hkaiser, I am wondering whether there is a way/function to know the whole array's tiling type, given a tiled array.
<hkaiser>
nan11: I don't think we have that, do you think it would be useful to have it?
<nan11>
hkaiser, when I combine the values of vectors into a big matrix, I think I need to follow the tiling type that they are tiled. For example, if the arrays are column tiled, then I need to combine them in column direction.
<K-ballo>
hkaiser: do I understand correctly threads are never removed from the thread mapper's thread_info_ vector?
<Yorlik>
K-ballo: It's a bit strange actually: the API to unregister is there, but it would break index stability if actually used.
parsa has joined #ste||ar
<hkaiser>
K-ballo: I don't know - this type was created a long time ago for the sake of the PAPI counters, it's not really used anywhere else
<hkaiser>
it probably needs a good overhaul in order to be generally useful
nikunj has quit [Ping timeout: 272 seconds]
akheir1 has quit [Quit: Leaving]
nikunj has joined #ste||ar
<K-ballo>
if it's papi only then we should reduce it to the minimum
<K-ballo>
it's not a public interface then?
<heller1>
Only through the perf counter interface
<K-ballo>
it's chopping time
<jbjnr>
hkaiser: for both mpi and cuda, I'd like to return a success/error code from hpx::apply(...) - for example executing an mpi_isend does not create a task, but does return an mpi_success or error code. for cuda especially we'd like to return a code from apply. Is there any reason why we can't change the apply machinery to allow this?
<jbjnr>
as far as I can tell, the bool that is currently returned is never used for anything
<hkaiser>
jbjnr: apply is specifically fire&forget
<hkaiser>
if apply itself fails, throw an exception
<hkaiser>
otherwise, use async
nikunj has quit [Ping timeout: 265 seconds]
<weilewei>
(gdb) p hpx::this_thread::get_id()
$1 = {id_ = {thrd_ = 0x0}}
<weilewei>
does it mean this hpx thread does not have valid id?
nikunj has joined #ste||ar
<weilewei>
but is each hpx thread supposed to have a unique id when it is constructed?
<weilewei>
or does it mean the hpx runtime is not started properly?
<hkaiser>
default constructed hpx::thread object? it might also have run to completion
<weilewei>
what do you mean "default constructed hpx::thread object?"
nikunj has quit [Ping timeout: 260 seconds]
nikunj has joined #ste||ar
<weilewei>
hkaiser yes, I think I never construct an hpx thread object, so it uses a default-constructed thread_id()
nikunj has quit [Ping timeout: 256 seconds]
nikunj has joined #ste||ar
<zao>
The possible cases can be default constructed, joined/detached, and moved-from.
nikunj has quit [Ping timeout: 240 seconds]
<weilewei>
I see... then I can't make it default constructed
<zao>
(the possible reasons for an invalid_thread_id being held, that is)
<weilewei>
Ok, I think I need a thread pool or something, to create a thread that has a unique id
<weilewei>
maybe something else is going on, let me double check. But I understand that default-constructed or joined/detached/moved-from threads have an invalid id. Thanks @zao
parsa has quit [Remote host closed the connection]