<jbjnr>
hkaiser: simbergm I am working on DCA++ today and not going to get my PR for the scheduler ready for a bit - therefore I must sadly advise you to go ahead with your thread based changes and I'll just have to keep playing catch up later
<hkaiser>
jbjnr: thanks for looking into dca, and I think we can wait for another couple of days
<heller>
hkaiser: and the corresponding objects using the cache aligned data
<jbjnr>
(I think the main cause of my trouble is that the thread indexing was not consistent somewhere - this caused all my stuff to break in a way that still ran fine, but with degraded performance)
<hkaiser>
heller: sure
<heller>
hkaiser: that's the only place I can spot which might incur performance differences
quaz0r has joined #ste||ar
<hkaiser>
jbjnr: did you find the cause for this?
<hkaiser>
heller: nod
<jbjnr>
(in one of my major merges over the last couple of months, I must have messed up something - or perhaps someone changed the indexing somewhere and I didn't notice)
<jbjnr>
I get good performance on daint and laptop now, but ault and tave were down or in maintenance so I've not tested them. cholesky gives terrible performance and I want to fix it
<hkaiser>
ok
<jbjnr>
and apex doesn't work at all now. Can't even get OTF output working
<jbjnr>
not sure what I've done wrong
<hkaiser>
understood - we can wait, I think - np
<heller>
we really need someone to take care of apex
<hkaiser>
heller: Kevin is back on board, financing has been re-established ;-)
<jbjnr>
hkaiser: the problem with dca++ is that all the include paths have changed and for whatever reason the new modules are not being included
<hkaiser>
k
<hkaiser>
jbjnr: all the old paths should still work
<hkaiser>
you will get the warnings, but otherwise you should be fine
<simbergm>
jbjnr: :/
<simbergm>
build or install dir?
<simbergm>
what hkaiser said ^
<jbjnr>
things like <hpx/config.hpp>
<hkaiser>
should still be fine
<hkaiser>
if not, then the build system is broken
<jbjnr>
looks like config.hpp is no longer generated
<hkaiser>
jaafar: it never was
<hkaiser>
jbjnr: ^^
<simbergm>
should be in libs/config/include/hpx/config.hpp
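As a quick diagnostic for the include problem discussed above, a translation unit can probe whether <hpx/config.hpp> is visible at all before anything else is compiled; this is only a sketch of such a check, not part of DCA++ or HPX itself.

    // Diagnostic sketch: fail early and loudly if the HPX headers are not on
    // the include path (the symptom described above).
    #if defined(__has_include)
    #  if !__has_include(<hpx/config.hpp>)
    #    error "hpx/config.hpp not found - check the HPX include directories"
    #  endif
    #endif

    #include <hpx/config.hpp>

    int main() { return 0; }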
Coldblackice_ has joined #ste||ar
Coldblackice has quit [Ping timeout: 250 seconds]
<jbjnr>
heller: looking again at what I was doing with DCA++ makes me think I could use your context stuff here. I created an "abstraction layer" between the std::threads and the hpx::threads - but it would have been much easier with your stuff. (Except that I'd still need to rewrite everything anyway :( )
<heller>
jbjnr: yeah, probably. that's the spirit :D
<jbjnr>
did you say the std::threads version is feature complete?
<jbjnr>
how soon before we can actually use it?
<hkaiser>
I wouldn't hold my breath
<jbjnr>
lol
<heller>
I didn't say it is feature complete
<heller>
it's ready when the libfabric PR is in ;)
<jbjnr>
ok. I probably just assumed that "everything works" meant that!
<jbjnr>
libfabric + scheduler you mean
<heller>
right
<heller>
well. everything works that has been implemented
<heller>
that is, the agent stuff works, as in yield, suspend, resume
<heller>
all the context stuff and spawning functions on agents etc still requires work
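A rough sketch of what an agent with those three operations could look like, built only on std::thread primitives; all names here are illustrative and are not taken from the actual branch being discussed.

    #include <condition_variable>
    #include <mutex>
    #include <thread>

    // Illustrative execution agent supporting the operations mentioned above:
    // yield, suspend, resume. Not the real API, just the shape of it.
    class execution_agent
    {
    public:
        // cooperatively give up the CPU without blocking
        void yield()
        {
            std::this_thread::yield();
        }

        // block the calling thread until another thread calls resume()
        void suspend()
        {
            std::unique_lock<std::mutex> l(mtx_);
            suspended_ = true;
            cv_.wait(l, [this] { return !suspended_; });
        }

        // wake up a previously suspended agent
        void resume()
        {
            {
                std::lock_guard<std::mutex> l(mtx_);
                suspended_ = false;
            }
            cv_.notify_one();
        }

    private:
        std::mutex mtx_;
        std::condition_variable cv_;
        bool suspended_ = false;
    };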
<simbergm>
jbjnr (and others): I'd like your opinion on the resource partitioner
<jbjnr>
delete it all!
<hkaiser>
simbergm: which one? we have two ;-)
<jbjnr>
(kidding obviously)
<jbjnr>
simbergm: clean it up and reduce all the duplication
<simbergm>
currently it depends on the runtime for sanity checks (check that the runtime ptr is null or not)
<simbergm>
I could either add global functions that forward to the partitioner instance, or make init/start take an optional partitioner instance
<simbergm>
resource partitioner
<simbergm>
or just remove the checks...
<simbergm>
bah, it's not about duplication
<jbjnr>
remove the checks if possible
<jbjnr>
only in pre-main init does it ever not have them
<simbergm>
removing is easy :P
<jbjnr>
I like the idea of init/start taking a partitioner
<jbjnr>
you mean if the user instantiated one already?
<jbjnr>
as in some of the examples that create custom pools
<simbergm>
yeah, exactly
<jbjnr>
that would be clean I think. Not sure why the runtime pointer is needed. can't remember
<simbergm>
I also like that, but it adds more overloads to the already too many overloads...
<simbergm>
would let us get rid of the partitioner singleton though...
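A sketch of the overload being discussed here, i.e. init/start optionally taking a user-constructed partitioner so the singleton can go away; the signatures below are invented for illustration and are not the actual HPX API.

    namespace hpx {
        namespace resource { class partitioner; }

        // existing style: the partitioner is configured behind the scenes
        int init(int argc, char** argv);

        // proposed style: the user builds and customizes the partitioner
        // up front and hands it to init - no global singleton required
        int init(resource::partitioner&& rp, int argc, char** argv);
    }

    // usage would then roughly follow the custom-pool examples:
    //   hpx::resource::partitioner rp(argc, argv);
    //   rp.create_thread_pool("custom-pool");
    //   return hpx::init(std::move(rp), argc, argv);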
hkaiser has quit [Ping timeout: 250 seconds]
<simbergm>
so my question is mainly how badly do we want those sanity checks there
<simbergm>
essentially they check that you don't create thread pools after the runtime has been started
<simbergm>
or shrink/expand pools when it's not started (I wonder if that even works...)
<simbergm>
runtime pointer is needed for ^ checks
<jbjnr>
remove the checks and we can add new (improved?) ones if stuff fails?
hkaiser has joined #ste||ar
<jbjnr>
adding new thread pools after startup should be supported eventually
<jbjnr>
but not in this incarnation
<hkaiser>
simbergm: not sure if we should always require passing a partitioner to init by the user
<hkaiser>
that sounds awful
<jbjnr>
(not always. just when the user created one)
<simbergm>
hkaiser: no, optional of course
<jbjnr>
but he's right about start/init having way too many overloads already
<jbjnr>
it's confusing even for us
<simbergm>
but making it optional adds a billion new overloads
<jbjnr>
why is the singleton a problem? because it uses the runtime ptr stuff?
<simbergm>
jbjnr: the singleton is not a problem, it's just ugly
<simbergm>
I'll remove the checks for now, the examples are the best documentation anyway and not following them is naughty...
<hkaiser>
simbergm: couldn't we have some 'global' sanity checkers?
<simbergm>
anyway, creating pools at runtime will need cooperation with the runtime to actually start the pools with the threadmanager
<simbergm>
hkaiser: yeah, that was my other suggestion
<simbergm>
wrap get_partitioner.create_pool() in hpx::create_pool which does the check
<simbergm>
or something like that...
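A possible shape for that wrapper, assuming placeholder names throughout (runtime_is_running, get_partitioner and create_pool are stand-ins for illustration, not the real API):

    #include <stdexcept>
    #include <string>

    namespace hpx {
        bool runtime_is_running();                          // placeholder query
        namespace resource {
            struct partitioner {
                void create_pool(std::string const& name);  // placeholder
            };
            partitioner& get_partitioner();                 // placeholder accessor
        }

        // the sanity check lives in the free function, so the partitioner
        // itself no longer needs to see the runtime pointer at all
        inline void create_pool(std::string const& name)
        {
            if (runtime_is_running())
            {
                throw std::runtime_error(
                    "thread pools cannot be created after the runtime has started");
            }
            resource::get_partitioner().create_pool(name);
        }
    }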
<hkaiser>
right
weilewei has quit [Remote host closed the connection]
<hkaiser>
simbergm: when you were converting the qbk files to the new documentation format, did you use some tool?
K-ballo has joined #ste||ar
<simbergm>
hkaiser: no, just an ad-hoc set of sed replacements for most of it, the rest manually
<simbergm>
I may still have it around
<simbergm>
probably not it looks like.. :/
<hkaiser>
simbergm: no worries
<hkaiser>
and thanks for checking
<hkaiser>
heller: yt?
<hkaiser>
heller: where is the extra data item for pointer tracking enabled for output_archives?
<hkaiser>
do you lazily construct those nowadays?
<heller>
hkaiser: should be, yes
<hkaiser>
heller: what about if I want to be able to check whether certain extra data is supported by the archive?
<hkaiser>
i.e. 'does this archive support credit splitting'?
<hkaiser>
heller: ^^
<heller>
hkaiser: the extra data is not a property of the archive, I think
<heller>
hkaiser: they are set by the objects you serialized to the archive
<hkaiser>
not necessarily
<hkaiser>
I might not want to do credit splitting in certain cases
<hkaiser>
heller: ?
<simbergm>
hkaiser: stackless threads is ready to go in right? ci looks very happy :)
<simbergm>
hkaiser: stackless threads is ready to go in right? ci looks very happy :)
<simbergm>
hkaiser: stackless threads is ready to go in right? ci looks very happy :)
<hkaiser>
simbergm: yah, let's go if jbjnr doesn't object, he was mumbling something about this
<K-ballo>
someone's excited about stackless threads...
<hkaiser>
_very_ excited
<hkaiser>
heller: I think the extra data items should be explicitly enabled depending on the context the archive is used in
<heller>
hkaiser: the main motivation behind doing it lazily is to avoid paying the cost if the archive doesn't require it
<hkaiser>
understood
<heller>
hkaiser: I guess this discussion is in the context of checkpointing?
<hkaiser>
yes
<hkaiser>
I want to get back to this
<hkaiser>
it's sitting there for too long
<heller>
so, what should happen if an id_type is supposed to get checkpointed?
<hkaiser>
no credit splitting, at best the id should be saved verbatim
<simbergm>
I think my excitement was amplified somewhere along the way...
<simbergm>
I got the impression jbjnr was fine with it
<hkaiser>
simbergm: so let's do it
<simbergm>
I think we'll go ahead with the cmake branch now as well, hope things just keep working normally
<heller>
hkaiser: 1) Why no credit splitting? Why isn't the component behind the GID kept alive when it is split? 2) Wouldn't a deep copy make more sense here?
<jbjnr>
if the stackless PR doesn't completely change all the threading and scheduler API, then go ahead
<hkaiser>
heller: what's the purpose of checkpointing an id_type - I think it's to save the value of the id; if you want to store the thing it refers to, use a client
<heller>
hkaiser: ok, isn't the point of a checkpoint to be able to restore it later on?
<hkaiser>
sure, but you might want to restore it to the same id
<hkaiser>
even if we do some special handling for id_types during checkpointing, i.e. deep save - how would I know inside the id_type::save() function what to do?
<hkaiser>
heller: ^^
<hkaiser>
this is some information that has to be associated with the archive
<heller>
How about you use the split_gid map after you serialized everything?
<hkaiser>
what should I do there?
<hkaiser>
the credit was split at that point, should I undo the splitting?
<heller>
hkaiser: I don't think so. If you want to restore it verbatim because you want to restore it with the old GID, you need to keep it alive as long as the checkpoint is alive, otherwise you will get into lifetime troubles.
<heller>
with a deep save, you cannot easily reuse the same GID, so you perform a recursive deep save. You need to keep the split_gid map around to avoid duplicates; once you are done, you can undo the splitting
<heller>
the other option would be to attach an extra data for checkpointing, and try_get it when doing the id_type::save/id_type::load operation
<hkaiser>
heller: I agree with the split_map (or similar), but I don't agree with having to undo splitting just because we don't want to have a means of carrying context information in the archive
<heller>
or just disallow checkpointing of id_types...
<hkaiser>
even that needs detection
<heller>
if (split_gids != nullptr) throw ...;
<heller>
;P
<heller>
also, the split_gids should do that automatically and abort if the map hasn't been moved out
aserio has quit [Quit: aserio]
nikunj has joined #ste||ar
nikunj has quit [Remote host closed the connection]
<hkaiser>
heller: I'll use an extract tag type in the archive as extra data, no overhead
<hkaiser>
extra tag type*
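A sketch of what that could look like: an empty tag type stored as archive extra data marks a checkpointing archive, and save() only has to ask whether the tag is present. checkpointing_tag and try_get_extra_data are illustrative names here, not the actual HPX serialization API.

    // empty marker type: its mere presence in the archive's extra data says
    // "this archive is used for checkpointing", so there is no runtime overhead
    struct checkpointing_tag {};

    template <typename Archive, typename Id>
    void save_id(Archive& ar, Id const& id)
    {
        if (ar.template try_get_extra_data<checkpointing_tag>() != nullptr)
        {
            // checkpointing archive: no credit splitting, follow whatever
            // checkpoint policy is decided on (verbatim save, deep save, ...)
        }
        else
        {
            // normal wire serialization: split credits as usual
        }
    }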
<heller>
hkaiser: which policy are you going for now?
<hkaiser>
for now I'll store the gid_type verbatim, we can discuss this further and change it to deep-save later
<heller>
without keeping the component alive?
<hkaiser>
yes
<heller>
ugh
<hkaiser>
checkpoints live longer than components
<hkaiser>
could be stored in a file after all
<heller>
sure, but what's the point of restoring them the?
<heller>
then*
<hkaiser>
restoring the id_type?
<heller>
if the id_type is checkpointed, it will be restored, no?
<hkaiser>
to be able to use the same value down the road - Yorlik was requesting this
<heller>
yes sure
<heller>
but if the checkpoint outlives the component, where is the point
<hkaiser>
ok, ok - I'll do the deep save ;-)
<hkaiser>
same as for clients
<heller>
yes
<heller>
here is a suggestion: checkpoints do indeed keep the components alive. If you don't want that, we already have a policy for that: unmanaged id_types
<heller>
then you have to make sure to clean up your checkpoints after a while
<hkaiser>
no, I don't think this is a good idea
<hkaiser>
the things in the checkpoint are not components anymore
<heller>
right, a deep save is the only real option there
hkaiser has quit [Ping timeout: 250 seconds]
simbergm has quit [Write error: Connection reset by peer]
hkaiser has joined #ste||ar
<hkaiser>
heller: we can't do a deep save :/
<hkaiser>
I'll simply prevent managed id_types from being checkpointed
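Building on the illustrative tag sketched earlier, that check could be as simple as refusing to serialize a managed id when the tag is present; is_managed() is another placeholder name used only for this sketch.

    #include <stdexcept>

    struct checkpointing_tag;   // the marker type sketched earlier

    template <typename Archive, typename Id>
    void save_id_checked(Archive& ar, Id const& id)
    {
        bool const checkpointing =
            ar.template try_get_extra_data<checkpointing_tag>() != nullptr;

        if (checkpointing && id.is_managed())
        {
            throw std::runtime_error(
                "managed id_types cannot be stored in a checkpoint");
        }

        // otherwise serialize the id as usual
    }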
aserio has joined #ste||ar
<heller>
hkaiser: why can't we do a deep save?
<hkaiser>
we don't have the type of the component
<heller>
virtual dispatch through component_base?
<hkaiser>
components don't usually have a virtual base - should we really add one just for this?
weilewei has joined #ste||ar
<hkaiser>
anyways, gotta run...
hkaiser has quit [Client Quit]
rtohid has joined #ste||ar
rori has quit [Ping timeout: 245 seconds]
aserio has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
aserio has joined #ste||ar
Coldblackice_ has quit [Ping timeout: 268 seconds]
aserio has quit [Quit: aserio]
<hkaiser>
heller: btw, it's not a clang issue, gcc shows the same behavior - it has to be something on our end
<heller>
Did you check the cache alignment?
<heller>
Should be easy enough to check with a small testcase
<hkaiser>
not yet, but this is the prime suspect
<hkaiser>
another suspect would be the overaligned allocator
<hkaiser>
c++14 does not support that
<heller>
So you're saying that the extra alignment actually hurts performance?
<hkaiser>
heller: not the alignment itself, I think the allocator that has to ensure alignment might be slower
<heller>
That would suck
<heller>
Can you reproduce this on msvc as well?
<heller>
Would also be interesting what happens if libc++ was used
<hkaiser>
heller: this is libc++
<heller>
Oh, ok
<heller>
Strange that it also happens with gcc then
<hkaiser>
tcmalloc?
diehlpk has joined #ste||ar
<heller>
tcmalloc shouldn't be affected by the c++ std
<heller>
What does the cache line test give you?
diehlpk has quit [Ping timeout: 264 seconds]
<heller>
hkaiser: turns out that neither gcc nor clang hits this :/
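For the cache-line question above, a small standalone testcase along these lines could show whether heap allocations of an over-aligned type really come back cache-line aligned (the 64-byte line size is an assumption); it only probes the compiler and allocator in use, not HPX's own cache-aligned wrappers.

    #include <cstdint>
    #include <cstdio>
    #include <memory>

    constexpr std::size_t cache_line = 64;   // assumed cache-line size

    struct alignas(cache_line) padded_counter
    {
        std::uint64_t value = 0;
    };

    int main()
    {
        bool all_aligned = true;
        for (int i = 0; i != 8; ++i)
        {
            // C++17 aligned new honours the over-alignment; C++14 (or a
            // replacement allocator such as tcmalloc underneath operator new)
            // may silently fall back to the default alignment, which is the
            // suspicion voiced above.
            auto p = std::make_unique<padded_counter>();
            bool const aligned =
                reinterpret_cast<std::uintptr_t>(p.get()) % cache_line == 0;
            std::printf("allocation %d: %s\n", i, aligned ? "aligned" : "NOT aligned");
            all_aligned = all_aligned && aligned;
        }
        return all_aligned ? 0 : 1;
    }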