aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<K-ballo>
oh I think I know what's going on
<K-ballo>
oh boy, I'm rusty
<hkaiser>
do you now?
<K-ballo>
it's not crashing... it's not doing anything...
<K-ballo>
and now VS is busy
<K-ballo>
ok, there we go
<K-ballo>
hkaiser: yes, the security cookie is overwriting the exception object somehow
<K-ballo>
unless that, no... let me look again
<github>
[hpx] hkaiser force-pushed boost_date_time from c044856 to 06691b7: https://git.io/v7JDi
<github>
hpx/boost_date_time 06691b7 Hartmut Kaiser: Removing dependency on Boost.Date_Time
<hkaiser>
K-ballo: they overwrite the local stack frame with 0xCC but that goes too far
<K-ballo>
yes, that's what I'm seeing too :|
<hkaiser>
ok, at least it's not just me
<K-ballo>
someone said MSVC keeps exception objects on the stack recently?
<hkaiser>
but this is a pointer
<K-ballo>
so my get_exception_info is returning a pointer to a stack object
<hkaiser>
so you're returning a pointer to a local variable which then gets overwritten when the stack is reused
<K-ballo>
can't do that.. I have to do the calls from within the catch
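A minimal reduction of what is being described here (the type names are illustrative, not HPX's actual get_exception_info machinery): the exception object only lives for the duration of the catch handler - on MSVC it sits on the stack - so a pointer to it must not escape the handler, and any work that needs the object has to happen inside the catch.

    #include <iostream>
    #include <stdexcept>

    struct exception_info { int code = 42; };

    struct my_error : std::runtime_error, exception_info
    {
        my_error() : std::runtime_error("boom") {}
    };

    // Anti-pattern: the returned pointer refers to the in-flight exception
    // object, whose lifetime ends when the handler exits; on MSVC it lives
    // on the stack and gets overwritten once the frames are reused.
    exception_info const* get_exception_info_dangling()
    {
        try { throw my_error(); }
        catch (my_error const& e) { return &e; }   // dangles after return
        return nullptr;
    }

    // Safe pattern: do everything that needs the exception object from
    // within the catch block itself.
    int get_error_code()
    {
        try { throw my_error(); }
        catch (my_error const& e) { return e.code; }
        return -1;
    }

    int main()
    {
        std::cout << get_error_code() << '\n';   // fine
        // get_exception_info_dangling()->code;  // undefined behaviour
    }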
<hkaiser>
so no bad codegen
<K-ballo>
I don't know.. is this conforming?
<hkaiser>
starts to make sense
<hkaiser>
shrug
<hkaiser>
since when do they care whether something is conforming
<K-ballo>
I care
<K-ballo>
did I make a mistake? or did they mess up again?
<hkaiser>
sure, me too
<K-ballo>
anyways, I'll prepare a fix/workaround
<hkaiser>
but this breaks our code :/
<hkaiser>
thanks!
<K-ballo>
I basically have to get rid of that overload :'(
<hkaiser>
right
<K-ballo>
fun
<K-ballo>
it'll mess up all those other functions
<hkaiser>
you'll have to go back to what it was before... have the try catch in every access function
<K-ballo>
yeah... I had such a clean design
<hkaiser>
right
<hkaiser>
K-ballo: btw, removing boost date_time was easy, we were using it in very few spots only
<K-ballo>
nothing asio related?
<hkaiser>
replaced the deadline_timer
<hkaiser>
asio has a waitable_timer which works with chrono and is otherwise the same
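For reference, the substitution amounts to swapping asio's posix_time-based deadline_timer for the chrono-based waitable timer; a minimal sketch (not HPX's actual code):

    #include <boost/asio.hpp>
    #include <chrono>
    #include <iostream>

    int main()
    {
        boost::asio::io_service io;

        // Before: boost::asio::deadline_timer t(io, boost::posix_time::seconds(1));
        // After: steady_timer is a waitable timer over std::chrono, so the
        // Boost.Date_Time dependency goes away.
        boost::asio::steady_timer t(io, std::chrono::seconds(1));

        t.async_wait([](boost::system::error_code const& ec) {
            if (!ec)
                std::cout << "timer fired\n";
        });

        io.run();
    }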
jgoncal has quit [Quit: jgoncal]
jgoncal has joined #ste||ar
<K-ballo>
oh F..
<K-ballo>
the error_code overloads too work by exception_ptr
<hkaiser>
K-ballo: yah you removed all those functions...
<hkaiser>
to get rid of the code duplication
<K-ballo>
this is not ok...
EverYoung has quit [Ping timeout: 246 seconds]
<K-ballo>
I'll need to think about this some more
<K-ballo>
I can still look at the default constructor issue I guess
jgoncal has quit [Quit: jgoncal]
<K-ballo>
hkaiser: the default constructor thing is a different issue, the implementation behind the macros just forgot to pass those arguments
<K-ballo>
the `construct_[lightweight_]exception` functions
<K-ballo>
uhm, they didn't "forget", those functions are `std::string`s
mars0000 has quit [Quit: mars0000]
<K-ballo>
I'll remove the embedded attributes; they are misleading since we won't be using them - there's a separate tag for those... and I'll implement some kind of visit_with_xi for the rest
eschnett has joined #ste||ar
<hkaiser>
K-ballo: sounds good
hkaiser has quit [Quit: bye]
mars0000 has joined #ste||ar
patg has joined #ste||ar
patg is now known as Guest30300
Guest30300 is now known as patg
patg has quit [Quit: This computer has gone to sleep]
patg has joined #ste||ar
patg is now known as Guest2481
K-ballo has quit [Quit: K-ballo]
Guest2481 has quit [Client Quit]
patg_ has joined #ste||ar
patg_ has quit [Client Quit]
eschnett has quit [Ping timeout: 248 seconds]
jgoncal has joined #ste||ar
eschnett has joined #ste||ar
taeguk has joined #ste||ar
jgoncal has quit [Quit: jgoncal]
mars0000 has quit [Quit: mars0000]
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
<jbjnr>
still a bit crappy for the 512 block stuff
<jbjnr>
I still have stuff to fix though, so there's hope
<taeguk>
Is there a way to get an exception_list from a vector of futures?
<jbjnr>
taeguk: probably not directly, but if one of them throws during a when_all... then you should be able to find which...
<jbjnr>
I suspect you'll have to loop over them and test each to see if it has an exception - not sure how you would do that actually
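A sketch of that suggestion, assuming hpx::future<T>::has_exception() is available; whether the collected exception_ptrs can then be wrapped into an hpx::exception_list is left open:

    #include <hpx/include/lcos.hpp>

    #include <exception>
    #include <vector>

    // Loop over ready futures and pull out the exception from each one that
    // failed; get() rethrows the stored exception so we can capture it.
    template <typename T>
    std::vector<std::exception_ptr>
    collect_exceptions(std::vector<hpx::future<T>>& futures)
    {
        std::vector<std::exception_ptr> errors;
        for (auto& f : futures)
        {
            if (f.has_exception())
            {
                try { f.get(); }
                catch (...) { errors.push_back(std::current_exception()); }
            }
        }
        return errors;
    }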
<jbjnr>
taeguk: today we are supposed to have a call, aren't we? I forgot again. I am a terrible mentor this year.
<jbjnr>
did you write the report that hartmut asked for?
<taeguk>
jbjnr: about the report, not yet. I'll write it soon.
<jbjnr>
do you need a call today?
<taeguk>
I think skipping today is fine with me.
<taeguk>
I will submit a PR for parallel::partition soon.
<heller>
jbjnr: most awesome job you did there!
<jbjnr>
?
<jbjnr>
where?
Matombo has joined #ste||ar
<heller>
jbjnr: the scheduler stuff
<heller>
jbjnr: scheduler, PP, you are really a true hero ;)
<jbjnr>
If only the people I work with thought that :)
<jbjnr>
(I'm not done with the scheduler, want to add better numa handling). Have to redo a bunch of stuff first though so that the schedulers get better pu information. They only get thread indexes at the moment and the resource partitioner complicates it. Need to re-simplify it again.
<jbjnr>
taeguk if you are happy and do not need a gsoc call, then I am happy to skip it today. Thanks.
<jbjnr>
(and sorry for not helping you much)
<heller>
jbjnr: I know your pain :/
<taeguk>
Excuse me, must I use the old template style with spaces, like std::vector<std::pair<int, std::unique_ptr<int> > >?
<taeguk>
Or is writing std::vector<std::pair<int, std::unique_ptr<int>>> not allowed for compiler compatibility?
<jbjnr>
I think most compilers will handle >>> now
<jbjnr>
heller: knows if we can use >>> or > > >
<heller>
>>> should be fine
<heller>
clang-format it all the way
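Since C++11, consecutive '>' tokens close nested template argument lists, so the spaced form is only needed for pre-C++11 compilers:

    #include <memory>
    #include <utility>
    #include <vector>

    // C++03 needed the spaces: std::vector<std::pair<int, std::unique_ptr<int> > >
    // C++11 and later parse the consecutive '>' correctly without them:
    std::vector<std::pair<int, std::unique_ptr<int>>> values;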
<heller>
man ... something is *very* wrong on that platform...
<github>
[hpx] biddisco created fix_logging (+1 new commit): https://git.io/v7UIm
<github>
hpx/fix_logging 5814e72 John Biddiscombe: Fix compilation when HPX_WITH_LOGGING is OFF
<jbjnr>
which platform?
<heller>
meggie
<heller>
where I want to do the omnipath tests
<jbjnr>
ah
Matombo has quit [Remote host closed the connection]
Matombo has joined #ste||ar
Matombo has quit [Remote host closed the connection]
<jbjnr>
I just cannot believe these parsec results.
<jbjnr>
heller: https://pasteboard.co/GBWBC08.png this is what we are up against. (I've added the parsec results for the same config to my plot)
<heller>
oO
<heller>
parsec uses starpu, right?
<heller>
and offline scheduling?
<jbjnr>
parsec just uses parsec as far as I understand. I don't know much about it really.
<heller>
hmm
<jbjnr>
but yes, static scheduling up front
<heller>
would be interesting how their task traces look
<jbjnr>
the internet is full of their presentations, plenty of pics
<heller>
I think we can create something very similar to static, up-front scheduling with the help of the resource manager and different executors
<jbjnr>
the problem is that once we go Node>1 the gap gets much wider.
<heller>
i meant for that specific application ;)
<jbjnr>
(between them and us)
<heller>
hmmm
<heller>
so you use parsec for the off-node communication as well, but MPI for the HPX stuff?
<jbjnr>
don't follow you. parsec uses mpi internally, and ours is using mpi
<heller>
ok
<heller>
but there is the difference
<jbjnr>
anyway. I'm not done yet.
<heller>
for parsec, you stay completely inside the parsec programming model
<heller>
for HPX, you switch back to the MPI paradigm
<heller>
that's not a fair comparison
<jbjnr>
it's the job I've been given.
<jbjnr>
MPI is here to stay and unless I change jobs, I have to make MPI work for us
<heller>
but why does this not count for parsec?
<jbjnr>
nobody (and I really mean nobody) is interested in async(locality, blah).
<jbjnr>
"but why does this not count for parsec?" explain ?
<jbjnr>
(I did not understand your question)
<heller>
you are saying that you use parsec all the way, even for distributed, or did I get this wrong?
<heller>
that is, the parsec runtime is taking care of sending/receiving the messages
<jbjnr>
yes. correct.
<jbjnr>
And don't get me wrong, nobody wants parsec either. But we still have to get as close as possible
<heller>
sure, I understand that
<heller>
I just fail to see the validity of this approach
<jbjnr>
validity of what approach?
<heller>
on the one hand, you are using a new programming model all the way (with all its capabilities, including distributed). On the other hand, you are trying to get the same performance with MPI + X, where X happens to be HPX
<heller>
and with you, I meant your boss ;)
<jbjnr>
We have millions of lines of code from existing projects. HPX needs to work WITH them and not INSTEAD of them. This is not an unreasonable request.
<heller>
so in distributed, you are essentially comparing MPI against PARSEC ;)
<jbjnr>
don't be silly
<jbjnr>
anyway. lunch now
<jbjnr>
bbiab
<heller>
sure, I get that. what I don't get is why parsec isn't measured with the same criteria
<jbjnr>
it is
<heller>
ok, then I am clearly not following ;)
<heller>
just for my full understanding: you are comparing a PARSEC application (where PARSEC happens to use MPI underneath for communication) with an MPI + HPX application
<heller>
right?
<heller>
just out of curiosity, for the parsec results, do you include the time it takes to create the DAG?
<jbjnr>
no
pree has joined #ste||ar
<heller>
jbjnr: would be interesting how much that contributes
<heller>
jbjnr: you know ... I hate libfabric :/
<jbjnr>
what's wrong now?
<heller>
I am getting a memory corruption when calling fi_recv
<heller>
where I pass "this" as a context.
<heller>
coincidentally, they internally overwrite the pointer passed :/
<jbjnr>
sounds suspicious
<heller>
totally fucked up
<heller>
you really wonder how it can be that *any* of their tests pass
<jbjnr>
heller: I have an rdma branch that I should push before you do a lot of work on the PP stuff
<jbjnr>
quite a few changes to memory registration etc., pools, rma_objects - all that stuff from the paper
<heller>
ok, great
<heller>
right now, I am fixing libfabric itself...
<jbjnr>
I'd better rebase it onto latest master
<jbjnr>
I wanted to redo the verbs PP to sit on top of the new stuff, but that's a bigger job :(
<heller>
jbjnr: if you push your stuff, I can take care of that
<jbjnr>
I have cleaned up the branch, but there are a bunch of WIP: commits that are not intended to be used for real, they are just temporary
<hkaiser>
sure, no worries
<jbjnr>
I rebased them to the end so we can remove them easily.
<hkaiser>
thanks for your effort on this
<jbjnr>
boost::asio::io_service ??? wtf, can we get rid of this and just use the new resource_partitioner pool create and executors for that?
<jbjnr>
so timer pool and io_pool just become normal pools
<jbjnr>
and we use the new pool_executor
<jbjnr>
or do we need extra asio features?
<hkaiser>
those don't have to be hpx thread managers, though
david_pfander1 has joined #ste||ar
<hkaiser>
they create their own threads which are driving the asio io-services
<hkaiser>
jbjnr: also, I'm not sure all of them need dedicated cores
<jbjnr>
so we would want the RP pool create, but not the schedulers and thread management, ok. I'll leave that alone for now
<jbjnr>
yes - dedicated cores, we have not added it yet, but in the pool create we will want "exclusive" flags and suchlike to denote a pool's use of PUs etc.
<jbjnr>
if the pool can share PUs with other threads etc.
<hkaiser>
nod
<jbjnr>
so mpi pool, io pool, timer pool, would probably be fine as sharing stuff, but matrix work needs exclusive access
<jbjnr>
look at examples/resource_partitioner for the simple test
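A rough sketch of the pool-plus-executor usage being discussed; the names here (hpx::resource::get_partitioner, create_thread_pool, add_resource, pool_executor) are modeled on that branch's examples/resource_partitioner and should be read as assumptions about the API rather than a reference:

    #include <hpx/hpx_init.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/resource_partitioner.hpp>

    int hpx_main(int argc, char* argv[])
    {
        // Send communication work to the dedicated "mpi" pool via its executor.
        hpx::threads::executors::pool_executor mpi_exec("mpi");
        hpx::async(mpi_exec, [] { /* comms task */ }).get();
        return hpx::finalize();
    }

    int main(int argc, char* argv[])
    {
        // Describe the pools before the runtime starts; an "exclusive" flag
        // (as discussed above) would eventually say whether the pool shares
        // its PUs with other pools.
        auto& rp = hpx::resource::get_partitioner(argc, argv);
        rp.create_thread_pool("mpi");
        rp.add_resource(rp.numa_domains()[0].cores()[0].pus()[0], "mpi");

        return hpx::init(argc, argv);
    }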
<hkaiser>
jbjnr: the mpi pools are different, no?
<jbjnr>
on that branch
<hkaiser>
isn't mpi using hpx threads nowadays?
<jbjnr>
our mpi-pool so far is identical to a normal hpx pool, but all tasks that do comms are sent via executor to that pool
<jbjnr>
(when debugging completed)
<hkaiser>
ok
<jbjnr>
I am working heavily on that branch at the moment, so please bear in mind that frequent rebases and much disruption are ongoing
<hkaiser>
sure, no worries
<hkaiser>
I would like to understand what you're doing there, mostly
<heller>
I suspect it is the offline-scheduling from parsec
<heller>
(to some extent)
<jbjnr>
yes, better numa awareness, my work on that is ongoing. I am changing the schedulers as we speak
<hkaiser>
jbjnr: no need to change the schedulers for numa awareness
<jbjnr>
(cache awareness)
<jbjnr>
why not?
<hkaiser>
why?
<hkaiser>
how?
<hkaiser>
the schedulers already allocate all the memory they need on the cores they are running on
<jbjnr>
well, you can see that my scheduler is doing quite a bit better than the local_priority_scheduler in several places, but I have not added the correct numa checks to it yet for stealing.
<hkaiser>
what did you change there?
<jbjnr>
task placement and stealing
<jbjnr>
mostly
<hkaiser>
ahh
<jbjnr>
and HP queue usage
<heller>
how I love fixing C code ...
<hkaiser>
that's not really scheduler changes
<hkaiser>
that's how you use them
<jbjnr>
well, the scheduler has an API that says add task, get_next_task, etc., and I changed the code in there. Call it whatever you like :)
<heller>
jbjnr: did you measure cross NUMA traffic?
<hkaiser>
heller: c code is lovely - no exceptions, no type safety, no templates, no destructors ... life can be so easy
<hkaiser>
jbjnr: ok
<jbjnr>
note that in examples/resource_partitioner there is a test scheduler that can be assigned to a pool at run time. This is awesome cos I can experiment without full rebuilds.
<hkaiser>
nice
<jbjnr>
heller: measuring NUMA. still working on that. could not get papi stuff to work yet
<heller>
hkaiser: exactly! Except when you corrupt memory that has been passed. I really wonder how *any* of the libfabric tests actually succeed ...
<heller>
jbjnr: you could use likwid for that as well
<heller>
to get a first idea
<heller>
then you can easily compare numa traffic for parsec and hpx
<hkaiser>
jbjnr: I created a branch yesterday documenting the papi counters (a bit), and other minor changes
<jbjnr>
hkaiser: thanks I am using it already, but no luck yet, distracted by bugs etc in my stuff
<hkaiser>
k
<hkaiser>
let me know if you need help
<hkaiser>
heller: I got rid of date_time yesterday
<jbjnr>
well, when I build with papi and try to access a perf-counter, it just throws on me, so I guess I screwed up.
<hkaiser>
jbjnr: HPX's exception handling might be broken, currently
<hkaiser>
jbjnr: K-ballo is working on a fix
<jbjnr>
ok. I am busy anyway. Will wait a bit before retrying.
<heller>
hkaiser: nice!
<heller>
hkaiser: except that it is not entirely complete ;)
<heller>
hkaiser: circle-ci is complaining
<hkaiser>
is it? - stupid thing ;)
<hkaiser>
it was too late, I didn't even look anymore
<hkaiser>
heller: ok, stupid me - missed that spot
<github>
[hpx] biddisco opened pull request #2775: Fix compilation when HPX_WITH_LOGGING is OFF (master...fix_logging) https://git.io/v7UgS
K-ballo has joined #ste||ar
david_pfander1 has quit [Ping timeout: 248 seconds]
<github>
[hpx] hkaiser force-pushed boost_date_time from 06691b7 to cb39cdc: https://git.io/v7JDi
<github>
hpx/boost_date_time cb39cdc Hartmut Kaiser: Removing dependency on Boost.Date_Time
eschnett has quit [Quit: eschnett]
<github>
[hpx] hkaiser force-pushed preprocessor from bcfc124 to c622d6b: https://git.io/v7U2d
<github>
hpx/preprocessor c622d6b Hartmut Kaiser: Adding inspect checks for HPX macros/related includes
<github>
[hpx] biddisco created fixing_compiler_check (+1 new commit): https://git.io/v7U2b
<github>
hpx/fixing_compiler_check 4708006 John Biddiscombe: Fix a bug in compiler version check
<github>
[hpx] biddisco opened pull request #2776: Fix a bug in compiler version check (master...fixing_compiler_check) https://git.io/v7Uat
<github>
[hpx] biddisco created rdma_object (+15 new commits): https://git.io/v7UaY
<github>
hpx/rdma_object 9467da7 John Biddiscombe: Fix bad size and optimization flags during archive creation
<github>
hpx/rdma_object 3ceabdb John Biddiscombe: Add a customization point for put_parcel so we can override actions (e.g. rdma)
<github>
hpx/rdma_object 1d54904 John Biddiscombe: WIP: fix to enable compilation of verbs PP on laptop
<jbjnr>
heller: that branch should be used for ref only. I have not finished cleaning it up, but it has the bulk of the rma serialization work on it and the rma::object stuff.
<jbjnr>
I'll try to clean it fully over the weekend
<heller>
jbjnr: ok, I'll try to fix the PSM2 provider in the meantime
<jbjnr>
no, I had a stack overflow message and was 'amazed' - if it actually works in real test code, then it's a game changer - the message I got appeared to be caused by a spurious segfault from some other memory corruption - but it's progress
<jbjnr>
I have been a victim of undiagnosed stack overflows on numerous occasions
denis_blank has quit [Ping timeout: 260 seconds]
denis_blank has joined #ste||ar
<ABresting>
can you name a few use cases where I can test and find out if mine will work with it?
<ABresting>
jbjnr: just the problem statement and I will write it up and test
<ABresting>
jbjnr: or would it be a good idea if you could add a few test cases to the repo so that any code I write can be tested properly?
<ABresting>
coroutine stack overflow detection with libsigsegv is messed up; I am working to set it straight :P
<zao>
And none of those faults are JB's fault :P
denis_blank2 has joined #ste||ar
denis_blank has quit [Ping timeout: 255 seconds]
<ABresting>
I hope not :P
<ABresting>
jbjnr: "the message I got appeared to be caused by a spurious segfault from some other memory corruption", what exactly happened there?
<jbjnr>
ABresting: all one needs to do is create a function with a large local array as a variable - std::array<int, 65536> - and then call that using async - if you write stuff into the local array - bang - stack overflow.
<jbjnr>
if the async call uses a policy with small_stack_size - then it should be easy to test for a fail.
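A minimal version of that reproducer, assuming plain hpx::async; how the small_stack_size policy is attached is left out, since it depends on the runtime configuration:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>

    #include <array>
    #include <numeric>

    // Writing through a large stack-local array from an HPX task: fine with
    // the default stack size, but on a small-stack coroutine this should
    // blow the stack rather than corrupt memory silently.
    int touch_large_local_array()
    {
        std::array<int, 65536> buf;    // roughly 256 KiB of stack
        std::iota(buf.begin(), buf.end(), 0);
        return buf.back();
    }

    int main()
    {
        return hpx::async(touch_large_local_array).get() == 65535 ? 0 : 1;
    }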
<zao>
jbjnr: I've done that with Asio, great fun :D
aserio has joined #ste||ar
denis_blank2 has quit [Ping timeout: 240 seconds]
<ABresting>
jbjnr: Thanks, noted - it seems like a standard case. If there is any special use case you have witnessed, feel free to mention it; otherwise the former looks like it will require introducing an alternate handler stack in async. Let me find out how it turns out.
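For context on the "alternate handler stack" idea: a SIGSEGV handler can only run usefully during a stack overflow if it has its own stack, which is what POSIX sigaltstack provides. A generic sketch, not tied to libsigsegv's internals:

    #include <signal.h>
    #include <stdlib.h>
    #include <unistd.h>

    static stack_t alt_stack;

    extern "C" void segv_handler(int)
    {
        // Runs on the alternate stack, so it still works when the faulting
        // thread's own stack has overflowed; only async-signal-safe calls here.
        static char const msg[] = "SIGSEGV: possible stack overflow\n";
        write(STDERR_FILENO, msg, sizeof(msg) - 1);
        _exit(EXIT_FAILURE);
    }

    int main()
    {
        alt_stack.ss_sp = malloc(SIGSTKSZ);
        alt_stack.ss_size = SIGSTKSZ;
        alt_stack.ss_flags = 0;
        sigaltstack(&alt_stack, nullptr);

        struct sigaction sa = {};
        sa.sa_handler = &segv_handler;
        sa.sa_flags = SA_ONSTACK;    // deliver SIGSEGV on the alternate stack
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, nullptr);

        // ... run code that might overflow its stack ...
        return 0;
    }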
hkaiser has quit [Quit: bye]
ajaivgeorge__ has quit [Ping timeout: 240 seconds]
denis_blank has joined #ste||ar
<K-ballo>
strange things are afoot at the Circle-CI today
<jbjnr>
ABresting: the problem usually occurs when you call some function in some library written by someone else - and it has a ton of temp vars that need a large stack - mostly the code we write ourselves, we know about - but especially in math libraries and suchlike one can bump into unexpected problems of that kind
* zao
has definitely not put 2*12*1024 floats on the stack before by accident.
<zao>
Nope. Never.
hkaiser has joined #ste||ar
denis_blank has quit [Quit: denis_blank]
jgoncal has quit [Ping timeout: 268 seconds]
jgoncal has joined #ste||ar
jgoncal has quit [Ping timeout: 240 seconds]
eschnett has quit [Quit: eschnett]
Matombo has quit [Remote host closed the connection]
akheir_ has quit [Remote host closed the connection]