aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
david_pfander has joined #ste||ar
david_pfander has quit [Ping timeout: 268 seconds]
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 240 seconds]
EverYoun_ has quit [Ping timeout: 258 seconds]
parsa has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
parsa has quit [Client Quit]
EverYoung has joined #ste||ar
parsa has joined #ste||ar
parsa has quit [Client Quit]
eschnett has joined #ste||ar
EverYoung has quit [Ping timeout: 240 seconds]
hkaiser has quit [Quit: bye]
twwright has quit [*.net *.split]
Vir has quit [*.net *.split]
twwright has joined #ste||ar
Vir has joined #ste||ar
jaafar has quit [Ping timeout: 240 seconds]
jaafar has joined #ste||ar
jaafar has quit [Ping timeout: 250 seconds]
hkaiser has joined #ste||ar
<hkaiser> denisblank: I'm using openblas or mkl (use vcpkg)
hkaiser has quit [Quit: bye]
jaafar has joined #ste||ar
jaafar has quit [Ping timeout: 240 seconds]
K-ballo has quit [Quit: K-ballo]
jaafar has joined #ste||ar
denisblank has quit [Quit: denisblank]
jaafar has quit [Ping timeout: 248 seconds]
jaafar_ has joined #ste||ar
jaafar_ has quit [Ping timeout: 260 seconds]
david_pfander has joined #ste||ar
david_pfander has quit [Ping timeout: 248 seconds]
pree has joined #ste||ar
jbjnr has joined #ste||ar
<jbjnr> msg nickserv identify webn0d3fr33
<jbjnr> <sigh>
<jbjnr> time for another password
<jbjnr> I blame firefox anyway
<github> [hpx] biddisco force-pushed fixing_2996 from 39d236d to f49573f: https://git.io/vF1uq
<github> hpx/fixing_2996 bf8b214 John Biddiscombe: Add missing noop_topology functions causing compilation error
<github> hpx/fixing_2996 f49573f John Biddiscombe: Throttling scheduler must be disabled if hwloc is not used
pree has quit [Ping timeout: 248 seconds]
pree has joined #ste||ar
<msimberg> jbjnr: nice, thanks for fixing that
<jbjnr> just waiting for circleci to give a green. I fixed the inspect errors earlier
<msimberg> ok, good
<msimberg> saw your comment on the hwloc leak as well
<msimberg> seems a bit strange if asan is indeed giving a false positive there...
<heller> i haven't encountered a false positive with asan so far
<heller> msimberg: how would I reproduce the issue?
<jbjnr> I'll have another look. I only spent 5 mins on it last night
<jbjnr> you edited the issue? what changed?
<heller> having a shared_ptr to a pointer also sounds a bit odd ;)
<heller> msimberg: I see the problem though
<msimberg> jbjnr: I just realized the first and second stacktraces were exactly the same
<msimberg> except for the size of the leak...
<heller> msimberg: in hwloc_topology_info.cpp line 757
<heller> cpuset gets allocated but never freed
<jbjnr> oops
<msimberg> jbjnr: are you fixing it? :)
<jbjnr> hwloc_bitmap_free(cpuset);
<msimberg> i'll add it
<jbjnr> I wish hwloc was written in c++
<jbjnr> so much pointless copying of masks and alloc free pairs that are a total waste of everything
<heller> msimberg: checking out the throttle test right now
<heller> throttle fixes
<msimberg> heller: thanks
jaafar_ has joined #ste||ar
pree has quit [Quit: Bye dudes]
<heller> msimberg: you will become our new release manager, right?
<msimberg> heller: yes, looks like it
<msimberg> heller: and since you brought it up, you are mostly responsible for this, right?
<msimberg> how much of that is actually in progress, and of the stuff that's in progress how much needs to be done for the release?
<heller> msimberg: I think we should shift it to the next release
<heller> we don't have free cycles to work on it atm
<msimberg> okay
<msimberg> so what I'd like to do is clean up the 1.1 milestone of things like this, I don't quite like that everything is there
<msimberg> do you think it would be possible for me to get access to do that?
<heller> I'd really like to redesign our whole parcelport and serialization stuff
<msimberg> I can't judge for everything, but with enough asking around...
<heller> i've been thinking a lot about it lately
<heller> msimberg: yes, I can grant you access
<heller> msimberg: I suggest to go through the tickets, and either add a comment or judge by yourself
<msimberg> thank you, I promise to be responsible
<heller> just sent you an invite
<msimberg> yeah, that was my plan, please tell me if I misjudge something badly
<heller> sure
<heller> i'll keep an eye on it ;)
<heller> revoking access is easy and nothing gets lost ;)
<jbjnr> heller: "I'd really like to redesign our whole parcelport and serialization stuff" please be aware that I have a large body of work on the rma_objects that I want to merge in. it includes new serialization stuff and rma allocators and everything. don't make changes without warning me so I can merge first.
<heller> jbjnr: I won't
<jbjnr> ta
<heller> jbjnr: I plan to do the whole redesign in a separate project at first to see how it goes
<jbjnr> what would you like to change?
<heller> the serialization process and how remote futures are triggered
<heller> I think there is a great opportunity there to have some speedup
<heller> in a nutshell
<heller> I think there is a lot that gets lost in the whole setup
<heller> So I want to have a parcelport/parcelhandler setup that is completely independent of the whole HPX tasking framework at first
<jbjnr> (btw - My next parcelport work will focus on getting the rma stuff into channels and adding the collectives)
<heller> yeah, collectives and point to point communication is another thing
<heller> so one thing to have first, I guess, is to deal with the message passing in a more explicit way, allowing for HPX (and possibly other task based systems) to reuse it efficiently
<heller> and then we can evaluate performance of the whole networking layer more reliably without anything else getting in the way
<heller> for example, design it with communicators in mind from the ground up
<heller> no GIDs, just endpoints to which you can dispatch functions to or send messages
<heller> if that makes sense
<heller> msimberg: did you run the whole testsuite with your changes in the throttle fix branch?
<msimberg> heller: no, I did not
<msimberg> I can still do that
<heller> I am on it atm
<msimberg> ah, okay, thanks
<heller> getting lots of failures
<msimberg> :(
<msimberg> okay, let me know which ones
<heller> not sure if it is related to all the other failures we have right now :/
<heller> especially #2982, #2998 and #3007
<heller> merging it to my branch atm...
<heller> and try again...
<github> [hpx] msimberg opened pull request #3011: Fix cpuset leak in hwloc_topology_info.cpp (master...fix-hwloc-leak) https://git.io/vFMY7
<heller> pretty interesting
<heller> msimberg: merging the mentioned PRs and your throttle fixing stuff together gives a pretty nice picture!
jaafar_ has quit [Ping timeout: 240 seconds]
<heller> msimberg: still not perfect though ;)
<heller> msimberg: https://gist.github.com/sithhell/d9f1dbab43398ab08fd71723b68bcdd8 <-- those are the tests that fail for me
<msimberg> heller: thanks for checking! I'll try to fix those (and run all the tests this time)
<heller> msimberg: I am on it as well
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/vFMcC
<github> hpx/gh-pages e541ac1 StellarBot: Updating docs
<msimberg> heller: okay, let me know when you find something out, will have a look myself anyway
<heller> msimberg: I am getting a strange assertion for thread creation right now
<heller> checking where that comes from...
<heller> right here
<msimberg> heller: hmm, interesting
<msimberg> does that happen all the time? on a specific test?
<heller> that's with the action_invoke_no_more_than_test
<heller> not sure if it happens on master as well...
<heller> let me check...
<msimberg> do you know if this was my branch only or also on master? if not I will check
<heller> i'll check
<msimberg> too slow, ignore that
<heller> doesn't seem to happen on master
<msimberg> okay, then I've messed something up, will look into it
<msimberg> heller: on a related note, is there a purpose to the thread_map_ besides being able to check how many threads there are in various states?
<heller> msimberg: I used to know that ...
<heller> msimberg: TBH, I've been tinkering on how to remove it as well for some time now
<heller> one purpose is that it keeps the thread alive until the cleanup is done
<heller> msimberg: got it
<heller> or not...
<jbjnr> I'll merge #3009 as soon as someone reviews it
<msimberg> heller: okay, the threads would go into the terminated_items queue anyways
<heller> yeah
<heller> I don't remember the exact road blocks I ran into ...
<msimberg> so the "got it" was a false alarm? ;)
<heller> yes
<heller> msimberg: ahh, one road block was "abort_all_suspended"
<heller> msimberg: thread_queue.hpp line 1057
<msimberg> heller: yep, that's a problem
<heller> that's *the* problem ;)
<msimberg> I guess one could have some more queues for different states
<msimberg> but maybe it's just moving a problem from one place to another
<heller> yeah...
<heller> that's the usual effect we see, you eliminate one contention point and it moves to another one
<msimberg> mmh... tricky
<heller> yup
<heller> take a look at the video I just posted, and this one: https://www.youtube.com/watch?v=lZU6RK0oazM
<heller> hartmut discovered USL (the Universal Scalability Law) a while back, but we didn't really do anything with it
<heller> after watching those talks, I think we should get back to it and try to adapt to our thread scheduling
<heller> jbjnr: did you test with having hwloc disabled and tried to run hello world or so?
<jbjnr> apparently not. terminate called after throwing an instance of 'std::length_error'
<jbjnr> what(): vector::reserve
<jbjnr> but that has nothing to do with my changes.
<jbjnr> PITA
<jbjnr> I suspect that it does not work at all without hwloc now since the RP stuff <cough>
<jbjnr> let's not make hwloc an option. it isn't worth fixing it
<heller> jbjnr: right, that's why I made that comment on the issue ;)
<msimberg> heller: thank you, for the assert I suppose?
<heller> msimberg: yes, this fixes the problems I ran into
<heller> except for the this_thread_executor
<jbjnr> this_thread_executor : what is it for?
<heller> jbjnr: execute stuff on this thread
<jbjnr> indeed. that much is clear from the name. but why does it exist.
eschnett has quit [Quit: eschnett]
<heller> jbjnr: ask hartmut
<heller> msimberg: fixed it, I think
<jbjnr> <sigh>
<github> [hpx] biddisco pushed 2 new commits to master: https://git.io/vFMyt
<github> hpx/master 2608da6 Mikael Simberg: Fix cpuset leak in hwloc_topology_info.cpp
<github> hpx/master 660def2 John Biddiscombe: Merge pull request #3011 from msimberg/fix-hwloc-leak...
<heller> jbjnr: concurrency vs. parallelism I think
<heller> jbjnr: you can asynchronously launch tasks on the current thread without having parallelism but concurrency
<msimberg> heller: how... you're much too fast (I'm slowly building tests...)
<msimberg> but I think that gist is the same as the previous one?
<jbjnr> users are not supposed to use threads directly ...
<msimberg> ah, I see now, 2 revisions... thanks again
<heller> ahh, almost ...
<heller> the timed version is still failing :/
K-ballo has joined #ste||ar
<heller> msimberg: did you touch the state_suspended vs. state_stopped thing?
<msimberg> heller: not on this branch, I have in my experiments with suspending though
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
<heller> msimberg: ok
mcopik has joined #ste||ar
mcopik has quit [Client Quit]
Hodor12345678 has joined #ste||ar
<Hodor12345678> Hello Everyone
<K-ballo> hi there
Hodor12345678 has quit [Remote host closed the connection]
<heller> that was quick ;)
<jbjnr> Hold the Door!
<heller> msimberg: closing in now...
<heller> msimberg: I have another patch that needs to be integrated into the fix-throttle-test
K-ballo has quit [Read error: Connection reset by peer]
K-ballo has joined #ste||ar
<msimberg> heller: can I see? I'm just trying to understand what the thread pool executor is actually doing, so I'm afraid I'm of not much help (yet)
<heller> msimberg: still running into hangs at shutdown ...
<heller> msimberg: in essence, the thread pool executors run an embedded scheduling loop
<msimberg> heller: do you have a concise explanation of how it stops? is the destructor supposed to block until all the work on the thread pool executor is done?
<heller> msimberg: yes. the problem is that it stops too early
<msimberg> yeah, the cleanup_terminated functions are most likely too relaxed now
<msimberg> first try with adding back checks for thread_map_.empty() at least hangs
<msimberg> and btw, thread_map_count_ should always be the same as thread_map_.size(), no?
<heller> yes
<heller> well, there is a race, but that should be accounted for, that is, thread_map_count_ should only be increased once the item has been inserted, and decreased once it has been removed
<msimberg> sure, but besides that
<msimberg> so what is the purpose of having a separate count variable then?
<heller> optimization
<heller> checking an atomic is cheaper than acquiring a contended lock
<msimberg> mmh, I see
<heller> OTOH, we could turn it around and do a try_lock instead... if that doesn't succeed, we assume there is enough work (or the equivalent for the other functions)
<K-ballo> heller: what's holding the component factory removals?
david_pfander has joined #ste||ar
<msimberg> heller: going to stop for today and continue tomorrow, but I found at least that it's only the test_timed_apply part that fails
<msimberg> and at the time of the assert its 4 completed, 6 scheduled, so those might be the two missing
<msimberg> do post_after/at do something differently than async/sync wrt the thread map or something?
<msimberg> anyway, will continue tomorrow, thanks for finding the problems
<heller> msimberg: the patch I sent in is needed. I was looking at the wrong spot though. I know what to do now...
david_pfander has quit [Ping timeout: 240 seconds]
EverYoung has joined #ste||ar
jaafar_ has joined #ste||ar
gedaj has joined #ste||ar
david_pfander has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
gedaj has quit [Quit: leaving]
david_pfander has quit [Ping timeout: 240 seconds]
gedaj has joined #ste||ar
akheir has joined #ste||ar
aserio has joined #ste||ar
aserio has quit [Client Quit]
jaafar_ has quit [Ping timeout: 240 seconds]
jaafar_ has joined #ste||ar
daissgr has joined #ste||ar
hkaiser has joined #ste||ar
daissgr has quit [Quit: daissgr]
daissgr has joined #ste||ar
<github> [hpx] hkaiser pushed 1 new commit to optional: https://git.io/vFDKw
<github> hpx/optional fcc888b Hartmut Kaiser: Adding missing #include
daissgr has left #ste||ar [#ste||ar]
hkaiser has quit [Quit: bye]
daissgr has joined #ste||ar
daissgr has left #ste||ar [#ste||ar]
daissgr has joined #ste||ar
daissgr has left #ste||ar [#ste||ar]
daissgr has joined #ste||ar
gedaj has quit [Quit: leaving]
gedaj has joined #ste||ar
gedaj has quit [Client Quit]
gedaj has joined #ste||ar
EverYoun_ has joined #ste||ar
gedaj has quit [Client Quit]
gedaj has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
gedaj has quit [Client Quit]
gedaj has joined #ste||ar
gedaj has quit [Client Quit]
gedaj has joined #ste||ar
gedaj has quit [Client Quit]
gedaj has joined #ste||ar
gedaj has quit [Client Quit]
gedaj has joined #ste||ar
gedaj has quit [Client Quit]
gedaj has joined #ste||ar
eschnett has joined #ste||ar
gedaj has quit [Quit: leaving]
david_pfander has joined #ste||ar
david_pfander has quit [Ping timeout: 248 seconds]
daissgr has quit [Quit: WeeChat 1.4]
daissgr has joined #ste||ar
daissgr has quit [Client Quit]
daissgr has joined #ste||ar
akheir has quit [Remote host closed the connection]
gedaj has joined #ste||ar
gedaj has quit [Client Quit]
parsa has joined #ste||ar
gedaj has joined #ste||ar
gedaj has quit [Quit: leaving]
hkaiser has joined #ste||ar
mbremer has joined #ste||ar
gedaj has joined #ste||ar
mbremer has quit [Quit: Page closed]
EverYoun_ has quit [Remote host closed the connection]
gedaj has quit [Quit: leaving]
jbjnr has quit [Read error: Connection reset by peer]
EverYoung has joined #ste||ar
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 240 seconds]
gedaj has joined #ste||ar
gedaj has quit [Client Quit]
gedaj has joined #ste||ar