hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<Yorlik>
like put a global mutex around every single access?
<heller_>
yes, for example
<heller_>
not a global one though
<zao>
heller_: A quick question before I hit the sack - I'm not completely familiar with the GitLab CI image, and I can't quite tell from the outside which standard C++ library the sanitizer branch ends up using with sanitizers - is it libstdc++ or libc++?
<Yorlik>
I was thinking about using that pattern with std containers
<heller_>
zao: libc++
<zao>
ah
<zao>
thanks
<heller_>
zao: sorry, libstdc++
<heller_>
my bad
<zao>
Oh :)
<heller_>
not the llvm one ;)
<Yorlik>
heller_: why not a global one? I meant a mutex used by all threads for reading and one for writing
<zao>
Swamped with work at work building the competition so haven't gotten around trying the branch yet.
<zao>
The group using it found out today that they were using a criminally broken toolchain.
<zao>
We had apparently forgotten to hide our experimental "what happens if we run the latest GCC+CUDA+OpenMPI versions" test toolchain :D
<Yorlik>
heller_: You say it could work or not work? My idea was to use a wrapper guarding the push_backs and stuff
<heller_>
zao: nice
<heller_>
zao: what are you using for CI?
<heller_>
Yorlik: yes sure, but keep the synchronization local to that class
<heller_>
Yorlik: and you can't use two different mutexes to guard write and read accesses
<Yorlik>
Yes, that was the plan.
<Yorlik>
Right - makes sense.
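[editor's note] A minimal sketch of the wrapper discussed above (hypothetical `guarded_vector`; one mutex kept local to the class, guarding reads and writes alike, as heller_ recommends):

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical wrapper: a single mutex, local to the class, serializes
// every access to the underlying std::vector.
template <typename T>
class guarded_vector {
    mutable std::mutex mtx_;
    std::vector<T> data_;
public:
    void push_back(T value) {
        std::lock_guard<std::mutex> lock(mtx_);
        data_.push_back(std::move(value));
    }
    // Return by value: handing out a reference would escape the lock.
    T at(std::size_t i) const {
        std::lock_guard<std::mutex> lock(mtx_);
        return data_.at(i);
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return data_.size();
    }
};
```

Returning elements by value is the conservative choice here: a reference handed out after the lock is released would defeat the guard, which is the pointer/reference-stability problem raised a few lines below.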
<heller_>
Yorlik: you also have to watch out for things like pointer/reference stabilities and such
<Yorlik>
You mean ensuring my references are not invalidated by any reordering operations?
<Yorlik>
I was pondering to use handles
<heller_>
things like push_back and friends might invalidate any pointers into your container
<Yorlik>
push_back?
<Yorlik>
You mean if the allocator changes things?
<Yorlik>
I feel i need a custom allocator ...
<heller_>
no, if you don't have enough capacity, the implementation allocates new storage with enough capacity, copies over the old elements and then inserts the new element
<Yorlik>
having a vector copy in the middle of an update cycle would be fun for sure
<Yorlik>
:)
<heller_>
a custom allocator won't help you there ;)
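[editor's note] The reallocation heller_ describes is easy to observe: once capacity is exhausted, push_back allocates fresh storage, copies the elements over, and every old pointer into the vector dangles. A small sketch (hypothetical helper name):

```cpp
#include <vector>

// Demonstrates that push_back past capacity relocates the storage,
// invalidating every pointer/reference into the vector.
// Returns true if the first element's address changed after growth.
inline bool push_back_relocated() {
    std::vector<int> v;
    v.reserve(1);
    v.push_back(42);
    int* before = &v[0];          // pointer into the original storage
    for (int i = 0; i < 100; ++i)
        v.push_back(i);           // forces at least one reallocation
    // 'before' now dangles; only compare the value, never dereference it.
    return before != &v[0];
}
```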
<Yorlik>
I have an idea for a workaround, but it requires a never relocating allocator
<Yorlik>
Its possible
<Yorlik>
You reserve a HUGE amount of virtual address space beforehand
<Yorlik>
and grow only
<Yorlik>
and you malloc additional pages from that space as you go
<Yorlik>
unfortunately it's OS-specific how to do it
<Yorlik>
VirtualAlloc on Windows and I think mmap on Linux
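[editor's note] A sketch of the reserve-then-commit pattern on the Linux/POSIX side (hypothetical helper names; the Windows equivalent would be `VirtualAlloc` with `MEM_RESERVE` followed by `MEM_COMMIT`):

```cpp
#include <cstddef>
#include <sys/mman.h>

// Reserve a huge span of virtual address space without committing any
// physical memory: PROT_NONE + MAP_NORESERVE gives addresses only.
inline void* reserve_span(std::size_t bytes) {
    void* p = mmap(nullptr, bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// "Commit" a prefix of the reservation by enabling access; the kernel
// faults real pages in on first touch.
inline bool commit_pages(void* base, std::size_t bytes) {
    return mprotect(base, bytes, PROT_READ | PROT_WRITE) == 0;
}
```

Because the whole span is reserved up front, a grow-only array built on top of it never relocates: growth only commits more pages at the tail.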
eschnett has quit [Quit: eschnett]
<Yorlik>
Which means I should probably just write my own managed vector
<Yorlik>
Or Vector-ish data structure - rather a growable array, no erase
<heller_>
yeah
<heller_>
and that can be done with std::allocator for functional testing first
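[editor's note] The "growable array, no erase" idea can indeed be prototyped with plain standard-library allocation first: chunked storage already gives stable element addresses without any OS tricks. A sketch (hypothetical `stable_array`; `std::deque` offers similar stability out of the box, this just shows the mechanism):

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Grow-only array with stable element addresses: storage is a list of
// fixed-size chunks, so growth never relocates existing elements.
template <typename T, std::size_t ChunkSize = 1024>
class stable_array {
    std::vector<std::unique_ptr<T[]>> chunks_;
    std::size_t size_ = 0;
public:
    std::size_t push_back(T value) {
        if (size_ % ChunkSize == 0)                      // current chunk full
            chunks_.push_back(std::make_unique<T[]>(ChunkSize));
        chunks_.back()[size_ % ChunkSize] = std::move(value);
        return size_++;                                   // index as a handle
    }
    T& operator[](std::size_t i) {
        return chunks_[i / ChunkSize][i % ChunkSize];
    }
    std::size_t size() const { return size_; }
};
```

The returned index doubles as the "handle" mentioned earlier: it stays valid forever because nothing is ever erased or moved.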
<Yorlik>
I could use an insanely high reserve ofc
<heller_>
right
<Yorlik>
though ... does reserve actually reserve or does it allocate on demand?
<heller_>
reserve actually reserves
<Yorlik>
I assumed reserve does all the mallocs already
<heller_>
yes
<Yorlik>
means the space is wasted
<heller_>
right
<Yorlik>
I want a virtual reserve that does malloc on demand
<heller_>
but replacing std::allocator with something hand rolled to lazily allocate the pages on demand, is trivial
<Yorlik>
Allright
<heller_>
that's how our stacks work, btw
<Yorlik>
You allocate virtually ?
<heller_>
yes
<Yorlik>
and do the hard alloc lazily?
<Yorlik>
Nice !
<heller_>
be aware of the initial overhead when accessing one of those non allocated pages
<Yorlik>
I am thinking of some sort of time sliced preventive allocation triggered by a watchdog
<heller_>
the generated page faults have a significant impact
<Yorlik>
like you watch the size and when its above a threshold you start allocating
<Yorlik>
I'd like to prevent that from start
<heller_>
the granularity will be the size of a page
<heller_>
but you can control that (under linux at least)
<heller_>
that is the page size
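[editor's note] Querying that granularity is a one-liner on POSIX (typically 4096 bytes on x86-64 Linux); the Linux-side knob for larger granularities is huge pages, e.g. `mmap` with `MAP_HUGETLB`:

```cpp
#include <unistd.h>

// Page size at runtime (POSIX); commonly 4096 on x86-64 Linux.
inline long page_size() {
    return sysconf(_SC_PAGESIZE);
}
```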
<Yorlik>
That was my thought - a 4K-aligned vector allocating in steps of 4K
<heller_>
those are all operations that can be done once the game is functional ;)
<heller_>
or the core engine
<zao>
For my own HPX testing, I still have the semi-automated Singularity-based thing I'm hacking on.
<heller_>
and for work?
<zao>
Nothing currently, just EasyBuild installations and some manual test suites for software we have ideas about performance/correctness for.
<zao>
We're looking at CSCS's Reframe eventually to automate some of those.
<zao>
As for the StarPU-using group, I have no idea what their methodology is. Infinite numbers of PhDs, I guess :)
<zao>
There's a presentation of some milestone tomorrow, hoping to catch that.
<heller_>
reframe looks awesome
K-ballo has quit [Quit: K-ballo]
eschnett has joined #ste||ar
hkaiser has quit [Quit: bye]
eschnett has quit [Quit: eschnett]
<Yorlik>
When continuously inserting random numbers into a std::set in batches, you'd expect *set.begin() to get smaller over time and *(--set.end()) to become bigger, right? But I am occasionally seeing the begin getting bigger. My set is a std::set<uint64_t> and I am printing values with %16X in printf. Though the chance I have made an error is higher, I can't get my head around this. I am doing no deletions at all.
<Yorlik>
I wonder if the uint64_t is too much for cout and it's just getting truncated
<Yorlik>
the hex digits are never getting past 8
<Yorlik>
or rather printf %16X not working (not cout)
<Yorlik>
Yup confirmed - %16X is not working.
<Yorlik>
Had to split the printout into 2x32bit uints to work
<Yorlik>
Thanks, IRC for being a nice rubberduck ;)
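[editor's note] The culprit, for the record: %16X only consumes an unsigned int from the varargs, so passing a uint64_t is undefined behavior and in practice the high 32 bits vanish, which is exactly why the hex digits never got past 8. No split into two 32-bit halves is needed; the portable fix is the PRIX64 macro from <cinttypes> (hypothetical `hex64` helper):

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Format a uint64_t as 16 uppercase hex digits, zero-padded.
// PRIX64 expands to the correct length modifier for this platform.
inline std::string hex64(uint64_t v) {
    char buf[17];                              // 16 digits + NUL
    std::snprintf(buf, sizeof buf, "%016" PRIX64, v);
    return buf;
}
```

On most platforms `"%016llX"` with a cast to unsigned long long works too; PRIX64 is just the spelling the standard guarantees.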
akheir has quit [Remote host closed the connection]
akheir has joined #ste||ar
<jbjnr__>
heller yt?
<jbjnr__>
I just looked at some the logs of jobs I ran last night, and I see exceptions (not many, but some) thrown because of "{what}: archive data bstream is too short: HPX(serialization_error)
<jbjnr__>
" - (testing libfabric). I'm trying to remember what the problem we had that caused this (bad data I know, but there were certain specific issues that trigered it). If your memory is better and you recall it, please let me know. thanks
<Yorlik>
Oh - someone reviewed :D - Thanks to you heller_ :)
<Yorlik>
Time to build the new HPX :)
<Yorlik>
Argh - it doesn't compile
<Yorlik>
Issue filed.
<heller_>
jbjnr__: uh, don't remember, sorry
<jbjnr__>
no worries. I'll dig into the code over the weekend. must be a race somewhere
<heller_>
sure is
<jbjnr__>
does master branch have occasional deadlocks? I see occasional hangs on exit on the moody camel branch and I'm not sure if it's new or present on master too.
<simbergm>
jbjnr__: definitely
<jbjnr__>
you mean present on master too?
<simbergm>
jbjnr__: definitely deadlocks still on master
<jbjnr__>
ta
<jbjnr__>
who's fixing it?
<jbjnr__>
:)
<heller_>
Me
<heller_>
I think I have most fixed on the sanitizers branch
<jbjnr__>
\o/ yay for heller.
<jbjnr__>
it might not be your fault - but you always fix it!!!
<jbjnr__>
(It's always your fault btw)
K-ballo has joined #ste||ar
<heller_>
jbjnr__: would be boring otherwise ;)
<simbergm>
heller_: what's missing on the sanitizers branch btw? can we help?
<heller_>
simbergm: there are some leaks still
<heller_>
simbergm: i'll open a PR step by step
<heller_>
and eventually we'll be able to have the sanitizers running along normally
<heller_>
but the MPI version needs to be updated first...
<simbergm>
awesome
<Yorlik>
Anyone else having issues compiling the current master with tests? (Windows) It seems to compile now, but I have to compile it with -DHPX_WITH_TESTS=OFF
<Yorlik>
Also had to disable a bit in a CMake File in the tests
<K-ballo>
which tests are failing to compile?
<Yorlik>
It begins with the CMakeFile - so it already breaks in the generation phase
<Yorlik>
I filed an issue for that one
<Yorlik>
But when disabling the offending lines (seems like a harmless bug) the tests break
<Yorlik>
I'll have to start another compile to get the exact message
<Yorlik>
Running my hackish fix compile at the moment
<Yorlik>
Just wondering if there was a known issue
<K-ballo>
odd, the file has not been touched for months
<K-ballo>
Yorlik: when's the last time you had successful compilation with tests?
<Yorlik>
I used tag 1.2.1
<Yorlik>
I wanted that enhancement hkaiser made for custom allocators
<Yorlik>
thats why I'm compiling
<Yorlik>
1.2.1 worked flawlessly
<Yorlik>
No compiles in between
K-ballo has quit [Ping timeout: 250 seconds]
K-ballo has joined #ste||ar
<K-ballo>
I don't usually compile with test and examples on windows, but I last did a couple weeks ago
<Yorlik>
That probably was close to 1.2.1
<Yorlik>
maybe even pre 1.2.1
<jbjnr__>
tests are broken because of guided_pool_executor. I'm trying to fix it now, but still don't understand what's going on
<Yorlik>
OK
<K-ballo>
1.2.1 wouldn't have most of the changes that happened since 1.2.0, couple weeks old master would
<K-ballo>
generation with tests works here, might need more/less flags, or a clean build
<K-ballo>
ye, needs vcpkg
<K-ballo>
actually no.. should be unconditional
<Yorlik>
I'm compiling without vcpkg
<K-ballo>
are you sure that error is the very first error to pop up?
<Yorlik>
Default options
<Yorlik>
Yes
<Yorlik>
But when I fix the Cmake File the tests explode
<Yorlik>
Tests off generates and compiles properly now
<K-ballo>
I'm not interested in that, I don't think the test exploding could explain the cmake configuration error
<Yorlik>
I also believe the CMake issue is an extra thing
<Yorlik>
The target in the offending lines does not exist
<Yorlik>
Might be some bug causing CMake to silently fail maybe?
* Yorlik
doesn't like silent fails
<K-ballo>
I don't know, but we have a real problem there, commenting the offending line out won't help
<Yorlik>
It consumes the list, but doesn't generate a target - I'll check my local files really quick
<K-ballo>
are you still using your custom made special superbuild cmake flow?
<Yorlik>
Just to exclude any weirdness on my side, but I tried after a hard reset too
<Yorlik>
Yes - custom superbuild -but its pretty stable these days
<K-ballo>
see if you can reproduce without it
<Yorlik>
OK
<Yorlik>
At least I can confirm my local file isn't corrupted - the list definitely has the test here too.
<Yorlik>
Working on the conventional compile
<heller_>
how did you upgrade to master?
<Yorlik>
Clean Checkout and CMake Gui
<Yorlik>
I used the git default gui - pull and merge
<Yorlik>
err - fetch and merge ofc
hkaiser has joined #ste||ar
<Yorlik>
K-ballo: It generated with the normal CMake gui - but a normal VS project.
<Yorlik>
That was my normal CMake 13
<jbjnr__>
K-ballo: I think I know what's wrong now. The async_execute used to be called with (function, dataflow_frame, result, predecessor (tuple of futures)), but now it is called only with (function, predecessor (tuple of futures)); the dataflow frame and result are gone. Not sure why the result was there, but I guess it was filled in by the caller
<Yorlik>
Inside VS I am using the integrated CMake of VS and targeting Ninja
<Yorlik>
I copied / moved the source tree which worked in place of the old one and it broke again
<Yorlik>
So - there is some difference in the generation process / environment between inside and outside of VS which makes the difference
<Yorlik>
I personally think I will pull all dependencies out of my superbuild and make my life easier
<Yorlik>
My motivation to track these tidbits of broken build processes is not very high tbh.
<Yorlik>
I'll just keep to the given standards and done.
<Yorlik>
No one can check all variations and depths of all circumstances - but still: there is a possibility this issue indicates some weird problem.
<Yorlik>
It's just a pity to lose the possibility to automate everything and having to resort to manual labour.
<heller_>
FWIW, I can't reproduce your error on linux
<Yorlik>
I think it's some weirdness in the depths of the system.
<Yorlik>
I just can say even after completely removing my .vs and build directory and restarting VS it reproduced
<K-ballo>
jbjnr__: dataflow uses post, does that get synthesized from async_execute?
<jbjnr__>
yes, eventually.
<jbjnr__>
I seem to have fixed it, but I have a lockup on exit - every time
<jbjnr__>
not sure if that's related or different
<jbjnr__>
(just my test)
<K-ballo>
what was result? from what I can see it would have been called with the frame and a boolean constant
<K-ballo>
we need to document in the dataflow implementation and the post implementation the places in which changes would affect the guided executor
<jbjnr__>
I was puzzled by result. I just passed it through, but no real idea what it was. I called it result, but maybe it was something else cos it was a different type from result_type
<jbjnr__>
it's gone now anyway
<jbjnr__>
whatever you cleaned up, made it better
<K-ballo>
ah, it must have been that true/false_type for is_void
eschnett has joined #ste||ar
<jbjnr__>
could be
<K-ballo>
lucky you I left the tuple of futures there as a convenience, my intention is to eventually take that away too
<jbjnr__>
if you remove the tuple and we just have a list of futures, that's probably ok, cos I can specialize on >1 futures
<K-ballo>
no, I would remove all the things
<K-ballo>
feed the executor a nullary callable
<jbjnr__>
anyway, compilation is fixed now.
<K-ballo>
if we were in 14 I could have done it, with 11 support it required too much machinery
<jbjnr__>
^^that might be a problem - if I can't introspect the args, I can't do late binding
<K-ballo>
that's the underlying something I'm hoping to understand following the fix
eschnett has quit [Client Quit]
<jbjnr__>
if and when you go down that path, remember this conversation and warn me
<K-ballo>
my hands are tied now that I know the guided executor depends on deep implementation details
<jbjnr__>
we can still discuss it and look for a way ...
<jbjnr__>
(but not today please :)
<K-ballo>
I will look for a way, but worst case scenario we'll annotate dataflow with comments not to change things because other things depend on them
<jbjnr__>
so you want dataflow to unwrap each future as it becomes ready, bind them to a new callable and keep unwrapping each layer until all futures are complete, then there's a final callable that is passed to the executor with all args bound in to it?
<K-ballo>
mmh, I'm not sure I understand, but I think not..
<K-ballo>
ideally I would capture `futures` rather than pass it as an argument
<hkaiser>
jbjnr__: I'm not aware of any interface changes for the executors
<K-ballo>
that would require either 14's lambda init captures, or a custom made callable
<K-ballo>
looking at this back now, I'm surprised I didn't write a simple custom made callable... I did so for a few other cases
<jbjnr__>
I see
<jbjnr__>
I'll ponder it offline.
<hkaiser>
K-ballo: post is normally not synthesized, but if the executor does not expose it, the customization points will use async_execute instead - true
<jbjnr__>
what do you mean by 'synthesized' here?
<Yorlik>
hkaiser: I closed the issue with my HPX build, since it obviously is specific to my setup. I still can build using the default way of using external CMake and targeting MSVC.
eschnett has joined #ste||ar
<hkaiser>
Yorlik: ok
<hkaiser>
jbjnr__: the executor customization points do different things depending on what functionality the used executor exposes
<hkaiser>
jbjnr__: e.g. if the executor exposes post(), then the post_execute CP will use it, otherwise it will call the executor's async_execute instead
<hkaiser>
... and discard the returned future
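[editor's note] The fallback hkaiser describes can be illustrated generically (this is not HPX's actual code; names like `post_dispatch` are hypothetical): a customization point prefers the executor's own `post()` if it exists, otherwise synthesizes fire-and-forget behavior from `async_execute` and discards the future.

```cpp
#include <future>
#include <utility>

// Preferred path: chosen by SFINAE when exec.post(f) is well-formed.
template <typename Executor, typename F>
auto post_dispatch(Executor& exec, F&& f, int)
    -> decltype(exec.post(std::forward<F>(f)), void()) {
    exec.post(std::forward<F>(f));
}

// Fallback: synthesize post from async_execute, discarding the future.
template <typename Executor, typename F>
void post_dispatch(Executor& exec, F&& f, long) {
    (void)exec.async_execute(std::forward<F>(f));
}

// The customization point: the int overload wins whenever it is viable.
template <typename Executor, typename F>
void post_execute(Executor& exec, F&& f) {
    post_dispatch(exec, std::forward<F>(f), 0);
}

// An executor exposing only async_execute, to exercise the fallback.
struct async_only_executor {
    template <typename F>
    std::future<void> async_execute(F&& f) {
        return std::async(std::launch::async, std::forward<F>(f));
    }
};
```

Note one subtlety of this sketch: the discarded `std::async` future blocks in its destructor, so the fallback here is synchronous in effect; a real executor framework would stash or detach the work instead.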
<jbjnr__>
ok. Just the terminology was confusing me
<jbjnr__>
lockup problem solved. Just me being useless and leaving some debug code in there.