hkaiser changed the topic of #ste||ar to: The topic is 'STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<Yorlik> I am trying to get an hpx::future<void> from a parallel for and push it into a vector<hpx::future<void>> but it doesn't work. I get an error: 'hpx::lcos::future<void>::future(const hpx::lcos::future<void> &)': attempting to reference a deleted function. What's wrong here?
<Yorlik> the code is like: futures.push_back( hpx::parallel::for_loop( ...
<Yorlik> And the error comes up at: for ( auto fut : updater.futures ) { ...
<Yorlik> It says hpx::lcos::future<void> has a user defined move constructor. - well - it wasn't me - was it?
<hkaiser> Yorlik: use for(auto&& fut : futures)
<Yorlik> ARGH ... rvalue refs ...
<Yorlik> It compiles - THANKS !
diehlpk has joined #ste||ar
hkaiser has quit [Quit: bye]
diehlpk has quit [Ping timeout: 240 seconds]
<simbergm> heller: yt? how did it go with your gh actions experiments?
<Yorlik> simbergm: yt?
<simbergm> yeah
<Yorlik> I see all these module sections in the docs now - totally like it. But the column on the left side is too narrow to properly display all.
<Yorlik> Might need some .css adjustment somewhere.
<simbergm> screenshot?
<simbergm> desktop or mobile?
<Yorlik> Desktop Browser
<simbergm> ah right, thanks
<simbergm> the compatibility headers shouldn't even be there...
<simbergm> but even then it's too narrow
<Yorlik> :)
<simbergm> yeah, it looks the same for me :/
<simbergm> Yorlik: thanks for letting me know, I'll see what I can do about it
heller1 has joined #ste||ar
<heller1> ms: very slowly ... I am too overly tired these days :/
<simbergm> heller: yeah, no worries
<simbergm> I was also looking at it again shortly last week
<simbergm> I might try to get at least a little windows builder up (that could build more than just core)
<simbergm> builder/config
<heller1> nod
<heller1> and maybe even OSX
<simbergm> yep
<Yorlik> How again would I access the underlying object from an id_type locally? Actions only? Or could I get an object directly (for testing)
<Yorlik> NVM - found it - forgot I had documented it myself already :) https://mckillroy.github.io/hpx_snippets/r_creating_and_referencing_components.html
<simbergm> Yorlik: sorry, got distracted and I wouldn't have known the answer anyway :) I'm usually a bad bet for questions about components and actions
<Yorlik> NP. Struggling with output now, but I'm getting to it. I wonder how to make a synchronized cout like thing from a task or parallel loop.
<Yorlik> hpx::cout isn't really synced.
<Yorlik> I'm getting a read access violation when waiting for a future. Why could this happen? It is produced as output from a parallel loop.
<Yorlik> It goes back to future.hpp line 782: shared_state_->wait(ec);
<Yorlik> What could go wrong here: for ( auto && fut : updater.futures ) { fut.wait( ); } ??
<heller1> not cleaning updater.futures?
<heller1> try for (auto & fut : updater.futures) instead
<Yorlik> I do a clear() after every run
<Yorlik> It's like this:
<Yorlik> uint8_t controller::update_frame( ) {
<Yorlik> updater.update_frame( );
<Yorlik> for ( auto && fut : updater.futures ) { fut.wait( ); }
<Yorlik> return 0;
<Yorlik> updater.futures.clear( );
<Yorlik> }
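Note that in the snippet above the `updater.futures.clear()` sits after `return 0;`, so it is unreachable and never runs; stale futures from a previous frame would then be iterated again, which is consistent with the access violation described earlier. A corrected frame loop can be sketched with `std::future` standing in for `hpx::future` (struct names here are illustrative, not the original code):

```cpp
#include <cstdint>
#include <future>
#include <vector>

// Minimal stand-in for the updater described in the chat.
struct updater_t
{
    std::vector<std::future<void>> futures;

    void update_frame()
    {
        // The real code launches parallel loops; plain async tasks
        // stand in for them here.
        for (int i = 0; i < 4; ++i)
            futures.push_back(std::async(std::launch::async, [] {}));
    }
};

struct controller
{
    updater_t updater;

    std::uint8_t update_frame()
    {
        updater.update_frame();
        for (auto&& fut : updater.futures)
            fut.wait();
        // Clear *before* returning -- any statement placed after
        // `return` is unreachable, so a clear() there never runs and
        // stale futures survive into the next frame.
        updater.futures.clear();
        return 0;
    }
};
```

With the clear moved before the return, each frame starts from an empty vector and only waits on futures it created itself.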
<Yorlik> updater.update_frame creates all the futures and puts them in the vector with push_back
<Yorlik> I'm also fighting some lockup in the output. Seems I have a bunch of io threads hanging around not working anymore
<Yorlik> Unrelated issue.
<heller1> are you running in debug mode?
<Yorlik> Yes
<Yorlik> I think I need to make a console output object running in its own thread. This output blocking everything sucks.
<Yorlik> 43 threads ucrtbase.dll ...
hkaiser has joined #ste||ar
<hkaiser> simbergm: yt?
<simbergm> hkaiser: here
<hkaiser> simbergm: g'morning
<hkaiser> for the 1.4.1 rc: would you mind adding the build system/external testing change before doing the release?
<hkaiser> #4335
<simbergm> good morning
<simbergm> not at all
<simbergm> I just started looking at cherry picking commits for the release
<simbergm> my only comment on that was that the compile-only tests don't seem to be set up correctly but we can also leave that part for the next release
<hkaiser> simbergm: I think #4319 has the list of things that we might want to include - but as always - feel free to decide otherwise
<simbergm> because other than that you've done a very thorough job ;)
<simbergm> yep, I'm going by that
<hkaiser> simbergm: yes, I think I overlooked compile time tests
<simbergm> I removed one which already went in 1.4.0 but otherwise most of them seem like good candidates for the patch release
<hkaiser> which one did you remove?
<hkaiser> the MPI one?
<simbergm> was #4304 (the pack rewrite PR) critical to have in 1.4.1? it might be that one which is failing on the oldest gcc builders (I haven't confirmed it though)
<simbergm> yeah, nikunj's mpi one
<hkaiser> simbergm: don't think so, even more as it breaks gcc-oldest, I believe
<hkaiser> I meant #4304
<hkaiser> ahh no, the tuple refactoring breaks gcc-oldest
<simbergm> ah, 4354?
<simbergm> in any case, I don't think 4304 is critical for 1.4.1, so I'd rather leave it out if it's not necessary
<hkaiser> yah
<hkaiser> I don't need #4304 right now
<simbergm> 👍️
<simbergm> #4373 is not critical either, right? it just adds a new test if I read it correctly
<hkaiser> right, not critical as it doesn't fix anything
<hkaiser> and also the CMakeLists.txt still needs tweaking
<simbergm> all right, good
<simbergm> but then let's go with #4335 as it is and I'll open another issue as a reminder to look at the compile tests later
<simbergm> sound good?
<hkaiser> simbergm: I would like to have #4372, perhaps - not sure whether we have the warnings fixed on 1.4.1 otherwise - they were reintroduced recently
<hkaiser> simbergm: yes to #4335
<simbergm> hkaiser: yes, 4372 is also a good idea
<simbergm> looks like the apex build finished now without warnings so we can merge that
<hkaiser> simbergm: I think some of your recent tweaks re-introduced warnings, if those tweaks of yours are not part of 1.4.1, then we don't need #4372
<simbergm> in principle the warnings weren't there in 1.4.0 because I introduced new ones (I suppose) with the clang-tidy PR
<simbergm> hmm, let's see if it applies cleanly actually
<simbergm> exactly
<Yorlik> hkaiser: For some reason my output is locking up and I can't figure out why. E.g. in the following code it prints "1" and then stops until I trigger output from another thread:
<Yorlik> hpx::cout << 1 << hpx::flush;
<Yorlik> frame_starttime = std::chrono::high_resolution_clock::now( );
<Yorlik> hpx::cout << 2 << hpx::flush;
<Yorlik> "2" does not immediately print.
<hkaiser> Yorlik: use std::flush - hpx::flush is obsolete
<hkaiser> anyways - that's orthogonal
<Yorlik> OK. I'll try.
<Yorlik> I think it's not the reason tbh.
<hkaiser> no idea why it locks up - can you provide a small example?
<hkaiser> no it's not the reason
<simbergm> we should deprecate it officially in that case...
<Yorlik> There is no small example. It's the beast of my app locking up in weird ways and somehow it's console related
<Yorlik> When I connect my external administrative client and it triggers some output I see more
<hkaiser> simbergm: we can deprecate all hpx manipulators, I think - the equivalent std:: manipulators should do the trick - requires some investigation, however
<hkaiser> Yorlik: could be unrelated things
<Yorlik> It's weird.
<hkaiser> Yorlik: I ran into a very strange and complicated deadlock yesterday, pls see #4369 for details
<hkaiser> nobody can be even blamed for that one
<Yorlik> OK
<hkaiser> just making sure you don't run into similar problems
<simbergm> hkaiser: sorry that we didn't keep you up to date on the rp/init changes, didn't mean to have us all duplicate work
<simbergm> I think your changes mostly solve a different problem though
<hkaiser> nah, no worries
<hkaiser> I'm trying to solve a separate issue, mostly
<simbergm> so are you saying that the examples that create a resource partitioner up front most likely never worked on windows?
<hkaiser> can my changes be integrated with what you have in mind?
<simbergm> yeah, fairly easily I think
<simbergm> I'd just like to understand the problem a bit better so we don't reintroduce the same problem
<hkaiser> simbergm: they do work, the issue shows up only if a) the resource-partitioner is created explicitly, b) it is passed an explicit hpx-main function, and c) all of that happens in a shared library, not the application
<simbergm> where do the linker errors come from?
<hkaiser> I have tried to explain on the ticket
<simbergm> yeah, I saw that, I'm just being slow :P
<hkaiser> init() without arguments still refers to hpx_main as a symbol which in the end causes the linker to complain that hpx_main is not defined
<hkaiser> simbergm: I'll try to concoct a test for this
<simbergm> b) `init` is passed an explicit main function?
<simbergm> does rp have an overload taking the main function...?
<hkaiser> no-arg init() calls another init() that eventually passes hpx_main as a symbol
<simbergm> ah, because init would like to have hpx_main but it doesn't actually need it if it's been passed an explicit main function (that might be called something other than hpx_main)
<hkaiser> yes
<hkaiser> simbergm: on linux this doesn't show up as shared libraries can be linked with unresolved externals, on windows this does not work
<hkaiser> there, shared libraries can't have unresolved externals at link time
<simbergm> I think I got it now
<simbergm> thanks for explaining
<simbergm> yeah, a small test would be good
<simbergm> and I'll try to get that windows builder working on gh actions...
<simbergm> I *think* rori's changes might make it still a bit more straightforward (maybe not for the linking... let's see)
<simbergm> at least there wouldn't be a split between initializing rp and using init, there would just be init
<simbergm> Yorlik: https://github.com/STEllAR-GROUP/hpx/pull/4315 did fix your problems in the end, no? the comments were a bit inconclusive...
<Yorlik> hkaiser: It seems to be my event loop for controlling the server.
<Yorlik> simbergm: I'd have to check again. I kept building without examples.
<simbergm> Yorlik: ok, if you have time please do check but otherwise we'll go with what was there
<Yorlik> OK - I'll give it a shot later.
<simbergm> thanks!
hkaiser has quit [Quit: bye]
mdiers_ has quit [Remote host closed the connection]
mdiers_ has joined #ste||ar
<simbergm> diehlpk: yt? would you mind running 1.4.1-rc1 through the fedora build servers? hkaiser, would you mind doing the same on windows? at least building the tests
<simbergm> ah, he's not here...
<simbergm> heller, hkaiser: do you know if the scripts in `python/{hpx,scripts}` are still useful for anything?
<simbergm> I was going to fix them to make sure they're installed correctly and have the right shebang, but it looks like they're pretty old and unused and then I'd rather remove them...
<Yorlik> msimberg: How would I merge that PR again?
<Yorlik> I tried git fetch origin pull/4315/head:master
<Yorlik> But it refuses to fetch
<simbergm> Yorlik: you can try the 1.4.1-rc1 tag as well, I've cherry-picked it over there
<Yorlik> OK
<simbergm> it might be the branch was already deleted
<Yorlik> Still struggling with PRs
<simbergm> master would essentially have it as well
<simbergm> usually for open prs I just check out the actual branch name, not the gh-generated pr name/branch
<Yorlik> In my list there is no 1.4.1-rc1
<Yorlik> FFS - it's on github
<Yorlik> Moment ...
<Yorlik> Ok found it - weird sorting here
<Yorlik> building .... lets see ...
<Yorlik> Creating library Debug\lib\async_customization.lib and object Debug\lib\async_customization.exp
<Yorlik> C:\__A\Arc_Sb\_MSVC\BUILD\Windows-X64-Debug\async_customization.cpp.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) public: int __cdecl hpx::threads::executors::pool_numa_hint<struct dummy_tag>::operator()(int,double,char const *)const " (__imp_??R?$pool_numa_hint@Udummy_tag@@@executors@threads@hpx@@QEBAHHNPEBD@Z) referenced in function "public: class hpx::lcos::future<char const *> __cdecl
<Yorlik> hpx::threads::executors::pre_execution_async_domain_schedule<struct hpx::threads::executors::pool_executor,struct hpx::threads::executors::pool_numa_hint<struct dummy_tag> >::operator()<class <lambda_c3d4af7bace3c898aab376bc7ac724d9>,int,double,char const *>(class <lambda_c3d4af7bace3c898aab376bc7ac724d9> &&,int &&,double &&,char const * &&)const " (??$?RV<lambda_c3d4af7bace3c898aab376bc7ac724d9>@@HNPEBD@?$pre_execution_
<Yorlik> async_domain_schedule@Upool_executor@executors@threads@hpx@@U?$pool_numa_hint@Udummy_tag@@@234@@executors@threads@hpx@@QEBA?AV?$future@PEBD@lcos@3@$$QEAV<lambda_c3d4af7bace3c898aab376bc7ac724d9>@@$$QEAH$$QEAN$$QEAPEBD@Z)
<Yorlik> Trying -DHPX_WITH_EXAMPLES=OFF now ...
<Yorlik> 1.4.1-rc1
<Yorlik> And builds
<Yorlik> msimberg: Clearly the examples build still is broken for me on windows.
<Yorlik> Time for a nap - BBL
<simbergm> Yorlik: excellent, thanks
hkaiser has joined #ste||ar
<hkaiser> simbergm: thanks for doing the rc, I'll test it later tonight... travelling all day
<simbergm> hkaiser: oh right, sorry
<simbergm> we can delay it a bit as well if you don't find the time for it
<hkaiser> no reason to be sorry - your work is highly appreciated
<hkaiser> nah, should be able to do it tonight
<simbergm> diehlpk_mobile: yes please!
<simbergm> diehlpk_mobile: it's tagged: https://github.com/STEllAR-GROUP/hpx/tree/1.4.1-rc1
<hkaiser> simbergm: Yorlik's error looks like a missing HPX_EXPORT
<hkaiser> simbergm: also, I don't see diehlpk_mobile's messages here on irc
<hkaiser> time to switch I guess
<zao> *sadcat*
<hkaiser> indeed
<hkaiser> irc is so nicely no-nonsense
<simbergm> :(
<simbergm> this wasn't meant to force people off irc...
<hkaiser> *sure* ;-)
<simbergm> sigh... I'm not an evil person!
<hkaiser> sure, no worries
<simbergm> at least I have good intentions ;)
<simbergm> there are plenty of HPX_EXPORTS for pool_numa_hint, but yeah, clearly something is missing...
<hkaiser> simbergm: it's the operator()()
<hkaiser> I think
<hkaiser> simbergm: ahh
<hkaiser> the example defines the hint itself, no need for an export there
<hkaiser> gtg, flight's boarding
hkaiser has quit [Quit: bye]
<simbergm> Yorlik: an HPX_EXPORT after struct here: https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/runtime/threads/executors/guided_pool_executor.hpp#L96 might work, but don't sweat it if you don't feel like trying it out
<simbergm> hkaiser will get to it eventually :)
hkaiser has joined #ste||ar
<hkaiser> simbergm: Yorlik's problem is not caused by a missing HPX_EXPORT, but by a superfluous one: https://github.com/STEllAR-GROUP/hpx/blob/master/libs/resource_partitioner/examples/async_customization.cpp#L471
hkaiser has quit [Quit: bye]
<simbergm> hkaiser: right, makes sense
<heller1> @ms:matrix.org: remove them
<simbergm> heller: yay! (I suppose you mean the python scripts?)
<heller1> Yes
diehlpk_work has quit [Ping timeout: 260 seconds]
<Yorlik> o/
<Yorlik> Nothin like a long nap :)
<Yorlik> Seems you're figuring it out with that build issue. I'm not going to look into it again - gotta fix some stupid stuff here.
hkaiser has joined #ste||ar
<hkaiser> heller1: yt?
<Yorlik> hkaiser: It seems my server is freezing because of some trivial threading error: I run the administrative command loop from a function in my hpx main, the rest is forked off with "auto game_result_future = hpx::async<agns::sim::controller::start_action>( hpx::launch::fork, gcId );" But the main hpx thread seems to disturb it somehow. When I switch off the command loop and the hpx main just keeps sitting at return
<Yorlik> hpx::finalize( ); everything seems fine. Any idea what I should change?
<hkaiser> difficult to tell
<hkaiser> btw, why launch::fork?
<hkaiser> does it make a difference to you?
<Yorlik> I wanted the main simulation and the control loop to be completely separated - but maybe that's just stupid.
<hkaiser> well, this is not what launch::fork is doing
<Yorlik> The control loop has a hpx::this_thread::sleep_for( 10ms ); every run.
<Yorlik> Pls enlighten me :)
<hkaiser> nod, that has nothing to do with launch::fork either
<Yorlik> At least the server runs nicely when the command loop is off.
<hkaiser> launch::fork is essentially the same as launch::async with the exception that the calling thread gets suspended and the new thread is run right away
<hkaiser> how many cores do you run your server on?
<Yorlik> It all runs at default settings - so 4*2
<hkaiser> k
<hkaiser> pls keep in mind that HPX threads don't interact well with any kind of kernel-based locking
<hkaiser> (see the ticket I referred you to earlier today)
<Yorlik> That's why I'm using hpx sleep.
<hkaiser> sure
<hkaiser> but how does your command loop receive commands?
<Yorlik> BTW it also locks up with ::async
<hkaiser> what kind of IO do you use there?
diehlpk_work has joined #ste||ar
<hkaiser> Yorlik: sure, I didn't say launch::fork is the issue
<Yorlik> hpx::cout and in the command loop some net IO
<hkaiser> so the net IO is calling into the kernel?
<Yorlik> nng for the networking
<Yorlik> Probably it does.
<hkaiser> so you have kernel locking going on there
<Yorlik> I wonder why it wasn't an issue for months tbh
<hkaiser> I'd suggest running your command loop on the main thread (the one that called C-main)
<hkaiser> you just got lucky, I think
<Yorlik> Funny
<hkaiser> do you call back into HPX land from nng?
<Yorlik> But yes - probably that's my best bet - since I have a main already it shouldn't be witchcraft
<Yorlik> I am using the command loop to control the gamecontroller with some simple member ints
<hkaiser> you would need to use hpx::start/stop instead of hpx::init, though
<Yorlik> Problem is I'm using a bunch of hpx functions in my dispatcher.
<hkaiser> hmmm
<Yorlik> But I could refactor that away
<hkaiser> you can use run_on_hpx_thread(F) from the dispatcher
<Yorlik> And use a simple queue for data
<Yorlik> OK
<Yorlik> Or I use hpx networking for the simple UDP listener?
<Yorlik> Woops - TCP actually.
<hkaiser> don't think so - just marshal either networking onto a non-HPX thread or all HPX work onto an HPX one
<Yorlik> So I'd have to change this in my dispatcher: hpx::apply<agns::sim::controller::start_action>( GLOBAL.getGameControllerId( ) );
<hkaiser> yah, use run_as_hpx_thread([]() { apply<>(...); });
<Yorlik> Thanks. I might refactor later - already have start and stop
<Yorlik> Trying that lambda now
<hkaiser> Yorlik: do that only if you're on a non-hpx thread
<Yorlik> the loop runs in hpx::main
<hkaiser> run it in C-main after hpx::start
<Yorlik> I don't use hpx::main for anything but forking away the server core
<hkaiser> anyways, it's a HPX-thread
<Yorlik> and that loop
<Yorlik> KK
diehlpk has joined #ste||ar
<diehlpk> Fedora 32
<hkaiser> nice!
<diehlpk> Fedora rawhide
<hkaiser> even nicer!
<hkaiser> thanks!
<diehlpk> hkaiser, scaling with mpi on Cori looks good
<hkaiser> ohh cool - at least something
<diehlpk> Sagiv is preparing the plots and will add them to the paper
<diehlpk> 1 node to 32 nodes has a scaling factor of 4, not too bad
<hkaiser> hmm, strong scaling?
<diehlpk> Lowest level on all nodes
<diehlpk> We will start by Friday the week long run to collect the performance counters
<hkaiser> right, we should do the same as for the sc paper, compare partitions/s
<diehlpk> Since we don't have enough work, the scaling will not look good at all
<hkaiser> so let's do more work, then - what's the point in scaling out if there isn't enough parallelism available
<diehlpk> We should not call it scaling, we should call it a plot to determine the efficient amount of nodes for each level
<diehlpk> We can not do more work, since the largest level takes one month on 512 nodes
<hkaiser> sure, but eventually we need to show scaling or parallel efficiency
<diehlpk> We did it in the previous paper
<hkaiser> diehlpk: what a couple of timesteps take one month?
<diehlpk> No, the full merger
<hkaiser> nah, we don't need to full merger for the scaling results
<Yorlik> What's the header pls? 'run_as_hpx_thread': is not a member of 'hpx'
<hkaiser> hpx::threads::run_as_hpx_thread
<diehlpk> Plan was to run for one week, two weeks, and one month for the long-term runs and just look at the counters and identify bottlenecks
<Yorlik> TX!
<diehlpk> hkaiser, Therefore, we do not call it scaling results
<diehlpk> We just show this plot to justify the amount of nodes we used for the long-term runs
<hkaiser> diehlpk: ok
<hkaiser> ok
<diehlpk> I think it will be awkward to show a scaling plot with larger levels and then not use them
<hkaiser> that's two orthogonal things
<diehlpk> We just can say that we have showed scaling in the previous cori paper and the piz daint paper
<hkaiser> but yah, let's use short runs to figure the right amount of nodes
<hkaiser> we still could plot the results of the short runs
<diehlpk> Yes, we will do this
weilewei has joined #ste||ar
<Yorlik> hkaiser: Server now runs like a sewing machine. Just had to switch out the hpx sleep in the loop against a std sleep ... :)
<weilewei> when I am compiling hpx 1.4.0 on Summit, I got this error: https://gist.github.com/weilewei/cafc1c675f5c15942a0a0f11efefdc7d
<Yorlik> All commands are wrapped in lambdas
<Yorlik> But I had to move everything to main
<Yorlik> Very simple
<weilewei> same error found in 1.4.1 pre-release
<hkaiser> weilewei: but you just did run the tests, no?
<hkaiser> I think you're compiling against the installed version of hpx
<hkaiser> some header mismatch
<hkaiser> Yorlik: nod good
<weilewei> hkaiser yes, I did run the tests before
<hkaiser> so why does it break now?
<weilewei> let me check again
<weilewei> btw, I run my dca w/ hpx, the thread idle rate is around 47%. For example in one locality: /threads{locality#0/total}/idle-rate,1,202.902002,[s],4695,[0.01%]
<hkaiser> weilewei: that's not too bad, but not too good either
<weilewei> hkaiser so does it mean that half of the time, hpx is doing nothing?
hkaiser has quit [Ping timeout: 268 seconds]
diehlpk has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
<diehlpk_work> simbergm, hkaiser First build of 1.4.1 on f32 finished, so I think that 1.4.1 should be ready for fedora
<hkaiser> thanks diehlpk_work!
<hkaiser> there is at least one more PR that will go into 1.4.1
<diehlpk_work> Ok, I can test it again
<diehlpk_work> Tomorrow morning the arm build should be done as well
hkaiser has quit [Quit: bye]
<diehlpk_work> simbergm, hpx 1.4.1 finished on fedora rawhide as well
<diehlpk_work> So we are good for the patch release
weilewei has quit [Remote host closed the connection]
Yorlik has quit [Ping timeout: 260 seconds]
weilewei has joined #ste||ar
Yorlik has joined #ste||ar