hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoD: https://developers.google.com/season-of-docs/
diehlpk has joined #ste||ar
quaz0r has joined #ste||ar
<diehlpk> hkaiser, I'm trying to compile the latest version of HPX on Daint, just in case we need to run any benchmarks
<hkaiser> ok
<hkaiser> problems?
<diehlpk> However, I get a very strange error message
<diehlpk> Yes, if there were no problems I wouldn't be asking you
<hkaiser> ok, what's the problem?
<diehlpk> and see pm
<hkaiser> diehlpk: well, cmake doesn't find hpx
<hkaiser> do you have those files somewhere?
<hkaiser> if yes, use -DHPX_DIR=<the_dir_where_those_files_are_located>
<diehlpk> No, I saw this error while I compiled HPX
<hkaiser> huh?
<diehlpk> And therefore I didn't understand what was going wrong
<diehlpk> But now I got it
<hkaiser> what was it?
<diehlpk> Too embarrassing to say: Ava grabbed the laptop and undid some changes, so I was still doing git clone of octotiger and not hpx
<diehlpk> So I checked out octotiger but ran the cmake to compile HPX
jaafar_ has joined #ste||ar
jaafar has quit [Ping timeout: 276 seconds]
<diehlpk> hkaiser, HPX compiled, octo is next
<diehlpk> After that I will run the test problem
<hkaiser> diehlpk: bad Ava!
<diehlpk> At least we can run the closest problem to pvfmm
<diehlpk> So gregor can run the pvfmm code tomorrow and we can send the committee some meaningless plot
<hkaiser> ok
<hkaiser> good luck
<diehlpk> No luck, octotiger does not compile
<diehlpk> hkaiser, We have compilation errors in the cuda scheduler
<hkaiser> yah sure, this is invalid code
<hkaiser> diehlpk: write it as (void const*)(&cuda_multipole_interactions_kernel_rho) for now
<diehlpk> hkaiser, Thanks, octo compiled
<diehlpk> Now I am running the benchmarks
hkaiser has quit [Quit: bye]
<Yorlik> Does hpx::this_thread::sleep_for(...) also yield?
diehlpk has quit [Ping timeout: 252 seconds]
<heller> Yorlik: sure
<Yorlik> So a sleep can make a task lose control ...
<Yorlik> That's a tricky side effect
<Yorlik> Does the sleep need task time to go away or is it strictly using the real time clock? Like hpx::this_thread::sleep_for(10s) - would it require the task to burn 10 seconds while scheduled?
<heller> No
<heller> It gets suspended, and woken up after 10 seconds
<heller> So yes, it's using a clock in the background
<Yorlik> Good :)
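A minimal sketch of the behaviour heller describes, assuming an application that starts the runtime via hpx/hpx_main.hpp; the header paths are from HPX releases of that time and may differ in newer versions. hpx::this_thread::sleep_for suspends only the calling HPX task, so the worker thread stays free for other tasks:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>
    #include <hpx/include/threads.hpp>   // hpx::this_thread::sleep_for

    #include <chrono>
    #include <iostream>

    int main()
    {
        using namespace std::chrono_literals;
        auto t0 = std::chrono::steady_clock::now();

        hpx::future<void> sleeper = hpx::async([] {
            // Suspends only this HPX task; the underlying worker thread is
            // handed back to the scheduler while the 10 second timer runs.
            hpx::this_thread::sleep_for(10s);
        });

        hpx::future<int> busy = hpx::async([] {
            // Can be scheduled on the same worker while the other task sleeps.
            return 42;
        });

        int r = busy.get();   // ready long before the sleeper wakes up
        sleeper.get();        // becomes ready roughly 10 seconds after it started

        auto dt = std::chrono::duration_cast<std::chrono::seconds>(
            std::chrono::steady_clock::now() - t0);
        std::cout << "result " << r << ", elapsed ~" << dt.count() << "s\n";
    }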
Yorlik has quit [Read error: No route to host]
Yorlik has joined #ste||ar
nikunj has quit [Remote host closed the connection]
rori has joined #ste||ar
hkaiser has joined #ste||ar
daissgr_work has quit [Read error: Connection reset by peer]
daissgr_work has joined #ste||ar
hkaiser has quit [Quit: bye]
aserio has joined #ste||ar
hkaiser has joined #ste||ar
daissgr has joined #ste||ar
eschnett has joined #ste||ar
quaz0r has quit [Quit: bbl]
aserio has quit [Ping timeout: 250 seconds]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
eschnett has quit [Quit: eschnett]
aserio has quit [Ping timeout: 272 seconds]
rori has quit [Quit: byr]
aserio has joined #ste||ar
hkaiser has joined #ste||ar
<diehlpk_work> simbergm, hkaiser parsa GSoD is submitted and the student should start with the community bonding phase next month
<simbergm> diehlpk_work: thanks! let's hope we get what we asked for
<parsa> ?
<diehlpk_work> I think we should meet with her in person and help the student set up IRC
<diehlpk_work> and parsa should meet with the student once a week at the beginning
eschnett has joined #ste||ar
<Yorlik> Is there a way to have a "sleep at least", like a timed yield with no guaranteed/enforced rescheduling ?
<hkaiser> Yorlik: what should that do?
<Yorlik> wait and then give the scheduler leeway to decide when to schedule
<hkaiser> sleep_for already just guarantees that the task doesn't come back earlier than specified
<Yorlik> Or is that happening anyways?
<Yorlik> OK - then it's already implemented like that
<Yorlik> I realize one particular difficulty with this app lockup I had (seems fixed now, still testing):
<Yorlik> Since the output is totally async I never know where I am in the app
<Yorlik> Is there a built-in way to have sync output?
<hkaiser> use std::flush
<hkaiser> that should do the trick
<Yorlik> Does that also work when using printf?
<Yorlik> like after a printf
<hkaiser> no
<Yorlik> I prefer printf for the built in formatting
<hkaiser> there you use fflush
<Yorlik> ok - seems I'll have to use fmt
<Yorlik> Ah ..
<Yorlik> like fflush(); ?
<hkaiser> you shouldn't rely on kernel io
<hkaiser> like std::cout or printf
<Yorlik> What would you recommend to use?
<Yorlik> Write my own logger task?
<hkaiser> hpx::cout and hpx::util::format or fmt{}
<Yorlik> oh - didn't know hpx has fmt
<Yorlik> I'll look that up
<hkaiser> or move the output code into a lambda that you run through hpx::threads::run_as_os_thread(f)
<hkaiser> the important part is that you don't suspend any hpx threads/tasks with a kernel mutex
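A minimal sketch of the two approaches hkaiser suggests here, assuming the HPX iostreams component is available and linked; the header paths and the indexed format placeholders ({1}, {2}) are from HPX releases of that time and may differ in newer versions. The report() function and its arguments are made up for illustration:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/iostreams.hpp>   // hpx::cout, hpx::flush
    #include <hpx/include/run_as.hpp>      // hpx::threads::run_as_os_thread
    #include <hpx/util/format.hpp>         // hpx::util::format

    #include <cstddef>
    #include <cstdio>

    void report(std::size_t items, double seconds)
    {
        // Option 1: hpx::cout plus hpx::util::format; hpx::flush pushes the
        // text out without suspending the HPX task on kernel I/O.
        hpx::cout << hpx::util::format("processed {1} items in {2} s\n", items, seconds)
                  << hpx::flush;

        // Option 2: run the blocking printf/fflush on an OS thread so no HPX
        // task gets suspended holding a kernel mutex; this returns a future.
        hpx::threads::run_as_os_thread([&] {
            std::printf("processed %zu items in %.2f s\n", items, seconds);
            std::fflush(stdout);
        }).get();
    }

    int main()
    {
        report(1048576, 129.22);
        return 0;
    }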
<Yorlik> BTW: Indeed it was a race
<Yorlik> Thomas had the glorious idea to test with thread=1 and it all worked
<Yorlik> Seems I have to learn how to trace races :)
<Yorlik> the async output was a major hurdle in debugging this
<hkaiser> it didn't even work with hpx:threads=1 for me
<Yorlik> I probably already had another version
<hkaiser> Yorlik: but yah, it's easy to always suspect HPX is broken... you would never have claimed that if your version had hung using std::thread
<Yorlik> The race was that under certain conditions a worker could get a limit of -1 (FFFFFFFF...) and create an insanely huge batch and never return from it
<Yorlik> Yes
<Yorlik> std::thread worked perfectly
aserio has quit [Ping timeout: 250 seconds]
<Yorlik> Now I need to find a good way to manage yields without adding a ton of overhead, like it does right now
<Yorlik> Controlling granularity efficiently is a thing I have to learn now.
<hkaiser> Yorlik: I'd suggest creating smaller tasks instead of yielding
<Yorlik> They are already pretty short
<hkaiser> how short?
<Yorlik> With OS threads sub microseconds
<Yorlik> Only in this specific test case ofc
<hkaiser> that's way too short for std::thread, even for hpx::threads
<Yorlik> that's from the non-HPX version
<hkaiser> so your task is running for 4.2 seconds?
<Yorlik> the program
<Yorlik> The entire task yes
<hkaiser> that's what I meant, create smaller tasks
<hkaiser> ~200us is a good measure
<Yorlik> But there could be MANY more tasks (consumers)
<Yorlik> That's where HPX will shine
<hkaiser> you don't gain anything from hpx if your task runs for 4 s each
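A rough sketch of the "smaller tasks" advice, assuming the work is one pass over n_items; process_item is a made-up stand-in for the real per-item work, and the chunk size of 1024 is an arbitrary illustration meant to keep each task in the ~100-200us range hkaiser mentions:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>    // hpx::wait_all

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    void process_item(std::size_t) {}  // stand-in for the real per-item work

    void process_all(std::size_t n_items)
    {
        std::size_t const chunk = 1024;   // assumed: sized so one chunk runs ~200us
        std::vector<hpx::future<void>> tasks;
        tasks.reserve(n_items / chunk + 1);

        for (std::size_t begin = 0; begin < n_items; begin += chunk)
        {
            std::size_t const end = (std::min)(begin + chunk, n_items);
            // One short-lived task per chunk instead of one multi-second task;
            // the scheduler can interleave chunks from many producers/consumers.
            tasks.push_back(hpx::async([begin, end] {
                for (std::size_t i = begin; i != end; ++i)
                    process_item(i);
            }));
        }
        hpx::wait_all(tasks);
    }

    int main()
    {
        process_all(1u << 20);   // e.g. roughly one million items
        return 0;
    }

With chunks like this there is no need for explicit yields; every task boundary is a natural switch point for the scheduler.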
<Yorlik> My plan for the next test is to launch like 50 consumers and do it both ways - OS and HPX
<Yorlik> I expect the OS swap penalty to be brutal
<Yorlik> e.g. look at this:
<Yorlik> Swarm [48] has returned ...
<Yorlik> Swarm [49] has returned ...
<Yorlik> Data sz: 1024
<Yorlik> RBuf sz: 1073741888, RBuf slots: 1048576 ( 1073741824 Bytes Buffer )
<Yorlik> Producer produced 1048576 items in 129.22 seconds = 8114.41 OPs/sec. = 8.31 MB/sec, Batches: 1485381 = 0.71 runs/batch average
<Yorlik> Results:
<Yorlik> Checker checked 1048576 items in 129.22 seconds = 8114.41 OPs/sec., error_count = 0, Batches: 1342626 = 0.78 runs/batch average
<Yorlik> Changer changed 1048576 items in 129.22 seconds = 8114.41 OPs/sec., MB changed = 1073.74, Batches: 509 = 2060.07 runs/batch average
<Yorlik> Checker2 checked 1048576 items in 129.22 seconds = 8114.41 OPs/sec., error_count = 0, Batches: 10 = 104857.60 runs/batch average
<Yorlik> Average Latency: 123.2375698.4 microseconds
<Yorlik> Measured runtime = 129.22395710.6 seconds
<Yorlik> Done !
<Yorlik> That's a run with 54 consumers
<Yorlik> Still pretty low throughput
<Yorlik> But quite a bunch of threads
<hkaiser> Yorlik: let me repeat what I said
<Yorlik> my 4 default consumers + 50 swarm
<hkaiser> you don't gain anything from hpx if your task runs for 4 s each
<Yorlik> A single step is a run around the buffer - that's super short
<hkaiser> ok
<Yorlik> the setup is like - I give you a set time and see how far the system goes
<Yorlik> then they start running in circles
aserio has joined #ste||ar
<Yorlik> I expect that once we have message handlers using Lua for consuming, the single items will be much slower
<Yorlik> which means the batches too
<hkaiser> whatever, you apparently don't want to listen to what I said
<Yorlik> I cannot let 50 threads run for 4 seconds without yielding
<hkaiser> Yorlik: I didn't say you should
<Yorlik> So the question is when to yield?
<hkaiser> I said you should _not_ have tasks that run for more than a couple 100us
<Yorlik> But they run inside the HPX workers
<Yorlik> Yes
<hkaiser> Yorlik: whenever you want to yield, stop this thread and create a new one
<Yorlik> At the moment the yields happen way too often - after each batch of items
<Yorlik> What would that be good for?
<Yorlik> Once a consumer is biting the tail of its predecessor I yield
<Yorlik> What I think I should do is to check the current batch size and not yield if it was too short
<hkaiser> Yorlik: instead of spin-looping on an atomic
<Yorlik> err - the other way around
<Yorlik> There is no CAS loop in my code btw.
daissgr has quit [Ping timeout: 272 seconds]
<Yorlik> just acquire - release
<Yorlik> there are few loops for safety checks, but not in the hot path
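A rough sketch of the "finish the task and spawn a new one" pattern hkaiser describes a few lines up, as an alternative to yielding inside a long-running consumer loop; consumer_state and process_one_batch are made-up stand-ins, and the future-unwrapping constructor is used so the caller can wait for the whole chain:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>    // hpx::future, hpx::make_ready_future

    #include <functional>

    struct consumer_state { int batches_left = 100; };   // stand-in for the real state

    bool process_one_batch(consumer_state& s)             // stand-in for one short batch
    {
        return --s.batches_left > 0;                       // false once all work is done
    }

    hpx::future<void> run_consumer(consumer_state& s)
    {
        if (!process_one_batch(s))
            return hpx::make_ready_future();               // chain ends here

        // Instead of hpx::this_thread::yield(): let this task finish and spawn
        // a fresh one for the next batch, so the scheduler gets a natural point
        // to run other tasks in between. The unwrapping constructor collapses
        // the resulting future<future<void>> into a future<void>.
        return hpx::future<void>(hpx::async(&run_consumer, std::ref(s)));
    }

    int main()
    {
        consumer_state s;
        run_consumer(s).get();   // becomes ready when the last batch finishes
        return 0;
    }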
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
aserio1 has joined #ste||ar
aserio1 is now known as aserio
aserio has quit [Ping timeout: 245 seconds]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
aserio has quit [Ping timeout: 268 seconds]
aserio has joined #ste||ar
eschnett has quit [Quit: eschnett]
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 268 seconds]
K-ballo1 is now known as K-ballo
aserio has quit [Quit: aserio]
nikunj has joined #ste||ar
jaafar_ is now known as jaafar