aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
EverYoung has quit [Read error: Connection reset by peer]
EverYoung has joined #ste||ar
eschnett has joined #ste||ar
EverYoung has quit [Ping timeout: 240 seconds]
EverYoung has joined #ste||ar
<zao>
Over 582 runs of the tcp.partitioned tests, none have failed on my Linux machine for commit a281166
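A stress loop like the one zao describes can be scripted; a minimal sketch (the test binary name below is a hypothetical stand-in for the actual HPX test target):

```shell
# Run a command repeatedly and report how many runs failed; the HPX test
# binary in the usage example is a hypothetical placeholder.
repeat_test() {
    cmd=$1; count=$2; fails=0
    i=1
    while [ "$i" -le "$count" ]; do
        "$cmd" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails"
}
# usage: repeat_test ./tests.unit.parallel.partitioned_vector 582
```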
EverYoung has quit [Ping timeout: 240 seconds]
mcopik has quit [Ping timeout: 256 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
<jaafar>
Sorry hkaiser
<hkaiser>
sorry jaafar for constantly pinging you ;)
<jaafar>
Hope it's OK I'm lurking
EverYoung has quit [Ping timeout: 245 seconds]
jakub_golinowski has quit [Remote host closed the connection]
<hkaiser>
jaafar: sure it is! good to have you around
hkaiser has quit [Quit: bye]
K-ballo has quit [Quit: K-ballo]
Anushi1998 has joined #ste||ar
V|r has quit [Ping timeout: 240 seconds]
Anushi1998 has quit [Remote host closed the connection]
Anushi1998 has joined #ste||ar
Vir has joined #ste||ar
Vir has quit [Ping timeout: 265 seconds]
Vir has joined #ste||ar
anushi_ has joined #ste||ar
Anushi1998 has quit [Ping timeout: 240 seconds]
Vir has quit [Ping timeout: 240 seconds]
Vir has joined #ste||ar
Vir has quit [Changing host]
Vir has joined #ste||ar
Vir has quit [Ping timeout: 240 seconds]
Vir has joined #ste||ar
Vir has quit [Ping timeout: 265 seconds]
Guest13135 has joined #ste||ar
Guest13135 has quit [Ping timeout: 240 seconds]
V|r has joined #ste||ar
V|r has quit [Ping timeout: 240 seconds]
anushi_ has quit [Quit: Leaving]
V|r has joined #ste||ar
V|r has quit [Changing host]
V|r has joined #ste||ar
V|r has quit [Ping timeout: 265 seconds]
nanashi55 has quit [Ping timeout: 264 seconds]
nanashi55 has joined #ste||ar
<jbjnr>
thanks zao, that shows me that something of mine is badly broken :(
<github>
hpx/master 86e53e6 Mikael Simberg: Merge pull request #3237 from STEllAR-GROUP/pipeline_example...
hkaiser has joined #ste||ar
nikunj_ has joined #ste||ar
<simbergm>
jbjnr: fyi, I started pycicle for all PRs now, it's too useful to not have running
<simbergm>
I'll kill it once yours is up again
david_pfander has joined #ste||ar
<simbergm>
zao: don't think that's necessary, I'm 100% sure I was imagining things now, thanks though!
<hkaiser>
simbergm: March 24th marks 10 years since the first commit to hpx - wouldn't that be a good date for the release?
<simbergm>
hkaiser: yes! I'd be happy with that :)
<hkaiser>
;)
<simbergm>
I wanted to ask if you'd be happy to do it after the hwloc 2 issue is done
<hkaiser>
sure - you're the master of ceremonies - go ahead
<simbergm>
the rest we put in as much as there is time for, but everything is just extra for me now
<simbergm>
everything else
<hkaiser>
ok
<simbergm>
ok, nice
<simbergm>
I guess you know nothing more about anton's progress on hwloc 2 than what is on the issue?
<hkaiser>
heller_ : might have another pr or two, but those should be small
<simbergm>
yeah, that's fine
<hkaiser>
simbergm: jbjnr probably wants the scheduler changes to be in...
<simbergm>
I think he does but I'd rather have those in the next one
<simbergm>
unless he gets it together very quickly
<hkaiser>
:-P
<hkaiser>
ok
<simbergm>
there will always be a next release!
<hkaiser>
indeed
<hkaiser>
it has been a while since the last one, though
<hkaiser>
almost a year
<simbergm>
and jbjnr is capable of running off his own branches ;) for others I'd prefer to have a bit of testing time
<simbergm>
yeah, I know and I want to change that ;) that's why I'd rather have a release now with what we have instead of waiting for the perfect release
<hkaiser>
I agree
<simbergm>
thanks btw for proposing and agreeing on a date :)
<hkaiser>
lol
<hkaiser>
any time
<jbjnr>
simbergm: hkaiser my scheduler changes are going to be quite a big set of changes, cos there is all the schedule_hint stuff for numa awareness and also the cleanups that heller has been working on, and now my removal of wait_or_add_new etc etc. Best if we do a 1.2 release with that stuff in and change the default scheduler etc etc if possible.
<hkaiser>
ok
<simbergm>
sounds good to me
<jbjnr>
hkaiser: I am going to write up my executor stuff and submit an ISO paper to the c++ people
<hkaiser>
good
<jbjnr>
the issue we contributed to is flagged as "separate paper", so ...
<jbjnr>
I think this will also make a great cppcon talk
<hkaiser>
they do that whenever they don't want to respond
<jbjnr>
the cholesky work
<jbjnr>
lol
<hkaiser>
these things would require fundamental changes to the executor design they have today
<jbjnr>
yup
<jbjnr>
and that's why I need to write it up formally
<jbjnr>
The june meeting is in CH, so if I could do it in time for that ....
<hkaiser>
yah, you could even present it yourself
<jbjnr>
simbergm: also - you probably saw that all my tests are failing - so I must have broken something fairly serious deep inside hpx somewhere
<simbergm>
yep, the undefined symbol errors are odd too, probably doesn't help
<simbergm>
I'll get started then
<jbjnr>
I wonder if I should make a PR out of that branch, just to see what tests pass/fail on daint? does anyone mind if I pollute the PRs with one that's not ready yet?
<github>
[hpx] biddisco pushed 1 new commit to guided_pool_executor: https://git.io/vxkLN
<github>
hpx/guided_pool_executor d848a8b John Biddiscombe: Fix test to schedule_hint correctly
<jbjnr>
simbergm: pushed another test compile fix. There are still a few tests that fail to build, but I'll fix them later. (just do a make -k when you need to build the lot and not stop on errors)
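The `make -k` ("keep going") behaviour mentioned here can be seen on a toy Makefile; a minimal self-contained sketch (the Makefile and its targets are invented for illustration, and `.RECIPEPREFIX` assumes GNU make 3.82+):

```shell
# Demonstrate make -k: without -k the build stops at the failing target
# "bad"; with -k, the independent target "good" is still built.
workdir=$(mktemp -d)
cat > "$workdir/Makefile" <<'EOF'
.RECIPEPREFIX = >
all: bad good
bad:
> false
good:
> echo built > good.txt
EOF
make -C "$workdir" -k all >/dev/null 2>&1 || true
cat "$workdir/good.txt"   # "good" was built despite "bad" failing
```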
<jbjnr>
with the parcel coalescing fix, the distributed tests are now all ok
<simbergm>
jbjnr: yep, it's hanging on executor_parameters for me, but the others (rp, async, guided_pool) passed at least once in debug
<jbjnr>
but removing wait_or_add_new had no significant side effects
<simbergm>
jbjnr: are you able to run cholesky with that branch?
<jbjnr>
task queue is removed. Might have some bits left over in the code, but everything seems to be ok
<jbjnr>
yes. ran cholesky today
<simbergm>
okay, good
<simbergm>
also, what was the conclusion after running vtune? what's the proportion of gemm vs non-gemm time?
<simbergm>
does it account for the difference to parsec?
<heller_>
jbjnr: figured it didn't make *much* difference
<jbjnr>
didn't make much difference afaict - still slower than parsec for small blocks. Have to investigate cache reuse on each core ...
<heller_>
jbjnr: I guess it'll change once thread stacks are reused properly. And of course, a better queue to begin with
<jbjnr>
I will rerun vtune on the new branch though ... would like to see what's the new hotspot
<jbjnr>
got to do slides today though :(
<jbjnr>
mind you, the tests on daint were ok, only 4 fails and 4 not built. pretty cool I'd say, considering the scale of changes I've made on my branch
<jbjnr>
ooh simbergm you fixed the compilation fails. lovely thanks
<jbjnr>
the get_stacksize error will be a problem. That should not happen, that means a non-thread executor is using the thread executor api. I will investigate it tomorrow (perhaps a trait is wrong somewhere)
<simbergm>
jbjnr: not all of them I think (pycicle is working on it...)
<simbergm>
now executor_parameters doesn't hang anymore, grr
<jbjnr>
yup. the stacksize ones will fail still
<simbergm>
ah, ok
<jbjnr>
(I can't quite believe that it's all nearly working).
diehlpk_mobile2 has joined #ste||ar
diehlpk_mobile has quit [Ping timeout: 268 seconds]
diehlpk_mobile2 has quit [Read error: Connection reset by peer]
diehlpk_mobile has joined #ste||ar
diehlpk_mobile2 has joined #ste||ar
diehlpk_mobile has quit [Read error: Connection reset by peer]
diehlpk_mobile2 has quit [Read error: Connection reset by peer]
<heller_>
I find the ability to include stuff quite cool
<diehlpk_work>
Yes
<diehlpk_work>
Having python bindings for HPX would be cool
jaafar_ has joined #ste||ar
jaafar has quit [Ping timeout: 240 seconds]
mcopik_ has quit [Ping timeout: 264 seconds]
Viraj has joined #ste||ar
Viraj has quit [Ping timeout: 260 seconds]
Viraj has joined #ste||ar
Viraj has quit [Quit: Page closed]
Viraj has joined #ste||ar
Viraj has quit [Quit: Page closed]
hkaiser has joined #ste||ar
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
diehlpk_mobile has quit [Quit: Yaaic - Yet another Android IRC client - http://www.yaaic.org]
Anushi1998 has joined #ste||ar
aserio has quit [Ping timeout: 276 seconds]
<Anushi1998>
hkaiser: Will you please look at my mail
<hkaiser>
Anushi1998: when did you send it?
<Anushi1998>
3hrs ago
<hkaiser>
Anushi1998: ok, got it
<Anushi1998>
thanks :)
<hkaiser>
John already responded
<Anushi1998>
ok
<zao>
jbjnr: Got any funny branch to test, mine was just master.
mcopik_ has joined #ste||ar
hkaiser has quit [Quit: bye]
Smasher has joined #ste||ar
aserio has joined #ste||ar
<nikunj_>
simbergm: I have a question regarding registration of threads, good time to ask?
<simbergm>
nikunj_: ask away, but I admit I've never used that feature
<nikunj_>
simbergm: When we register a kernel thread with the HPX runtime, does that make every HPX facility available from that function?
<nikunj_>
simbergm: From my experimentation, I see that features like threads and futures throw a null id exception when an OS kernel thread is registered with the HPX runtime. I want to confirm whether it's meant to behave that way or is a bug?
<simbergm>
nikunj_: in my understanding at least most features should be available (that's the point of registering), but there may be exceptions
<simbergm>
so you're trying to e.g. get a future or something like that?
<nikunj_>
simbergm: By null id exception I mean, creating an hpx::thread from the OS thread (registered with the HPX runtime) to run another function.
<nikunj_>
simbergm: yes, I'm trying to get an hpx::future from an OS thread (registered with the HPX runtime), but it throws a null id exception
<simbergm>
it sounds buggy to me, but I'm not 100% sure
<simbergm>
you could open an issue with a small reproducing example
<nikunj_>
simbergm: ok, I'll experiment a bit more first just to be sure. These null ids are coming with init_globally; I've not tried with the normal HPX runtime system. It might be a problematic initialization sequence that caused it, but that's just a guess.
<nikunj_>
simbergm: I'll confirm with the current hpx runtime system before opening an issue
<simbergm>
just looking quickly at the comments in init_globally it might be that not everything is available
<simbergm>
yeah, play around a bit more, sorry I can't give you a better answer
<nikunj_>
simbergm: oh, I forgot to read the comments. Thanks.
<nikunj_>
simbergm: Now, I understand why it won't do it.
<simbergm>
that's probably the best documentation we have on it... if you find something out and feel like fleshing out the docs for (un)register_thread that would of course be appreciated
<nikunj_>
simbergm: Does hpx currently have a registration system where we could invoke all functionalities of hpx?
<nikunj_>
simbergm: sure i can work on it.
<nikunj_>
I'm currently learning quite a bit about the runtime system to clear up my mind so that I can write a proper proposal.
<simbergm>
I think not, register_thread is what we have and I assume there is nothing else, but as I said I don't know those parts too well
<simbergm>
I guess you're aiming at making main somehow run in the runtime?
<nikunj_>
simbergm: yes
<nikunj_>
simbergm: I have found a way but I'm not sure it's implementable
<simbergm>
mmh, the code and the internet are your best friends here unfortunately (and hkaiser and heller)
<nikunj_>
It involves calling main from hpx_main so that it runs as an HPX thread (everything is then available), and then preventing the normal invocation of main, going directly to exit instead
<nikunj_>
i.e. call the .fini section of the binary to run the termination sequence
<nikunj_>
This way the main function will only be called once, and with all HPX functionality available
<nikunj_>
I will then have to implement the initialization sequence properly to prevent any mishap
<simbergm>
that would indeed be nice, but I can't comment on the technical feasibility of it
<simbergm>
yep, you'll likely have to battle with differences between compilers and platforms
<simbergm>
I have to go though, I hope that was at least a tiny bit helpful and good luck with the proposal
<nikunj_>
simbergm: ok, thanks for the help
Anushi1998 has quit [Read error: Connection reset by peer]
aserio has quit [Ping timeout: 276 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
Anushi1998 has joined #ste||ar
Anushi1998 has quit [Client Quit]
Anushi1998 has joined #ste||ar
EverYoung has joined #ste||ar
david_pfander has quit [Ping timeout: 264 seconds]
hkaiser has joined #ste||ar
hkaiser has quit [Client Quit]
hkaiser has joined #ste||ar
Anushi1998 has quit [Quit: Leaving]
Anushi1998 has joined #ste||ar
<nikunj_>
@hkaiser: I need to discuss a few things regarding implementing my project, good time to ask?
akheir has joined #ste||ar
<hkaiser>
nikunj_: sure
<nikunj_>
@hkaiser: So I had a quick look at how the hpx runtime works by checking the source code.
Anushi1998 has quit [Quit: Leaving]
Anushi1998 has joined #ste||ar
<hkaiser>
k
<nikunj_>
@hkaiser: the start function initializes the hpx runtime but we can provide the location to start, is this correct?
<nikunj_>
Also, start does not suspend the current thread (as you said)
<hkaiser>
yes
<nikunj_>
However, with init the runtime always starts with hpx_main, right?
<hkaiser>
no
<hkaiser>
there are hpx_init overloads allowing to specify a function
<nikunj_>
So we can provide the function with init as well?
<hkaiser>
yes
<nikunj_>
That will make it easier for me to implement the runtime system in that case.
<hkaiser>
k
<nikunj_>
So I was thinking of actually replacing the start with init
<nikunj_>
in init_globally
<nikunj_>
this will start the runtime system and also suspend the thread
<nikunj_>
And now we can provide the main function with init
<nikunj_>
this will directly register the main thread as an hpx thread
<nikunj_>
After all the work is done, we can simply call std::exit, which will then run the destructors
<hkaiser>
might work, try it
<nikunj_>
Yes, I was confused if init could take a function.
<nikunj_>
That's why I had to make sure
<nikunj_>
Thanks for the help, I'll try implementing it
<nikunj_>
@hkaiser: it works!
<nikunj_>
Now I need to work around the fact that hpx::resource::get_partitioner() can only be called after the resource partitioner has been allowed to parse the command line options.
<nikunj_>
@hkaiser: I have finally been able to run main on an hpx thread. Futures and async now work in the main function as well.
<K-ballo>
woa, run main on an hpx thread?
<nikunj_>
I have a very basic implementation (without destructors and maintaining the initialization sequence). Things seems to run fine as well.
<nikunj_>
It is a very basic one and needs a lot of refinement, which I can work on
<K-ballo>
oh cool
<K-ballo>
so you hijack main from global initialization
<K-ballo>
and technically, main never actually runs :)
<nikunj_>
@K-ballo: yes, exactly
aserio has joined #ste||ar
<zao>
What about other global init, and if you're a DLL?
<zao>
Guessing this is quite narrow in scope by design.
<nikunj_>
zao: It is a very basic implementation
<nikunj_>
So basic that it doesn't even tackle an Apple-based system
<nikunj_>
I can work on implementing this
<K-ballo>
hpx threads in other global inits should already be a big no
akheir has quit [Remote host closed the connection]
anushi has joined #ste||ar
Anushi1998 has joined #ste||ar
aserio has quit [Read error: Connection reset by peer]
aserio has joined #ste||ar
<zao>
jbjnr: New results! Over 7700-some runs, tests.unit.parallel.segmented_algorithms.distributed.tcp.partitioned_vector_iter failed five times with 1500 second timeout.
<zao>
I don't have any XML/output for them, sadly.
aserio has quit [Ping timeout: 246 seconds]
EverYoung has quit [Read error: Connection reset by peer]
jakub_golinowski has quit [Ping timeout: 260 seconds]
nikunj_ has quit [Quit: Page closed]
<jbjnr>
zao: do we expect that test to fail, or is this evidence of a problem deep inside hpx somewhere that we still haven't fixed? my distributed tests seem ok after I fixed the plugin stuff.
<jbjnr>
I'll run it here a few hundred times and see if it fails
<zao>
I have not run any other tests than this subset, so no idea of the overall state of HPX.
hkaiser has quit [Quit: bye]
<zao>
This seems to be running on eight cores here, btw.
eschnett has quit [Quit: eschnett]
nikunj has joined #ste||ar
<diehlpk_work>
Does anyone have an idea why hpx::parallel::dynamic_chunk_size cs(1); results in a parallel for loop being very slow for one specific input size, while for other sizes it performs like the benchmark it's compared against?
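For reference, dynamic_chunk_size(1) means each worker claims one iteration at a time from a shared pool; a plain-C++ sketch of the idea (not HPX's actual implementation), which shows why tiny chunks add per-iteration scheduling overhead:

```cpp
// Plain-C++ sketch of dynamic chunking, analogous in spirit to
// hpx::parallel::dynamic_chunk_size (this is NOT HPX's implementation):
// workers repeatedly claim `chunk` iterations from a shared atomic counter.
// With chunk == 1 every single iteration pays one fetch_add, which can
// dominate the runtime when the loop body is small.
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

void dynamic_for(std::size_t n, std::size_t chunk, std::vector<int>& out)
{
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (;;) {
            std::size_t begin = next.fetch_add(chunk);
            if (begin >= n)
                break;
            std::size_t end = std::min(begin + chunk, n);
            for (std::size_t i = begin; i != end; ++i)
                out[i] += 1;  // stand-in for the real loop body
        }
    };
    std::vector<std::thread> pool;
    for (int t = 0; t != 4; ++t)
        pool.emplace_back(worker);
    for (auto& th : pool)
        th.join();
}
```

Each index is claimed by exactly one worker, so the result matches a sequential loop; only the claiming granularity changes with `chunk`.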
hkaiser has joined #ste||ar
aserio has quit [Quit: aserio]
<nikunj>
@hkaiser: What do you think of my implementation above?
nikunj has quit [Quit: Page closed]
diehlpk_work has quit [Quit: Leaving]
diehlpk_mobile has joined #ste||ar
diehlpk_mobile has quit [Read error: Connection reset by peer]
daissgr has joined #ste||ar
<zao>
Feh, got a timeout again, but no output captured in the XML.
daissgr has quit [Ping timeout: 246 seconds]
daissgr has joined #ste||ar
diehlpk_mobile has joined #ste||ar
jbjnr has quit [Read error: Connection reset by peer]
diehlpk_mobile has quit [Ping timeout: 276 seconds]
diehlpk_mobile has joined #ste||ar
diehlpk_mobile2 has joined #ste||ar
jaafar_ has quit [Ping timeout: 240 seconds]
diehlpk_mobile has quit [Ping timeout: 260 seconds]
daissgr has quit [Ping timeout: 276 seconds]
diehlpk_mobile2 has quit [Ping timeout: 260 seconds]