#ste||ar on 2017-10-29 — irc logs at irclog.cct.lsu.edu

2017-05-17 13:54 aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/

00:50 jaafar has quit [Ping timeout: 240 seconds]

01:26 gedaj has quit [Read error: Connection reset by peer]

01:29 gedaj has joined #ste||ar

01:47 hkaiser has quit [Quit: bye]

03:00 patg has quit [Quit: See you later]

03:18 gedaj has quit [Quit: Leaving]

03:19 gedaj has joined #ste||ar

03:22 gedaj has quit [Client Quit]

03:22 gedaj has joined #ste||ar

03:25 EverYoung has quit [Remote host closed the connection]

03:46 gedaj has quit [Quit: Leaving]

04:14 gedaj has joined #ste||ar

04:14 gedaj has quit [Changing host]

04:14 gedaj has joined #ste||ar

04:15 gedaj has quit [Client Quit]

04:18 gedaj has joined #ste||ar

04:18 gedaj has quit [Changing host]

04:18 gedaj has joined #ste||ar

04:22 zbyerly_ has left #ste||ar ["Leaving"]

04:25 gedaj has quit [Quit: leaving]

04:26 gedaj has joined #ste||ar

04:37 parsa has quit [Quit: Zzzzzzzzzzzz]

04:40 parsa has joined #ste||ar

04:52 parsa has quit [Quit: Zzzzzzzzzzzz]

04:53 parsa has joined #ste||ar

05:22 parsa has quit [Quit: Zzzzzzzzzzzz]

05:22 parsa has joined #ste||ar

05:23 parsa has quit [Client Quit]

05:23 parsa has joined #ste||ar

05:23 parsa has quit [Client Quit]

05:24 parsa has joined #ste||ar

05:24 parsa has quit [Client Quit]

05:25 parsa has joined #ste||ar

05:25 parsa has quit [Client Quit]

05:25 parsa has joined #ste||ar

05:25 EverYoung has joined #ste||ar

05:26 parsa has quit [Client Quit]

05:30 EverYoung has quit [Ping timeout: 252 seconds]

09:07 <github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/vFIhv

09:07 <github> hpx/gh-pages a3d5c0d StellarBot: Updating docs

09:28 EverYoung has joined #ste||ar

09:34 EverYoung has quit [Ping timeout: 258 seconds]

11:31 EverYoung has joined #ste||ar

11:35 EverYoung has quit [Ping timeout: 252 seconds]

13:36 eschnett has quit [Quit: eschnett]

14:02 lucg has joined #ste||ar

14:04 eschnett has joined #ste||ar

14:10 hkaiser has joined #ste||ar

14:15 EverYoung has joined #ste||ar

14:19 EverYoung has quit [Ping timeout: 246 seconds]

14:21 parsa has joined #ste||ar

14:37 lucg has left #ste||ar [#ste||ar]

14:40 <github> [hpx] hkaiser force-pushed local_new_fallback from 4ff1464 to 6fec0a9: https://git.io/vFI3A

14:40 <github> hpx/local_new_fallback 6fec0a9 Hartmut Kaiser: Fall back to creating local components using local_new...

16:16 EverYoung has joined #ste||ar

16:23 EverYoung has quit [Ping timeout: 258 seconds]

16:23 EverYoung has joined #ste||ar

16:23 EverYoung has quit [Remote host closed the connection]

16:24 EverYoung has joined #ste||ar

16:30 EverYoun_ has joined #ste||ar

16:33 EverYoung has quit [Ping timeout: 255 seconds]

16:34 EverYoun_ has quit [Client Quit]

17:00 <parsa> hkaiser: when phylanx and any one the test apps are linked with mkl or openblas on windows, they seem to run without doing anything... even hpx doesn't start

17:01 <hkaiser> lol

17:03 <hkaiser> parsa: you need to make sure to link with the single-threaded version of mkl

17:04 <parsa> i assumed sequential is single threaded (only other options are parallel and cluster)

17:05 <hkaiser> nod, correct

17:33 <parsa> well this is awful... i don't even have a clue where to look

17:44 <hkaiser> let's debug this...

17:52 jaafar has joined #ste||ar

18:26 <github> [hpx] hkaiser force-pushed local_new_fallback from 432d26d to 13880dd: https://git.io/vFI3A

18:26 <github> hpx/local_new_fallback 13880dd Hartmut Kaiser: Fall back to creating local components using local_new...

18:28 <github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/vFLuE

18:28 <github> hpx/master c6406e6 Hartmut Kaiser: Merge pull request #2976 from STEllAR-GROUP/client_base_registration...

18:28 <github> [hpx] hkaiser deleted client_base_registration at 204a1a5: https://git.io/vFLuu

18:29 wash has quit [Quit: leaving]

18:39 <github> [hpx] K-ballo created then-fwd-future (+1 new commit): https://git.io/vFLzt

18:39 <github> hpx/then-fwd-future b5a3c9c Agustin K-ballo Berge: Forward future used with .then() to continuation

18:39 <K-ballo> ^ did that but I doubt is the right approach, might be better to always pass rvalue futures to the continuation even if it implies extra refcounts for shared future

18:40 * K-ballo is leaving for the airport now

19:36 wash has joined #ste||ar

20:09 <github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/vFLwD

20:09 <github> hpx/master 0ad76de Hartmut Kaiser: Adding closed tickets to docs

20:14 gedaj has quit [Quit: leaving]

20:16 gedaj has joined #ste||ar

20:19 wash has quit [Quit: leaving]

20:23 wash has joined #ste||ar

20:41 ct-clmsn has joined #ste||ar

20:54 <ct-clmsn> hkaiser: where did you want the preliminary tree code?

20:54 <ct-clmsn> hkaiser: there isn't much available at the moment, i'm going through the kurt implmentation and filling in missing methods

21:00 <hkaiser> ct-clmsn: anywhere you like

21:05 <ct-clmsn> @hkaiser: completely unrelated question - if ya'll were to provide collectives, would those be provided as algorithms over data structures or something more runtime-level?

21:05 mbremer has joined #ste||ar

21:09 <hkaiser> ct-clmsn: we have some of the collectives

21:09 <hkaiser> they are currently control-structures, unrelated to data

21:09 <ct-clmsn> @hkaiser: ah so in parcelport? i saw the allgather in examples

21:10 <hkaiser> those are not related/connected to parcelports

21:11 <ct-clmsn> @hkaiser: rgr

21:11 <hkaiser> see here for broadcast, for instance: https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/lcos/broadcast.hpp

21:12 <hkaiser> but it might be beneficial to tie thm into the parcelports to take advantage of functionalities in the network layer

21:12 <hkaiser> ct-clmsn: btw

21:12 <hkaiser> ct-clmsn: the stack-overflow detection you contributed - it also kicks in for 'real' segfaults

21:12 <hkaiser> would you see a way to distinguish those cases?

21:13 <hkaiser> it's currently producing misleading error messages ...

21:13 <ct-clmsn> @hkaiser: ah they're in the lcos. now that's an interesting issue, will look into it

21:14 <hkaiser> ct-clmsn: yah, lcos - we don't have a full set of collectives, though - we miss allgather, scatter, alltoall, possibly more

21:14 <heller> Hmm, I'm still not sure if the parcelport is really the right place

21:14 <hkaiser> heller: not sure either...

21:14 <heller> But it really depends on which route we want to go

21:14 EverYoung has joined #ste||ar

21:14 <ct-clmsn> @hkaiser: cool, that might be something i'll look into when after this matrix stuff is complete - blaze is pretty comprehensive

21:15 EverYoung has quit [Remote host closed the connection]

21:15 <heller> I looked into the stuff that John mentioned the other day

21:16 EverYoung has joined #ste||ar

21:16 <heller> A library "close to the metal" for collectives for mpi from ethz

21:17 <ct-clmsn> @heller: something like gasnet?

21:17 <heller> There really isn't a lot that takes advantage of any hardware. All software tree based. Which is exactly what we do as well

21:17 <ct-clmsn> @heller: well, maybe not gasnet exactly...

21:17 <heller> Really depends...

21:18 <heller> We have components, which is a far more dynamic thing then a set of network endpoints to talk to

21:18 <heller> People tend to don't like that

21:19 <ct-clmsn> @heller: rgr

21:19 <hkaiser> heller: because that's something nobody else has

21:19 <heller> Because it sounds scary and too much overhead. At the end of the day, it's exactly the same

21:19 <hkaiser> nod

21:19 <hkaiser> people are not used to think it terms of objects

21:19 <heller> Yes

21:20 <ct-clmsn> @hkaiser: so the signal handler that manages coroutine overflows is registered to the process-level segfault signal SIGSEGV will see if there's anyway to mask it

21:20 <heller> So I'm pretty sure it does not belong into the parcelport ;)

21:20 <heller> What we'd need is to define proper communicators or similar

21:20 <ct-clmsn> @heller @hkaiser so lcos

21:20 <ct-clmsn> for collectives?

21:21 <heller> The basename thingy is a nice start there, but not really nice to use our enough

21:21 <heller> ct-clmsn: yes, in my book at least

21:22 <heller> Based on proper communicators we can have all those nice and fancy collectives people want

21:23 <heller> With same or better performance

21:23 <hkaiser> ct-clmsn: you might have to look whether the 'segfault' occurred near the current stack frame

21:25 <hkaiser> nod

21:25 <hkaiser> just don't call then 'communicators' ;)

21:25 <ct-clmsn> @hkaiser: i could look at the address of the context that was created and see if it matches the context that was attached to the signal handler as an arg

21:25 <ct-clmsn> pardon

21:26 <ct-clmsn> @hkaiser: the context that threw the signal handler vs the context registerd in the overflow hndler

21:26 <ct-clmsn> handler

21:27 <hkaiser> ct-clmsn: nod, could work

21:27 <heller> hkaiser: why not communicators? Why allieniate users with a new term?

21:28 <hkaiser> is there a way to 'let it fall through' ?

21:28 <hkaiser> ;)

21:28 <ct-clmsn> yep - test the addresses and fall through if they don't match

21:28 <hkaiser> heller: if we want people to understand that it's not mpi we shouldn't use mpi terminology

21:28 <hkaiser> ct-clmsn: perfect

21:28 <ct-clmsn> not sure which address to test...

21:28 <ct-clmsn> lol

21:29 <hkaiser> isn't the signal handler with the address causing the issue?

21:29 <hkaiser> invoked with*

21:30 <heller> I'd look at it to overtake mpi terminology ;)

21:30 <hkaiser> right

21:30 <ct-clmsn> so when SIGSEGV is called, you get a struct that tells you which address broke things

21:30 <ct-clmsn> in the callback

21:30 <ct-clmsn> you can register the callback with some data that gets passed in when the callback is executed

21:30 <hkaiser> ct-clmsn: if that's close to the range of the current stack segment you got yourself a stack-overflow

21:30 <ct-clmsn> i *think* you can compare the registered data with the dynamically provided data

21:31 <ct-clmsn> bingo

21:31 <hkaiser> whatever 'close' means ;)

21:31 <ct-clmsn> yes...

21:31 <ct-clmsn> lol

21:31 <ct-clmsn> how does "close enough" sound?

21:31 <hkaiser> right

21:33 <ct-clmsn> would a macro for the epsilon be ok for now?

21:33 mbremer has quit [Ping timeout: 260 seconds]

21:34 <heller> Better than what we have now

21:34 <heller> A segfault on a nullptr for example ;)

21:35 <ct-clmsn> ah right, the simple case

21:35 <heller> Dangling pointers are still interpreted as stack overflows?

21:35 <heller> That's the more interesting case...

21:41 diehlpk has joined #ste||ar

21:41 <wash> hkaiser, heller: What do we want to say in the rebuttal? the first reviewer seems to be asking if the framework is applicable to other applications

21:42 <hkaiser> I think we should treat it as a stack-overflow if the offending address is +/- 1k off the current stack frame

21:42 <hkaiser> but a macro for this is certainly fine

21:42 <hkaiser> wash: no idea - any suggestions?

21:43 <hkaiser> I think we have no chance to get accepted, no matter what we say

21:43 <heller> wash: we say yes, and cite the lgd and pgas paper

21:43 <hkaiser> it's a fairly highly competed conference after all

21:43 <heller> We just try to make the reviewers happy

21:43 <wash> @hkaiser We get additional reviewers in the second round

21:44 <wash> Not just the same people ranking us again

21:44 <heller> There are fair points.

21:44 <hkaiser> ok - I'd strongly object to the claim that our results are 'incremental'

21:44 <ct-clmsn> @hkaiser: testing our current sigsegv concept

21:44 <wash> I'm primarily interested in how you would characterize this paper's novel research contribution w.r.t. an ASPLOS setting. I'd also like to understand whether the programming model's support for object migration and automated load balancing came into play in this study. <-

21:44 <wash> that is from reviewer #2

21:44 <heller> So yes, the criticism that the point of the paper was to scale up the application and show that our approach is feasible

21:44 <hkaiser> they have not read the conclusions - that's where the meat is - we might have to make that more prominent

21:46 <hkaiser> we have shown for the first time that a) a tasking system can be used to scale very well and b) that everything can be done using a uniform high-level API

21:46 <wash> actually

21:46 <hkaiser> clcool

21:46 <heller> They unanimously claim we didn't do real science, just made the thing scale and therefore are not a good fit for the conference. Not a lot you can argue with there...

21:46 <hkaiser> ct-clmsn : cool

21:46 <wash> re-reading the original email from ASPLOS, it indicates that not all papers make it to the second round of reviews

21:46 <wash> and it seems we did.

21:47 <hkaiser> heller: right, that's what I meant - whatever we say it will not change the decision - the reviews don't characterize this peper as borderline

21:47 <hkaiser> wash: try it - by all means

21:47 <heller> We should absolutely

21:47 <hkaiser> I'm too angry to be useful

21:47 <heller> Would be stupid but to

21:47 <heller> not*

21:48 <heller> When do they want our answers?

21:48 <hkaiser> today

21:49 <wash> yah

21:49 <heller> Oh, ok

21:50 <wash> Do we make use of migration in octotiger? We do, yes?

21:50 <heller> Why do I always mess up deadlines recently :/

21:50 <heller> We don't

21:53 <ct-clmsn> @hkaiser: doing an hpx compile going afk

21:54 <ct-clmsn> @hkaiser: will look at merging the tree transducer in it's current state into your transducer work

21:55 <ct-clmsn> later tonight

21:55 <hkaiser> ct-clmsn: great, pls ask if something is not clear

21:55 <ct-clmsn> will do

21:55 <hkaiser> ct-clmsn: pls also assume I did it all wrong

21:56 <ct-clmsn> @hkaiser: i'm still learning the kurt implementation and how it relates to the paper so...it'll be a learning experience for everyone!

21:58 <wash> "The framework implementation consists of multiple techniques: future in new C++ parallel model, AGAS for workload balancing, and vectorization supported from Vc library. I’m not sure which

21:58 <wash> techniques are developed specifically by the authors for this work."

21:58 <wash> How do we respond to that?

21:58 <wash> Reviewer #1 is basically asking "is HPX part of this work", I think.

21:58 <hkaiser> it sure wasn't developed exclusively for this work

21:58 <wash> How about "co-designed"?

21:58 <hkaiser> hpx is a general purpose runtime system

21:59 <hkaiser> the application was definitely a driving use case

21:59 <hkaiser> a means of verification of the implemented concepts

21:59 <wash> hkaiser: reviewer #1 seems to be asking whether or not we developed HPX, I think, or whether we just developed the simulation.

22:00 <hkaiser> both

22:00 <hkaiser> what do they know

22:13 diehlpk has quit [Ping timeout: 258 seconds]

22:37 <ct-clmsn> @hkaiser: can't find the tree directory in the transform_ast branch

22:37 <ct-clmsn> @hkaiser: (or blob)

22:39 <hkaiser> ct-clmsn: sec

22:40 <ct-clmsn> @hkaiser: wasn't sure if those files were pushed

22:40 <hkaiser> ct-clmsn: iit's all in this file: https://github.com/STEllAR-GROUP/phylanx/blob/transform_ast/phylanx/ast/transform_ast.hpp

22:40 <ct-clmsn> @hkaiser: ah cool thanks

22:40 <hkaiser> hold on, this one:https://github.com/STEllAR-GROUP/phylanx/blob/transform_ast/src/ast/transform_ast.cpp

22:41 <hkaiser> ct-clmsn: ^^

22:42 <ct-clmsn> @hkaiser: hmm. i'm on transfer_ast branch, and transform_ast.cpp is in ./src/ast/transform_ast.cpp

22:42 <hkaiser> yes

22:43 <ct-clmsn> ah ok

22:43 <ct-clmsn> whew

22:43 <ct-clmsn> phylanx/blob was throwing me off

22:43 <hkaiser> it's what github gives me ;)

22:44 <ct-clmsn> cool is a weighted transducer ok? we can set the weights to 1.0 to make it a non-stochastic transducer

22:45 <ct-clmsn> or would you rather keep it simple to avoid getting things overly complicated

22:45 <hkaiser> sure, whatever is needed - the current code has no weights, though

22:45 <ct-clmsn> np

22:45 <ct-clmsn> will look into it

22:45 <hkaiser> does not allow for different options

22:45 <ct-clmsn> there's some statistics sauce that would need to be added in

22:45 <hkaiser> nod

22:46 <hkaiser> what would that be useful for?

22:46 <ct-clmsn> this approach helps with svz's work

22:46 <ct-clmsn> gives him some weights to work with on the tree

22:46 <ct-clmsn> bbiab

23:15 <wash> I'm sending a draft

23:19 <wash> ct-clmsn: btw, have you heard about our (nvidias) container launch?

23:40 <ct-clmsn> @wash: not yest

23:40 <ct-clmsn> yet

23:40 <ct-clmsn> @wash: what's the story?

23:43 <wash> @ct-clmsn: if y'all still care about containerization, it may be worth checking out. We released a set of docker containers for various ML frameworks+CUDA last week

23:44 <wash> I recall you had to deal with containers of some sort :p

23:45 <wash> @ct-clmsn: https://www.nvidia.com/en-us/gpu-cloud/ <- (the site is kinda high-level, I'm sure we have an easier way for people to just yank the images; it's designed to deploy on either amazon instances, or your own hardware atm)