aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
eschnett has joined #ste||ar
vamatya has quit [Ping timeout: 260 seconds]
aserio has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
aserio has quit [Ping timeout: 255 seconds]
mcopik_ has joined #ste||ar
aserio has joined #ste||ar
ajaivgeorge has joined #ste||ar
ajaivgeorge_ has quit [Ping timeout: 246 seconds]
ajaivgeorge has quit [Ping timeout: 260 seconds]
hkaiser has joined #ste||ar
aserio has quit [Quit: aserio]
K-ballo has quit [Quit: K-ballo]
mcopik_ has quit [Ping timeout: 255 seconds]
hkaiser has quit [Quit: bye]
vamatya has joined #ste||ar
mcopik_ has joined #ste||ar
mcopik_ has quit [Quit: Leaving]
jaafar has joined #ste||ar
vamatya has quit [Ping timeout: 240 seconds]
jaafar has quit [Quit: Konversation terminated!]
pree has joined #ste||ar
<pree> parsa[w] : yt ?
<pree> Did you read the gist ?
Matombo has joined #ste||ar
Matombo has quit [Ping timeout: 240 seconds]
<jbjnr> pree: yt?
<pree> Yeah I'm here
<jbjnr> Today is a holiday in Switzerland, so I'm not in the office, but I just read your gist....
<pree> Okay Thank you
<jbjnr> I don't think I could make comments easily over irc/etc but if you want to do a quick video call, I'm happy to chat for 10 mins
<jbjnr> google hangout ?
<pree> Extremely sorry, I can't do a video call due to the poor internet connection in my town. That's why I haven't had a call with my mentors
<pree> Only through mail && IRC
<pree> Sorry
<jbjnr> ok. I don't fully understand/know what it is you actually want to achieve ? I should read your gsoc proposal probably ...
<pree> Do you have it, or should I send it, sir?
<jbjnr> link?
<pree> okay
<jbjnr> do you want to distribute a partitioned vector - or do you want to distribute some other object(s) - and once you've distributed them - what will they be used for?
<pree> link to my proposal
<pree> No, I want to distribute components over localities
<pree> In the proposal I mostly wrote about data distribution, but it can be done for components in the same way as I described in the proposal
<jbjnr> reading proposal now ....
<pree> okay
bikineev has joined #ste||ar
<jbjnr> pree: ok ask me a question now !
<jbjnr> well, I should ask you ...
<pree> Well, what I described in the gist is correct!
<pree> This is what I want to implement in summer
<jbjnr> I'm still confused though. A domain represents a set of (say) array bounds - and a domain map represents a way of mapping regions in the domain to localities?
<pree> Don't think of it in terms of that representation, it is specific to Chapel. I referred to it because it was in the project idea description
<jbjnr> because you say that you want to distribute components over localities - and not data over localities - but really a component is just a convenience object to access data
<jbjnr> in the sense that a component is just an object that might 'own' some data
<pree> Yes, but implementing domains & domain maps in HPX is not so useful, I think
<jbjnr> let me look at the gist again
<pree> I see a component as a worker, which does some work remotely
<pree> okay !
<jbjnr> yes. think of it as a worked indeed
<jbjnr> ^worker
<jbjnr> can you give me an example of how you would use the stuff you want to write?
<jbjnr> then I might understand better exactly what you wish to achieve
Matombo has joined #ste||ar
<pree> As an example, I have taken the policies from Chapel, which are used to distribute the domains over localities
<pree> And I wish to create components in that style over the localities and manage them throughout the application
<pree> through a single representative object
<pree> with this representative notation we can handle many components easily with less code
<jbjnr> so I would create a single instance of a 'distributed matrix' component (thing) and it would internally manage a set of matrix-worker objects distributed over nodes?
<jbjnr> (localities)
<pree> yes, it's the idea I have
<pree> Calling an action on the representative means that internally you are calling the action on each worker component
<jbjnr> and if I want to perform some operation on a tile (piece) of the matrix, then I would invoke that function on the single agas handle and it would ship that operation to each of the localities holding parts of the matrix that are distributed over those workers
<jbjnr> ^yes
<jbjnr> I see
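A minimal sketch of the "single representative object" idea discussed above, written in plain C++ rather than HPX so that it compiles without the library. The names `worker`, `representative`, `broadcast`, and `locality_of` are hypothetical and do not come from the gist or from HPX; the round-robin placement and the sequential loop are assumptions standing in for real component creation and remote action invocation.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for one component instance hosted on one locality.
struct worker {
    std::size_t locality;      // id of the locality that would host this worker
    std::vector<double> tile;  // the piece of the data this worker owns

    double sum() const {
        return std::accumulate(tile.begin(), tile.end(), 0.0);
    }
};

// Hypothetical representative: a single handle that owns many workers.
class representative {
public:
    // Create `count` workers and place them round-robin over `localities`.
    representative(std::size_t count, std::size_t localities,
                   std::size_t tile_size) {
        assert(localities > 0);
        for (std::size_t i = 0; i < count; ++i) {
            workers_.push_back(
                worker{i % localities, std::vector<double>(tile_size, 1.0)});
        }
    }

    // Which locality the i-th worker was placed on.
    std::size_t locality_of(std::size_t i) const {
        return workers_.at(i).locality;
    }

    // Calling an operation on the representative calls it on every worker
    // and collects the results. A real version would ship the call to the
    // locality that owns each worker; here it is a plain local loop.
    template <typename F>
    auto broadcast(F f) const
        -> std::vector<decltype(f(std::declval<worker const&>()))> {
        std::vector<decltype(f(std::declval<worker const&>()))> results;
        results.reserve(workers_.size());
        for (worker const& w : workers_) {
            results.push_back(f(w));
        }
        return results;
    }

private:
    std::vector<worker> workers_;
};
```

With this sketch, `representative rep(4, 2, 3);` followed by `rep.broadcast([](worker const& w) { return w.sum(); })` returns one partial sum per worker, so the caller only ever touches the single handle.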
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
<jbjnr> pree: we need a test case for this - a simple application that would benefit from this - that will drive your design with more focus - I can see a danger that you would create the structures needed for this, but have no 'use case' for it. Do you have an application in mind (a test code) that we could implement using this distribution / access scheme?
<jbjnr> something simple preferably
<heller_> I think that approach is too narrow
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/vHbgV
<github> hpx/gh-pages 5f0686a StellarBot: Updating docs
<pree> I don't have one yet. I just want to know whether this approach is okay. I still need to think about use cases
<pree> heller_: please explain?
<heller_> I would have really preferred to see generic domains and domain maps in HPX which can then be used on other generic containers, such as partitioned vector or your matrix thing
bikineev has quit [Ping timeout: 240 seconds]
<jbjnr> (distributed matrices would be very useful)
<heller_> Chapel defined this generic concept such that it can be used uniformly and with the ability to have user defined mappings etc
<heller_> jbjnr: they would, indeed, but it's a different project, imho
<jbjnr> I know nothing about this project and am trying to understand the motivation behind it at the moment
<pree> Okay heller_ : I can understand your idea
<heller_> I think I wrote the initial project idea description
<jbjnr> where is that?
<heller_> In the wiki
<pree> jbjnr link to project
<pree> heller_: As you said, I'd prefer to go with the domain implementation in HPX
<pree> Being generic, it should work with all objects
<pree> Like arrays, containers, components in HPX
<pree> heller_: Okay, I am going forward with the implementation of index-sets in HPX
<jbjnr> why is heller_ not mentoring this project then?
<pree> jbjnr : I don't know
<jbjnr> :)
<pree> :(
<pree> There was a whole lot of confusion at first about whether it should be done similarly to the existing policies or as something different that supports components only
<jbjnr> from the original proposal, it looks like you need to focus on a way of giving the user a way of describing 'policies' to do distributions - and once those policies are in place, the existing framework can actually do the component creation and placement.
<pree> But now I think I'm clear that it should work with all objects -- containers, components, etc
<jbjnr> and if those policies look like domain maps - then fine
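To make the "policies that look like domain maps" idea concrete, here is a small sketch assuming a one-dimensional index domain `[0, size)` and a fixed number of localities. The types `block_policy` and `cyclic_policy` and the helper `locality_of` are hypothetical names chosen for illustration; they are loosely modelled on block and cyclic distributions and are not the formulas used by Chapel or by HPX.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical policy: each locality gets one contiguous block of indices.
struct block_policy {
    std::size_t operator()(std::size_t index, std::size_t size,
                           std::size_t localities) const {
        assert(localities > 0 && index < size);
        // Block length is ceil(size / localities), one simple choice.
        std::size_t block = (size + localities - 1) / localities;
        return index / block;
    }
};

// Hypothetical policy: indices are dealt out to localities round-robin.
struct cyclic_policy {
    std::size_t operator()(std::size_t index, std::size_t size,
                           std::size_t localities) const {
        assert(localities > 0 && index < size);
        return index % localities;
    }
};

// A "domain map" in this sketch is any callable with the signature above.
// User-defined policies would plug in the same way as the built-in ones.
template <typename Policy>
std::size_t locality_of(std::size_t index, std::size_t size,
                        std::size_t localities, Policy policy = Policy{}) {
    return policy(index, size, localities);
}
```

For a domain of 10 indices over 4 localities, `block_policy` places indices 0 to 2 on locality 0 and index 9 on locality 3, while `cyclic_policy` places index 5 on locality 1. Keeping the policy a separate callable is what would let existing creation and placement code stay unchanged while the mapping varies.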
<pree> At first I want to implement some policies
<pree> and only then can I think about user-defined ones
<pree> jbjnr ^^
<jbjnr> so looking again at your gist - you appear to be proposing a 'meta object' that controls (and owns) the component generation - when really you want only a hook into the existing component generation?
<jbjnr> perhaps heller_ can comment some more
<jbjnr> ^hint hint - but he's a bit busy these days
<pree> I know many hpx people are busy
<pree> :)
<jbjnr> there are too few of us and too much work to do
<heller_> jbjnr: because heller needs to finish his thesis
<pree> okay I understand
<heller_> pree: ^^
<pree> Copy that
<heller_> who is mentoring now?
<jbjnr> parsa[w]: ^
<pree> parsa & grubel
<heller_> ahh
<jbjnr> <sigh>
<pree> yes
<pree> jbjnr <sigh> ??
<jbjnr> well, they are not 'very' active on irc etc, so need a bit more chasing
<heller_> yeah
<jbjnr> my <sigh> is a kind of shrug really
<pree> jbjnr : Oh a lot more chasing :)
<heller_> pree: you have to activate them ;)
<heller_> feel free to bounce ideas around here and then present them
<jbjnr> heller_: he's doing that^
<jbjnr> he's written a gist and asked for comments
<heller_> but respect their opinion, that is, we are not the definite authority ;)
<heller_> ah
<jbjnr> I just don't know much about the project so was confused at first
<heller_> this is my first at a computer since a week!
<heller_> day*
<heller_> I even forgot how to type
<jbjnr> stop swanning about on holiday and work harder!
<jbjnr> PS. GB news might not be until august btw
<pree> Okay, but I only get replies occasionally
<heller_> jbjnr: woot?
<heller_> jbjnr: that sucks
<jbjnr> in 2015 and 2016 the finalist announcement was made in august
<heller_> do you know how many submissions?
<heller_> ugh
<heller_> the chair said to me it would be second week of june
<jbjnr> probably loads, and we have no chance anyway
Matombo has quit [Ping timeout: 240 seconds]
<jbjnr> they probably got muddled up with the technical papers announc
<heller_> isn't it a different committee?
<jbjnr> the ACM handle it
<pree> Sorry to interrupt! I'm going forward to implement domains (index-sets) first
<pree> jbjnr && heller_ ^^
<pree> Thank you for your response
<jbjnr> pree: keep working on it and we will try to think of ways to use it that will help
Matombo has joined #ste||ar
<jbjnr> are you really foing a masters at just 18 years of age?
<heller_> pree: I think chapel is a poster child in that regard here
<jbjnr> ^doing I tried to say
<jbjnr> my typing is shockingly bad
<pree> No, it's a five-year integrated course
<jbjnr> ok
<pree> 5-year MSc in data science
<jbjnr> well this is great, so if you've taken on a project like this in year 1, we have 4 more years to get stuff out of you :)
<heller_> pree: also, remember that there are two sides to it. The one side is the user facing API, and the other is the implementation
<jbjnr> gtg. bbiav
<jbjnr> bbiab
<pree> As a beginner I find it difficult. I will continue even if I fail, so I can get it next year : )
<heller_> pree: the user facing API could indeed be something like an expression-template-based API to express the distribution intent. But it then has to map onto a specific implementation
<heller_> there is a difference between failure and not achieving your goals
<pree> ?
<heller_> you can achieve your goals with failure as well
<heller_> but you have to admit failure first :P
<heller_> that is, you can easily pass GSoC, for example, without completing the work you proposed
<pree> I don't know what to say ! But I will do my best
<heller_> the magic here is that you have to describe what went wrong etc.
<pree> : )
<pree> okay heller_
<heller_> all in all, there is no progress without failure
<pree> : ) yeah i understand it
<pree> I will do the user-defined ones once I have become familiar with the built-in ones
<pree> heller_ ^^
<heller_> but yeah, since parsa[w] and pat need to evaluate you, you have to communicate it with them ;)
<pree> Okay I will
<pree> and hope to get one
<pree> Going for lunch. Bye
<heller_> pree: alright
<heller_> jbjnr: I have access to a 728 node IB cluster now
<jbjnr> heller_: where? Can I get an account too please :)
<heller_> jbjnr: sure
<heller_> jbjnr: see query
pree has quit [Ping timeout: 255 seconds]
<jbjnr> heller_: thanks. I will be back in the office on Monday and fill in forms etc. awesome. Gives me an excuse to overhaul the verbs PP and integrate it with the LF one nicely.
<jbjnr> and get all the rma stuff running etc.
<heller_> ;)
<jbjnr> ready for a nice IPDPS paper :) :0 :)
<heller_> jbjnr: I am still in favor of getting the connection based LF parcelport up and running again
<heller_> and let the IB one phase out
<jbjnr> could do that
<heller_> but your call
<jbjnr> but if we template most of the LF stuff, then adding verbs support would not be much effort
<heller_> just thinking out loud about the potential maintenance overhead etc
<heller_> ok
<jbjnr> (I mean getting the verbs layer using most of the LF code)
<heller_> that'll be awesome
<jbjnr> then connection handling was the worst part
<jbjnr> ^the
<heller_> that way we might be able to adapt to other RDMA based technologies, easier
<jbjnr> yeah, we just need a templated customization point for the put/get/read/write poll etc much like I've done for the memory registration now
<heller_> jbjnr: we don't have a core hour limit on that cluster btw, we just might get throttled after a given number of hours
<jbjnr> the mem reg is nice and now I can add rma types for other network layers
<heller_> but if no one else uses it...
<jbjnr> cool
<heller_> great
<jbjnr> In that case I'll start today!
<heller_> ;)
<jbjnr> no. I'll be playing overwatch and eating brownies today :)
<heller_> sounds like a plan
<heller_> I am sitting in the garden right now
<heller_> eating cake and drinking "radler"
<jbjnr> I just came back from the garden
<heller_> I got myself a power outlet in the garden ;)
<heller_> and super long range wifi
<jbjnr> got to fit a new toilet today
<jbjnr> so some work required
<heller_> hui
<heller_> sounds like fun as well
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
pree has joined #ste||ar
bikineev has joined #ste||ar
K-ballo has joined #ste||ar
bikineev has quit [Ping timeout: 246 seconds]
hkaiser has joined #ste||ar
<7IZAANLGN> [hpx] hkaiser closed pull request #2684: Adding new statistics performance counters: (master...rolling_statistics_counters) https://git.io/vHoTl
<7GHAA7D3D> [hpx] hkaiser deleted rolling_statistics_counters at 65c8ec1: https://git.io/vHbSM
hkaiser has quit [Quit: bye]
eschnett has quit [Quit: eschnett]
patg has joined #ste||ar
patg is now known as Guest14201
hkaiser has joined #ste||ar
aserio has joined #ste||ar
jbjnr has quit [Remote host closed the connection]
ajaivgeorge has joined #ste||ar
eschnett has joined #ste||ar
ajaivgeorge has quit [Ping timeout: 258 seconds]
ajaivgeorge has joined #ste||ar
jbjnr_ has joined #ste||ar
ajaivgeorge has quit [Quit: ajaivgeorge]
<jbjnr_> hkaiser: got a moment (or 5)?
<hkaiser> jbjnr_: after the call, yes
<jbjnr_> call? oops
<jbjnr_> sorry forgot
<hkaiser> jbjnr_: ok, now
hkaiser has quit [Read error: Connection reset by peer]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
aserio has quit [Ping timeout: 246 seconds]
hkaiser has joined #ste||ar
<hkaiser> jbjnr_: ok, now
aserio has joined #ste||ar
<heller_> damn, forgot the call as well :(
<jbjnr_> hkaiser: back again.
<jbjnr_> rings any bells in conjunction with my issue about executor compilation errors
<hkaiser> jbjnr_: have not looked yet, travelling...
<hkaiser> sorry
<hkaiser> I've seen the ticket
<jbjnr_> ok. Just wondered if you had any ideas I might try
<hkaiser> all I can say is that it should work :/
<jbjnr_> (clearly a missing specialization somewhere, but I'm not sure where to look)
<jbjnr_> nvm
<hkaiser> I had 2 talks to give yesterday and another today, after that I will be free to look
<jbjnr_> funding?
<jbjnr_> related I mean
<hkaiser> nah, informational
<jbjnr_> heller_: talking of funding - we need to collaborate officially on a project ....
<hkaiser> but yeah, you never know where it will help
<hkaiser> jbjnr_: sorry again for being slow on this
<jbjnr_> I'll have a play with it later - off for a swim now
<jbjnr_> bbiab
<heller_> jbjnr_: yes, we really do!
EverYoun_ has joined #ste||ar
<heller_> hkaiser: where are you?
<hkaiser> Los Alamos
EverYoung has quit [Ping timeout: 240 seconds]
<heller_> Ahh, good luck!
<hkaiser> thanks
<heller_> I pinned down the commit where things went wrong
<heller_> But I'm not sure what the hell goes wrong there
vernon has joined #ste||ar
<hkaiser> heller_: yah, saw your comment - thanks for investing time to fix it
hkaiser has quit [Quit: bye]
david_pf_ has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
dmarce21 has quit [Ping timeout: 268 seconds]
dmarce2 has joined #ste||ar
dmarce2 has left #ste||ar [#ste||ar]
pree_ has joined #ste||ar
pree has quit [Ping timeout: 255 seconds]
<ABresting> heller_: yt?
<heller_> ABresting: what's up?
<ABresting> I was wondering if you could help me with some HPX internals, as wash and hkaiser are away?
bikineev has joined #ste||ar
<heller_> ABresting: which ones specifically?
<ABresting> about threads, what happens when one thread is affected by a segmentation fault? does it crash?
<heller_> Yes
<ABresting> and it restarts again ?
<diehlpk_work> wash[m], yt?
<diehlpk_work> Is the student who is doing HPX - Stack overflow detection in Linux around?
EverYoun_ has quit [Ping timeout: 246 seconds]
<ABresting> yes yes
<ABresting> diehlpk_work: yes
<ABresting> I have like a gazillion questions
<zao> How much wood would a woodchuck chuck if a woodchuck could chuck wood?
vernon has quit [Remote host closed the connection]
jbjnr has joined #ste||ar
EverYoung has joined #ste||ar
EverYoun_ has joined #ste||ar
EverYoun_ has quit [Remote host closed the connection]
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 240 seconds]
pree_ has quit [Ping timeout: 255 seconds]
bikineev has quit [Ping timeout: 255 seconds]
jbjnr_ has quit [Quit: Konversation terminated!]
bikineev has joined #ste||ar
aserio has joined #ste||ar
EverYoun_ has quit [Ping timeout: 240 seconds]
aserio has quit [Ping timeout: 260 seconds]
vamatya has joined #ste||ar
ajaivgeorge has joined #ste||ar
ajaivgeorge has quit [Ping timeout: 240 seconds]
ajaivgeorge has joined #ste||ar
ajaivgeorge_ has joined #ste||ar
ajaivgeorge has quit [Ping timeout: 260 seconds]
bikineev has quit [Ping timeout: 260 seconds]
pree has joined #ste||ar
jaafar has joined #ste||ar
david_pf_ has quit [Ping timeout: 260 seconds]
pree_ has joined #ste||ar
pree has quit [Read error: Connection reset by peer]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
eschnett has quit [Ping timeout: 240 seconds]
eschnett has joined #ste||ar
david_pf_ has joined #ste||ar
<heller_> ABresting: how should it restart?
<ABresting> that you tell me. what if grain size is smaller for that ?
<ABresting> allocate larger stack and re run ?
<heller_> What about side effects? Mutating state?
<heller_> How do you roll back what you did before?
<ABresting> journaling ?
<ABresting> process level
<ABresting> but for this we need to know if it was indeed a stack overflow in the first place, as segmentation fault can't be handled by increased stack size
<heller_> No.
<heller_> You can't just roll back that easily on failure
<heller_> Where did you get that idea from?
<ABresting> that's how Apache ZooKeeper handles data: it creates a live journal and, in case of failure, it replays it.
<ABresting> but the problem is performance
<ABresting> there this solution is acceptable as subject is more important than task deadline
<ABresting> but I think the goal to distinguish between stackoverflow and segmentation fault was to handle the stack overflow?
<heller_> The first step would be to detect stack overflows
<ABresting> tell me if I am missing something
<ABresting> and after that ?
denis_blank has joined #ste||ar
<heller_> We can't keep arbitrary live journals
<ABresting> hence the performance!
<heller_> We are executing any c++ functions
<heller_> Nothing to do with performance
<ABresting> ok, let's keep journals aside.
<ABresting> but what to do after detecting stack overflow ?
<ABresting> what to do with that information ?
<heller_> I haven't done any research in that direction, to be honest
<heller_> One thing is that we can give a proper error message
<heller_> And tell the user to increase the stack space
<heller_> Given the function/action name
<ABresting> increasing the stack space for that particular function?
<heller_> Yes, on the user side
<heller_> In the application code
<heller_> So, are you able to detect stack overflows in a generic way?
<heller_> That is even when no segmentation fault occurs?
<ABresting> yes, but the technique is derived from libsigsegv
<ABresting> underlying signal is the same,
<ABresting> as we can't do anything with kernel signaling mechanism
<ABresting> but once signal SIGSEGV occurred, handle it and get to the address and trace the last address and boom!
<heller_> So, integrate this into hpx and prepare a pr that gives you the diagnostic?
<heller_> Sure
<ABresting> if it's a stack overflow, just report that, else report a segmentation fault
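The distinction described above can be sketched as a pure classification step: given the faulting address and the bounds of the current thread's stack, decide whether the fault landed in the guard area just past the end of the stack. This is an illustration under stated assumptions, not libsigsegv's code or anything in HPX: the names `fault_kind`, `stack_region`, and `classify_fault` are hypothetical, the stack is assumed to grow downward, and a guard area is assumed to sit directly below the lowest usable stack address. In a real POSIX handler the address would come from `siginfo_t::si_addr` (handler installed with `SA_SIGINFO`), and the handler would need to run on an alternate stack set up with `sigaltstack`, since the faulting stack is exhausted.

```cpp
#include <cassert>
#include <cstdint>

enum class fault_kind { stack_overflow, segmentation_fault };

// Hypothetical description of one thread's stack (assumed to grow downward).
struct stack_region {
    std::uintptr_t low;         // lowest usable address of the stack
    std::uintptr_t high;        // one past the highest usable address
    std::uintptr_t guard_size;  // bytes of guard area directly below `low`
};

// Classify a fault by where the faulting address lies relative to the stack.
// A fault inside the guard area [low - guard_size, low) is treated as a
// stack overflow; anything else is reported as an ordinary segmentation fault.
inline fault_kind classify_fault(std::uintptr_t fault_addr,
                                 stack_region const& s) {
    assert(s.low < s.high && s.guard_size <= s.low);
    std::uintptr_t const guard_begin = s.low - s.guard_size;
    bool const in_guard = fault_addr >= guard_begin && fault_addr < s.low;
    return in_guard ? fault_kind::stack_overflow
                    : fault_kind::segmentation_fault;
}
```

A diagnostic built on this would report the function or action name together with the classification, so that the user can increase the stack size in application code. An overflow that jumps past the guard area entirely (for example through a very large stack allocation) would be misclassified by this simple check.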
<ABresting> but hkaiser has a genuine concern, i.e. as the solution is derived from libsigsegv, we don't know how it will go with the Boost license
<ABresting> there aren't many efficient ways to detect stack overflow
<ABresting> the neatest one was implemented by libsigsegv
<ABresting> does anyone in the group have a law degree?
<heller_> so what's wrong with just using libsigsegv?
<heller_> no, we are not lawyers
<heller_> implement it as a proof of concept
aserio has joined #ste||ar
<ABresting> libsigsegv carries GPL licensing with it, which conflicts with HPX's Boost license.
<ABresting> so my point is, is it even worth going through second-system syndrome if we are not going to integrate the code with HPX and instead use it as a third party dependency?
<ABresting> that too on user side
aserio has quit [Ping timeout: 260 seconds]
<heller_> ABresting: if you prefer to implement something like libsigsegv instead, go knock yourself out
<ABresting> that "something like" is where hkaiser and wash are stuck :(
<ABresting> but I am gonna do it anyways :P
<heller_> what does it mean that they are "stuck"?
<ABresting> well they are having second thoughts if we can use libsigsegv in its original form
<ABresting> as an optional dependency
<heller_> what is the alternative?
<heller_> jbjnr: meggie has omnipath, no infiniband, so LF parcelport it is again ;)
<heller_> sorry...
<ABresting> that we have to decide, see I can develop things but can't decide what goes in HPX repo :/
<ABresting> I am having a call with wash tomorrow. Let's see how it turns out.
<ABresting> you wanna join ?
EverYoung has quit [Ping timeout: 240 seconds]
<ABresting> heller_: do submodules in a repo also create license issues?
<heller_> for sure
<ABresting> :P
<heller_> even code that just links against GPL licensed code needs to be put under the GPL
<heller_> even if you don't distribute the GPL code
<ABresting> what if we include it in requirements.txt like in any Python-based library?
<ABresting> is it the same then too?
<heller_> i guess so
<heller_> but as said, that shouldn't be something you need to care about ... a proof of concept would be nice already
<heller_> and then, once that's there, we can look into what to do next
ajaivgeorge_ has quit [Remote host closed the connection]
<ABresting> that's pragmatic! ok, for now, I should develop the changes in my HPX repo and once ready I will share it with the group I think.
<heller_> yes, otherwise you'll end up with nothing
<ABresting> great advice, Thanks heller_ :)
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
akheir has quit [Remote host closed the connection]
bikineev has joined #ste||ar
EverYoung has quit [Ping timeout: 240 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
bikineev has quit [Read error: No route to host]
bikineev has joined #ste||ar
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
bikineev has quit [Remote host closed the connection]
EverYoun_ has quit [Remote host closed the connection]
pree_ has quit [Quit: AaBbCc]
bikineev has joined #ste||ar
david_pf_ has quit [Quit: david_pf_]
jaafar has quit [Ping timeout: 246 seconds]
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
bikineev has quit [Ping timeout: 260 seconds]
eschnett has quit [Ping timeout: 240 seconds]
eschnett has joined #ste||ar
bikineev has joined #ste||ar
EverYoung has joined #ste||ar
Matombo has quit [Remote host closed the connection]
eschnett has quit [Quit: eschnett]