#ste||ar on 2020-03-18 — irc logs at irclog.cct.lsu.edu

2020-02-24 20:46 hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020

01:05 diehlpk has joined #ste||ar

01:05 diehlpk has quit [Changing host]

01:05 diehlpk has joined #ste||ar

01:29 rtohid has left #ste||ar [#ste||ar]

02:03 hkaiser has quit [Quit: bye]

02:15 weilewei has quit [Remote host closed the connection]

02:47 diehlpk has quit [Ping timeout: 246 seconds]

02:58 bita has quit [Read error: Connection reset by peer]

02:59 bita has joined #ste||ar

06:13 jaafar has quit [Ping timeout: 268 seconds]

07:38 nikunj97 has joined #ste||ar

07:45 <simbergm> jbjnr: this is an automated reminder to look at https://github.com/STEllAR-GROUP/governance/pull/2

07:56 <kordejong> Hi all. I have a simple benchmark that multiplies and divides partitioned arrays for a number of iterations (single node, 4 cores). When I look at a plot of the thread idle rates I see low values (<10%) with spikes (40, 60, even 100%) at regular intervals. A trace loaded in Vampir suggests the spikes in idle-rate correspond with calls to `hpx::agas::server::primary_namespace::decrement_credit_action`. At those moments, the

07:56 <kordejong> other three cores show much less activity, while around those calls, all cores are busy. Is this behavior to be expected or can I maybe change/tweak something to prevent these high idle rates?

07:56 <heller1> interesting

07:57 <heller1> this is due to partitioned vector is using global IDs

07:58 <heller1> however, no credit splitting should happen in a non distributed setting. nevertheless, once the id_type instances go out of scope, there's at least one decrement happening indeed

07:59 <heller1> those regular intervals then should correlate to the end of your iterations, where you let the partitioned vectors go out of scope.

08:03 <kordejong> I am not using the built-in partitioned vector BTW, but my own nD partitioned array. But possibly what you say also holds for my case.

08:05 <heller1> ah ok

08:05 <heller1> then, I have no idea what you are doing ;)

08:12 <jbjnr> Kor de Jong: can you upload your plot somewhere - or vampir data - I could have a look. Doubt I can help, but might be able to spot something unusual

08:17 <kordejong> I have put a zip of the trace here: https://kordejong.stackstorage.com/s/zdNxQlxDQaRW0VW Thanks in advance!

08:34 <kordejong> Some additional info: the number of iterations is 50, while I count about 20 spikes in idle-rate / calls to `decrement_credit_action`. Also, my array partitions are HPX components, which are referenced by partitioned array instances. In my current case the partitions are all located on a single desktop machine, but on a cluster they get distributed over all available NUMA nodes.

08:59 gonidelis has joined #ste||ar

09:02 <gonidelis> Where could I find out how do the tests work? (in the case of all_off.cpp for example) What is their purpose, when and how are they executed?

09:10 <heller1> gonidelis: their purpose is to give us some way of telling whether the implementation works or if we broke anything

09:10 <heller1> they can be run manually, but are also run at every pull request and merge to master

09:14 nikunj97 has quit [Remote host closed the connection]

09:14 nikunj97 has joined #ste||ar

09:25 * jbjnr uploaded an image: image.png (165KB) < https://matrix.org/_matrix/media/r0/download/matrix.org/tyQUTgKmAXEVdXhtCfcceXCS >

09:25 <jbjnr> Kor de Jong: interesting. I see the pattern you refer to

09:26 <jbjnr> is this 4 threads on 1 node, or 4 nodes?

09:26 <kordejong> 4 threads on 1 node

09:26 <jbjnr> I do not use actions much, so I'm not sure what might be going on, but it's be interesting to see the code as well if it's on github or anything.

09:27 <jbjnr> ok. Actions are most useful for remote invocations. if you're doing things node local, maybe you could just call the functions directly and bypass the action stuff?

09:27 <jbjnr> (unless you're devloping this locally with the plan to go remote later, in which case ignore my comment)

09:29 <jbjnr> The problem with actions is that they go through the network layer, which adds some overheads. I suspect you're seeing some bottleneck where the network temporarily stops processing stuff. Hard to say without hands on. (I'm not volunteering).

09:29 <kordejong> 😉 I would like this to work well in the distributed case, but also in the non-distributed case. With a single implementation.

09:29 <jbjnr> ok

09:30 <jbjnr> you're using the mpi backend to networking I assume.

09:30 <jbjnr> do the multiple/dives that are delayed 'wait' on results from another core?

09:31 <jbjnr> * do the multiply/divides that are delayed 'wait' on results from another core?

09:31 <heller1> well, reference counting always implies some overhead

09:31 <heller1> the question however is, is the decrease in parallelism only happening in non critical sections of your code?

09:31 <kordejong> When building on my local Desktop I turn networking off

09:32 <kordejong> On the cluster I build HPX with `HPX_WITH_PARCELPORT_MPI=ON`

09:37 <jbjnr> The credit thing in the plot above takes a very long time. This is disturbing and is no doubt highly indicative of some underlying issue.

09:38 <heller1> sure it is

09:38 <heller1> nevertheless, I hope this is not a debug build?

09:39 <kordejong> There is no waiting in the implementation, except after the iteration has finished. My goal is to make this as asynchronous as possible, and the Vampir plots suggest that things are going well most of the time. Calculations of a next iteration start for some partitions while calculations of a previous iteration are still ongoing for other partitions. One aspect of the implementation is that calculations on partitioned

09:39 <kordejong> arrays always return new partitioned arrays. Non of the array partitions are updated.

09:39 <heller1> jbjnr: FWIW, I only identified the credit counting to be an issue on large scale runs, when a lot of distributed credit counting is happening

09:39 <kordejong> DebInfoWithRelease build

09:39 <heller1> ReleaseWithDebInfo

09:40 <kordejong> Ah yes ;-)

09:40 <heller1> so the partitioned arrays are temporary objects most of the time?

09:40 <jbjnr> Sorry. I didn't mean waiting in th literal sense, but more like `when(remote).then(multiply)` - implying that the multiply only happens when some remote calculation has completed.

09:41 <jbjnr> if those remote things are being delayed ...

09:41 <heller1> yeah

09:41 <kordejong> <heller1 "so the partitioned arrays are te"> indeed

09:41 <heller1> the decref actions are only supposed to be executing when an id_type goes out of scope

09:42 <heller1> Kor de Jong: so ask yourself when you need them to be distribtued, and if you can't separate the actual data from the component

09:43 <jbjnr> ^this

09:43 <jbjnr> gtg

09:49 gonidelis has quit [Remote host closed the connection]

10:01 <kordejong> <heller1 "Kor de Jong: so ask yourself whe"> I have trouble understanding your point, but it seems important. A bit more clarification from my part: The array partition component servers are located on different localities and stay there, until they go out of scope. The corresponding client instances are used in the implementation of the partitioned array class and are located on the root locality. When performing a

10:01 <kordejong> calculation `output_array = f(input_array)` tasks implementing the calculation for individual partitions are being sent to their respective localities, resulting in output partitions on these same localities. Partitioned arrays contain array partitions that may or may not be ready. Algorithms attach continuations to input partitions and result in output partitions. Does this make sense?

10:06 <kordejong> Maybe this approach is not ideal in the case of a single locality

10:12 <heller1> yes, makes sense

10:12 <heller1> think about it, if you can implement `output_array = f(input_array)` without components

10:24 hkaiser has joined #ste||ar

10:31 nikunj97 has quit [Quit: Leaving]

10:46 Abhishek09 has joined #ste||ar

11:03 Abhishek09 has quit [Remote host closed the connection]

11:41 diehlpk has joined #ste||ar

11:41 diehlpk has quit [Changing host]

11:41 diehlpk has joined #ste||ar

11:44 mdiers_ has quit [Ping timeout: 264 seconds]

11:45 Pranavug has joined #ste||ar

11:45 <hkaiser> simbergm: g'morning

11:46 <hkaiser> simbergm: quick q: how do I supress a clang-tidy error?

11:46 <simbergm> hkaiser: sec, I'll give you an example

11:47 Vir has quit [Ping timeout: 256 seconds]

11:47 <simbergm> hkaiser: // NOLINTNEXTLINE(bugprone-branch-clone)

11:47 <simbergm> replace `bugprone-branch-clone` with the actual warning

11:47 Pranavug has quit [Client Quit]

11:47 <simbergm> I think `// NOLINT(blabla)` on the same line works as well

11:48 <simbergm> https://github.com/STEllAR-GROUP/hpx/blob/7bdece7fb5a7927f4f6fc44c35a0820e37fc5c19/src/util/regex_from_pattern.cpp#L91

11:48 <hkaiser> thanks! that's exactly what I need

11:50 diehlpk has quit [Ping timeout: 246 seconds]

11:51 mdiers_ has joined #ste||ar

11:51 Vir has joined #ste||ar

11:51 Vir has quit [Changing host]

11:51 Vir has joined #ste||ar

11:59 weilewei has joined #ste||ar

13:44 kale_ has joined #ste||ar

13:44 Pranavug has joined #ste||ar

13:45 Pranavug has quit [Quit: Leaving]

13:46 nikunj97 has joined #ste||ar

13:46 <weilewei> diehlpk_mobile[m incite proposal writing webinar: https://www.olcf.ornl.gov/calendar/2021-incite-call-for-proposals-webinar-2/

14:00 diehlpk_work has joined #ste||ar

14:15 <kale_> Hey, I'm Mahesh Kale. I'm a junior in computer science at IIT Roorkee. I'm currently learning the ways to make pip package of projects with binaries. I came across the project of making a pip package for phylanx. I've already built phylanx and all its dependencies on my machine. I think the project will be a great learning experience for me. Can you guide me further on how I can proceed with the project?

14:15 <zao> Hi there!

14:16 rtohid has joined #ste||ar

14:17 <nikunj97> kale_, you may want to talk further about the project with rtohid or diehlpk_work

14:18 <diehlpk_work> kale_, https://github.com/STEllAR-GROUP/hpx/wiki/Hints-for-Successful-Proposals

14:19 <diehlpk_work> https://github.com/STEllAR-GROUP/hpx/wiki/GSoC-Submission-Template

14:19 <diehlpk_work> At the end you have to prepare a proposal by the end of this month

14:20 <diehlpk_work> And outline how you would solve the project

14:20 <diehlpk_work> The main challenge here is that you have to pack all dependencies within the pip package

14:21 <diehlpk_work> Also some one mentioned that https://docs.conda.io/en/latest/ could be an alternative

14:27 <kale_> diehlpk_work, People who are into ML field use conda more often than pip. And making a conda package would be easier and more efficient than pip package.

14:29 * zao jiggles

14:29 <kale_> diehlpk_work, I'll take a look at how conda packages differ from pip packages so I can get a better understanding of which one would be better.

14:29 <heller1> he

14:30 <diehlpk_work> Hashmi, sounds good

14:33 karame78 has joined #ste||ar

14:33 shahrzad has joined #ste||ar

14:35 <kale_> diehlpk_work, I think there can be package in both conda and pip so that end user gets a choice while installing

14:35 <diehlpk_work> kale_, A good first step would be to compare the tools and send us which tool is betetr for what we want to do

14:37 <nikunj97> kale_, btw conda is not available on clusters, at least I'm yet to see a module available

14:37 <nikunj97> so a pip package looks necessary in our case

14:37 K-ballo has quit [Remote host closed the connection]

14:38 K-ballo has joined #ste||ar

14:39 <kale_> diehlpk_work, Thanks for the lead, I'll research for it.

14:40 <nikunj97> zao, how's the coronavirus situation at your place btw?

14:41 <kale_> nikunj97, I'll research further on pros and cons of pip package and possibility of conda package on clusters.

14:41 <nikunj97> kale_, sounds good

14:44 <zao> nikunj97: Most of the HPC site are working-from-home, all higher education at the uni is moved online, buildings are closed for students. Country in general is still reasonably lax.

14:45 <nikunj97> zao, I see. Even we have our universitites closed these days. Minimal outings, trying our best to not spread it further

14:57 nan222 has joined #ste||ar

15:08 RoryH has joined #ste||ar

15:27 Abhishek09 has joined #ste||ar

15:28 <Abhishek09> rtohid: very happy to see u after long time

15:29 diehlpk has joined #ste||ar

15:29 diehlpk has quit [Changing host]

15:29 diehlpk has joined #ste||ar

15:32 <Abhishek09> diehlpk Is rtohid here?

15:32 shahrzad has quit [Read error: Connection reset by peer]

15:32 shahrzad_ has joined #ste||ar

15:32 <nikunj97> Abhishek09, they have a meeting right now

15:33 <Abhishek09> nikunj97: When they will free?

15:33 <nikunj97> usually goes for like 40min. Don't know how long it'll take today

15:34 <Abhishek09> nikunj97: Are u mentor this year?

15:34 <nikunj97> Abhishek09, yes

15:35 <Abhishek09> Which project?

15:35 <nikunj97> Blaze iterative, concurrent data structures, pip package

15:36 <nikunj97> mainly mentoring for Blaze iterative and concurrent data structures though

15:37 <nikunj97> for pip package, rtohid is handling most of the things. I'm here to help with general problems that arises with HPX and Phylanx

15:37 <Abhishek09> Which is more beneficicial in GSoC , as a student or mentor?

15:38 <Abhishek09> nikunj97

15:38 <nikunj97> This is the first time I'm being a mentor. I don't know the perks of mentors.

15:38 <Abhishek09> myself also once a gsocer

15:39 <nikunj97> I already had an internship and didn't want to work twice as hard, so I opted for mentorship instead

15:39 <Abhishek09> i didn't find ste||ar on linkedIN

15:40 <nikunj97> STE||AR is a name we gave ourselves as an organization. CCT didn't raise an eyebrow, so we're using it officially as an open source org.

15:41 <nikunj97> that's why you don't find it on linkedin

15:41 <nikunj97> most likely, no one created it there

15:41 <nikunj97> zao, would you like a linkedin profile for STE||AR?

15:41 <Abhishek09> nikunj97: that means you are in 2nd or 3 rd year

15:41 <nikunj97> simbergm, jbjnr your thoughts too on this ^^

15:42 shahrzad_ has quit [Ping timeout: 250 seconds]

15:42 <nikunj97> Abhishek09, yes, I'm in my 3rd year. I did my gsoc back in my 1st year with STE||AR when I used to have a lot of spare time

15:43 <Abhishek09> nikunj97: Is this org needs contribution for selection?

15:44 <Abhishek09> or proposal is enough

15:44 <nikunj97> Abhishek09, contributions are way to assess your understanding. So I believe, you can prove your case if you have contributions

15:45 <nikunj97> contributions can be as simple as a documentation update or as complex as fixing a difficult to handle bug

15:46 <nikunj97> but it shows your interaction with the community and your understanding of the library

15:46 <nikunj97> both of which are crucial for a great proposal

15:47 <zao> The GSoC wiki page suggests having something to show the understanding, even if it's a toy futurised matrix-matrix multiply or so.

15:48 <zao> nikunj97: I don't use linkedin actively at all, but if there's some sort of affiliation or liking one can do of ste||ar as an organization, I'd dop that.

15:49 <Abhishek09> I also thought to mentor this year but i dropped that idea

15:49 <nikunj97> zao, well yea. for example, I use STE||AR GROUP as an experience on my linkedin. But I have to manually add the name and there's nothing to search. So if we had a page, that'll be a great idea

15:49 <nikunj97> Abhishek09, your reason being?

15:50 <nikunj97> zao, that's why I use LSU on the internship I did last year. It made more sense that way.

15:51 <Abhishek09> Student participation will make profile more impressive rather than mentor

15:51 <Abhishek09> i thouhgt so

15:51 <nikunj97> I see. I wanted to give back to the community. Hence, I became a mentor.

15:52 <nikunj97> besides another reason being lack of time for another gsoc project completion

15:53 <zao> Mentorship is kind of indicative of ones capability to lead and collaborate as well.

15:53 <Abhishek09> nikunj97: You can also do gsoc along with internship

15:53 <jbjnr> Abhishek09: did you say you have done a project with stellar before? if so, which one please? Thanks

15:54 <nikunj97> Abhishek09, ik. I don't want to overburden myself with another gsoc project, when I already have a lot of things going on ;)

15:54 <Abhishek09> one senior harkirat singh has done(iit roorkee alumini)

15:55 <Abhishek09> jbjnr: not with ste||ar

15:55 <nikunj97> Abhishek09, ik people who have done it as well. Again, I don't want to overburden myself

15:55 <nikunj97> btw you're from iitr?

15:55 <Abhishek09> No

15:56 <Abhishek09> you?

15:56 <nikunj97> yea, I'm from iitr

15:56 <Abhishek09> u known harkirat?

15:56 <nikunj97> yup, he was in his final year when I was a freshman

15:57 <jbjnr> Abhishek09: which project was it?

15:57 <Abhishek09> Aboutcode organisation

16:00 <Abhishek09> jbjnr: https://www.linkedin.com/in/abhishek-kashyap-09/

16:01 <jbjnr> ok. I had a gsoc student on a differnt project a few years ago called Abhishek - but I guess it is a common name in India (?)

16:01 <nikunj97> jbjnr, haha yea!

16:01 <nikunj97> it's a pretty common name to have. Even mine is a common one.

16:01 <Abhishek09> Yes , jbjnr

16:01 <nikunj97> I know at least 5 other nikunj in my friend circle alone

16:02 <Abhishek09> jbjnr: You can see my project details there

16:06 <Abhishek09> jbjnr: Have seen all details?

16:07 <jbjnr> I'm not on linked in so I can't see anything. Do not worry, I didn't want to read it :)

16:07 mdiers_ has quit [Remote host closed the connection]

16:08 <nikunj97> jbjnr, btw what's your take on having STE||AR as an org on linkedin?

16:08 <Abhishek09> jbjnr :)

16:08 <simbergm> hkaiser: yt? just for tomorrow's meetings, can you host it? I think we'll be limited to 45 (or 30) minutes if I host it

16:09 <simbergm> and if yes, can you just send an email to hpx-devel with the details?

16:09 mdiers_ has joined #ste||ar

16:12 <hkaiser> simbergm: sure, I'll create a zoom meeting

16:12 <hkaiser> simbergm: who should be on ?

16:13 <Abhishek09> hkaiser: Is rtohid is free now?

16:13 <hkaiser> Abhishek09: he's in a meeting right now, I'll tell him to get on here

16:14 <hkaiser> ohh, he is on

16:14 <nikunj97> hkaiser, about the kokkos integration. Do we have a working hpx backend for kokkos now?

16:14 <Abhishek09> hkaiser: Thanks but not no reply from him

16:15 <hkaiser> Abhishek09: give him a minute or two

16:15 <hkaiser> nikunj97: simbergm is the local expert for this

16:15 <nikunj97> hkaiser, ohh, alright.

16:16 kale_ has quit [Quit: Leaving]

16:20 diehlpk has quit [Ping timeout: 246 seconds]

16:27 Abhishek09 has quit [Quit: Ping timeout (120 seconds)]

16:34 RoryH has quit [Ping timeout: 240 seconds]

16:38 Abhishek09 has joined #ste||ar

16:40 Abhishek09 has quit [Client Quit]

16:43 <diehlpk_work> hkaiser, Got the approval and submitted the expense report

17:01 jaafar has joined #ste||ar

17:10 <hkaiser> diehlpk_work: ok, will approve right away

17:13 nikunj97 has quit [Ping timeout: 256 seconds]

17:23 nikunj97 has joined #ste||ar

17:31 <simbergm> nikunj97: yeah

17:32 <nikunj97> simbergm, it's only on node parallelism, right?

17:32 <simbergm> Feel free to join the meeting tomorrow if you're interested

17:32 <simbergm> Yep

17:32 <nikunj97> when's it?

17:32 <nikunj97> I'd love to join in too

17:32 <simbergm> 3 pm cet

17:33 <nikunj97> cet is GMT+1 right?

17:33 <simbergm> hkaiser: I think everyone interested in the kokkos meeting is here so you can put the details here as well?

17:33 <simbergm> Yep

17:33 <nikunj97> alright, I'll join too!

17:34 <weilewei> Oh, there is a Kokkos meeting, can I join as well?

17:35 <hkaiser> simbergm: https://lsu.zoom.us/j/3340410194, tomorrow 9amCDT/15:00CET

17:36 <hkaiser> weilewei: sure, see link above

17:36 <weilewei> hkaiser thanks, will be there.

17:37 <hkaiser> nikunj97: I believe its GMT+2

17:37 <nikunj97> hkaiser, ohh ok. So 6:30PM IST

17:37 <hkaiser> no, you're right, gmt+1

17:38 <hkaiser> https://greenwichmeantime.com/time/to/gmt-cet/

17:38 <nikunj97> alright. 7:30PM it is

17:47 simbergm has left #ste||ar ["User left"]

18:05 akheir has joined #ste||ar

18:05 shahrzad_ has joined #ste||ar

18:11 shahrzad_ has quit [Ping timeout: 246 seconds]

18:59 ahkeir1 has joined #ste||ar

19:00 akheir has quit [Ping timeout: 256 seconds]

19:09 RoryH has joined #ste||ar

19:17 rtohid has quit [Remote host closed the connection]

19:17 RoryH has quit [Remote host closed the connection]

19:18 shahrzad_ has joined #ste||ar

19:23 RoryH has joined #ste||ar

19:34 nan222 has quit [Ping timeout: 240 seconds]

19:55 ahkeir1 has quit [Read error: Connection reset by peer]

19:56 ahkeir1 has joined #ste||ar

20:03 shahrzad_ has quit [Ping timeout: 246 seconds]

20:03 RoryH has quit [Remote host closed the connection]

20:04 weilewei has quit [Remote host closed the connection]

20:05 gonidelis has joined #ste||ar

20:35 rtohid has joined #ste||ar

21:01 <diehlpk_work> hkaiser, Meeting?

21:02 <hkaiser> diehlpk_work: sec

21:15 shahrzad_ has joined #ste||ar

21:24 shahrzad_ has quit [Read error: Connection reset by peer]

21:27 shahrzad has joined #ste||ar

21:34 shahrzad has quit [Ping timeout: 246 seconds]

21:36 shahrzad has joined #ste||ar

21:45 shahrzad has quit [Ping timeout: 246 seconds]

21:45 rtohid has left #ste||ar [#ste||ar]

22:16 <bita> Do we have something like: HPX_TEST_NOT_EQ

22:55 <hkaiser> bita: HPX_TEST_NEQ, I believe

22:55 <bita> Great, thanks

22:56 <hkaiser> https://github.com/STEllAR-GROUP/hpx/blob/master/libs/testing/include/hpx/testing.hpp#L201

22:56 <hkaiser> bita: ^^

22:56 <bita> I usually have a hard time finding macros and their meaning

22:57 <hkaiser> it's... nicely underdocumented

22:58 gonidelis has quit [Ping timeout: 240 seconds]

23:33 folshost has joined #ste||ar

23:33 bita_ has joined #ste||ar

23:35 diehlpk_work_ has joined #ste||ar

23:36 bita has quit [Ping timeout: 246 seconds]

23:36 maxwellr96 has quit [Ping timeout: 246 seconds]

23:36 diehlpk_work has quit [Ping timeout: 256 seconds]

23:37 nikunj97 has quit [Ping timeout: 246 seconds]

23:48 diehlpk_work_ has quit [Remote host closed the connection]

23:49 K-ballo has quit [Quit: K-ballo]

23:49 ahkeir1 has quit [Read error: Connection reset by peer]

23:50 K-ballo has joined #ste||ar

23:50 ahkeir1 has joined #ste||ar