K-ballo changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
diehlpk_work has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
diehlpk_work has quit [Remote host closed the connection]
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
<rachitt_shah[m]> Hey ms, I've filled in the form for GSoD, please let me know if you would like to interview me or about any other next steps for GSoD.
<rachitt_shah[m]> Thank you, and looking forward to working with you folks on creating some amazing docs!
<ms[m]> srinivasyadav227, hkaiser ci did run, it just didn't show up because there were newer commits after the retest
<hkaiser> ms[m]: I added srinivasyadav227 to the repo, shouldn't the test run without a retest now?
<ms[m]> I've added you to the whitelist now
<ms[m]> probably not if you added him after the latest commits
<ms[m]> they should run whenever he commits though
<hkaiser> ms[m]: is that a jenkins whitelist?
<ms[m]> yeah
<hkaiser> ahh, could you add jedi18[m] as well, pls?
<ms[m]> I added him manually, but you can also comment with "add to whitelist"
<ms[m]> already done
<hkaiser> thanks
<ms[m]> tests should run now automatically, but I wouldn't be surprised if there are kinks, so let me know if they still don't run
<srinivasyadav227> okay, sure ;-) thanks!
<ms[m]> rachitt_shah: thanks! we'll let you know soon about next steps
<rachitt_shah[m]> Thank you ms, looking forward to them!
hkaiser has quit [Quit: bye]
<ms[m]> hkaiser: sleepless night? :/
K-ballo has joined #ste||ar
<srinivasyadav227> ms: the tests here https://github.com/STEllAR-GROUP/hpx/pull/5319 got stuck, same as with the previous `retest`: all jenkins/cscs checks are running but the jenkins/lsu tests are not starting
<itn[m]> does this error "Valid buildid required" mean the same thing?
<srinivasyadav227> itn: yes, I was referring to "Valid buildid required"
itn[m] is now known as rainmaker6[m]
<rainmaker6[m]> yeah, same was the case with me. I thought there was some error in the config files that needed to be fixed, so I ignored them.
hkaiser has joined #ste||ar
<rainmaker6[m]> <srinivasyadav227 "ms: the tests here https://githu"> But it's clear to me now.
<srinivasyadav227> rainmaker6: okay, did they get automatically cleared?
<rainmaker6[m]> cleared in what sense? executed?
<srinivasyadav227> i mean, did you apply any changes or do something to solve that?
<rainmaker6[m]> no
<srinivasyadav227> okay :)
<rainmaker6[m]> basically they sometimes get triggered and sometimes don't.
<srinivasyadav227> rainmaker6: oh ok alright then, cool, np ;)
joe88 has joined #ste||ar
joe88 has quit [Client Quit]
Girlwithandroid[ has joined #ste||ar
<Girlwithandroid[> Hi! Is the deadline to apply for GSOD over?
<Girlwithandroid[> Can someone help with linking the relevant docs/where to submit the proposal?
<srinivasyadav227> Girl with android: i think this may help you: https://stellar-group.org/2021/04/hpx-accepted-for-google-season-of-docs-2021/ (I don't know much about GSoD ;) but I got this from a previous discussion here)
<Girlwithandroid[> Thank you! This helps :)
<srinivasyadav227> :)
<ms[m]> Girl with android: the link is in that post as well, but here's a direct link to the application form
<Girlwithandroid[> thank you so much!
<ms[m]> srinivasyadav227, rainmaker6, the lsu ci is currently "out of order"
<ms[m]> the guy responsible for the cluster there has been out for some time, but we're hoping to get them back to normal soon
<ms[m]> "invalid build id" usually means "something unrelated to the actual build and tests went wrong"
<srinivasyadav227> ms: okay :)
<hkaiser> ms[m]: Alireza had a difficult tooth surgery last week....
<hkaiser> sorry for the problems
<ms[m]> hkaiser: no worries! was just wondering about the status, it takes as long as it takes for him to recover :) please tell him not to stress if I'm making it sound like he should stress about it ;)
<ms[m]> hkaiser: ok, just had another look, looks like jobs are running again (the last few days no lsu jobs were running successfully, so I thought it was something else going on)
<ms[m]> the gcc 8 configurations would need some boost modules installed
<rainmaker6[m]> ms noted :)
<rainmaker6[m]> <hkaiser "ms: Alireza has had a difficult "> wishing him a speedy recovery
<hkaiser> rainmaker6[m]: thanks
hkaiser has quit [Quit: bye]
nanmiao has joined #ste||ar
<zao> HPX is too hard if you break your teeth biting into it ;)
<gonidelis[m]> can't believe that google mismatches us with this stupid tape
<gonidelis[m]> ms[m]: is this the proper place for the iter_sent header? https://github.com/STEllAR-GROUP/hpx/tree/master/libs/core/iterator_support/include/hpx/iterator_support
<gonidelis[m]> or should i put it in the traits folder?
<gonidelis[m]> the iter_sent header
K-ballo has quit [Ping timeout: 252 seconds]
K-ballo has joined #ste||ar
<ms[m]> gonidelis[m]: that was only for tests, right? if yes, put it in e.g. libs/core/iterator_support/tests/include/hpx/iterator_support/test or similar
<gonidelis[m]> oh ok
<gonidelis[m]> ms[m]: and then how do i include it in the source?
<ms[m]> with an interface target which has a target_include_directories or just a target_include_directories on the test directly (the former is preferable)
<ms[m]> not sure what to call it though, hpx_iterator_support_test is not great, but something in that direction
<gonidelis[m]> the thing is that i will go with a similar solution for the `libs/parallelism` tests
<gonidelis[m]> `libs/parallelism/algorithms`
<gonidelis[m]> ^^
<gonidelis[m]> that's a different module, so that means a different include
<ms[m]> hence the target, you can link to the target in the algorithms tests as well
<gonidelis[m]> but don't we want the modules to be compilable autonomously?
<gonidelis[m]> if i create a target in iterator_support, the algorithms module (which will be using iter_sent.hpp) will depend on the iterator_support one
<ms[m]> yep, that's ok
<gonidelis[m]> hm ok
<gonidelis[m]> are we sure that we wanna add an include directory here?
<ms[m]> where would you like to add it?
<gonidelis[m]> it's not that i have a better suggestion, it's that we create an include dir 3-4 levels deep just for one header
<ms[m]> directories are cheap? I'd at the very least want the module name there, if we end up starting to include test headers from different modules
<gonidelis[m]> ok
<gonidelis[m]> i will go with it then ;)
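(A minimal CMake sketch of what ms is describing above; the target name hpx_iterator_support_test and the exact paths are assumptions for illustration, not the project's actual CMake:)

```cmake
# libs/core/iterator_support/tests/CMakeLists.txt (hypothetical location)
# An INTERFACE target that only carries the test-only include directory.
add_library(hpx_iterator_support_test INTERFACE)
target_include_directories(hpx_iterator_support_test
  INTERFACE ${CMAKE_CURRENT_SOURCE_DIR}/include
)

# A test in another module (e.g. libs/parallelism/algorithms) can then pick up
# iter_sent.hpp simply by linking against the interface target:
# target_link_libraries(some_algorithm_test PRIVATE hpx_iterator_support_test)
```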
hkaiser has joined #ste||ar
<K-ballo> directories aren't all that cheap
<K-ballo> #include needs to stat a lot of directories to find the target header most of the time
<K-ballo> even VS's open include file feature takes several seconds to react
<K-ballo> things like putting boost's include path first can have a (small but non-negligible) effect on compilation time
diehlpk_work has joined #ste||ar
<diehlpk_work> ms[m], jbjnr Any idea what could be wrong with the mpi parcelport not working with > 64 nodes?
<hkaiser> diehlpk_work: what error do you see?
<diehlpk_work> From 1 to 32 nodes everything works, however for > 32 nodes hpx hangs on startup
<diehlpk_work> I see a print before hpx_main or hpx_init, but after that print nothing happens for 10 minutes
<hkaiser> hmmm, Summit?
<diehlpk_work> Yes
<hkaiser> but it works on other machines, doesn't it?
<diehlpk_work> Yes
<diehlpk_work> and no.
<diehlpk_work> We never tested on any other machine with so many localities
<hkaiser> diehlpk_work: does hello_world work?
<diehlpk_work> hkaiser, Yes
<diehlpk_work> --hpx.lcos.collectives.cut_off=640000
<diehlpk_work> What does this option do?
<hkaiser> not sure, need to look - but it should work without that
<hkaiser> ok, I was not aware of this
<hkaiser> Thomas might remember
<hkaiser> diehlpk_work: the default for the cut_off is ~0x0 (unsigned), so setting any value doesn't make it larger
<diehlpk_work> Ok, I will try to go back to hpx 1.5
<diehlpk_work> Dominic and Sagiv used that version on QB
<hkaiser> diehlpk_work: let me know if this fixes the issue, please
<diehlpk_work> -DHPX_WITH_MAX_CPU_COUNT=512 \
<diehlpk_work> hkaiser, That is per node and should not affect the network, right?
<hkaiser> this is per node
<hkaiser> not sure why you need 512 cores, though
<hkaiser> shouldn't 256 be sufficient?
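(For reference, a sketch of the CMake option being discussed; per hkaiser above it is a per-node setting, and the cache-style form below is only an assumption about how the flag quoted earlier is passed:)

```cmake
# Equivalent to passing -DHPX_WITH_MAX_CPU_COUNT=512 on the cmake command line,
# as in the snippet quoted above. The option is a compile-time cap on how many
# processing units a single locality (node) can use.
set(HPX_WITH_MAX_CPU_COUNT "512" CACHE STRING "Max processing units per locality")
```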
hkaiser has quit [Quit: bye]
<diehlpk_work> We had it before and Gregor was asking to increase the number
<gdaiss[m]> No, 256 should be sufficient, we would only exceed that if we use one locality per node and all 4 hyperthreads per core (which we don't)
<gdaiss[m]> I am merely confused by Patrick's description of the crashes as Octo-Tiger initialization seems to fail when exceeding 256 localities (we use 6 localities per node). No idea why though
<diehlpk_work> gdaiss[m], It works with 512
<diehlpk_work> but hangs for the IO
hkaiser has joined #ste||ar
<diehlpk_work> hkaiser, Funny, using 512 lets the code run further on some attempts
<hkaiser> diehlpk_work: no idea what's wrong
gsodaspirant has joined #ste||ar
<gsodaspirant> Hello, is the 2021 GSoD position filled?
gsodaspirant has quit [Client Quit]