hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
jaafar has joined #ste||ar
hkaiser has joined #ste||ar
akheir has joined #ste||ar
<hkaiser> gnikunj[m]: nod, thanks
<hkaiser> does it work if the tcp pp is enabled?
<gnikunj[m]> yes
<gnikunj[m]> how do I know if it is using tcp or mpi in that case? iirc tcp is the default case
<hkaiser> if you use mpirun it will use the mpi pp
bita has joined #ste||ar
hkaiser has quit [Quit: bye]
shahrzad_ has joined #ste||ar
shahrzad has quit [Ping timeout: 256 seconds]
shahrzad_ has quit [Quit: Leaving]
bita has quit [Ping timeout: 260 seconds]
bita has joined #ste||ar
akheir has quit [Quit: Leaving]
Yorlik has quit [Ping timeout: 240 seconds]
nanmiao11 has quit [Remote host closed the connection]
bita has quit [Ping timeout: 260 seconds]
Pranavug has joined #ste||ar
<Pranavug> gnikunj: Are you currently using rostam cluster? What time will you be done?
<gnikunj[m]> Pranavug yes I have a script running on 15 nodes right now. It should be done in a couple of hours at best.
<gnikunj[m]> I'll notify you when I'm done
<Pranavug> gnikunj: Ok thanks. Please
<ms[m]> K-ballo: re std::result_of would you have time for a PR for that? we're not in a huge rush with the release
<gnikunj[m]> Pranavug: you can use the nodes now
Pranavug has quit [Ping timeout: 256 seconds]
<gnikunj[m]> Does anyone have experience working with recent PAPI versions? I can't find functions like PAPI_start_counters and PAPI_read_counters
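To the PAPI question above: in PAPI 6.0 the old high-level calls (`PAPI_start_counters`, `PAPI_read_counters`) were removed, and the low-level event-set API covers the same use case. A hedged sketch (the event name is just an example; requires linking against libpapi):

```cpp
#include <cstdio>
#include <cstdlib>
#include <papi.h>

int main()
{
    // Initialize the library; must be called before any other PAPI function.
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
    {
        std::fprintf(stderr, "PAPI init failed\n");
        return EXIT_FAILURE;
    }

    // The event-set API replaces PAPI_start_counters/PAPI_read_counters.
    int evset = PAPI_NULL;
    PAPI_create_eventset(&evset);
    PAPI_add_named_event(evset, "PAPI_TOT_INS");    // total instructions

    long long value = 0;
    PAPI_start(evset);
    // ... region of interest ...
    PAPI_stop(evset, &value);

    std::printf("instructions: %lld\n", value);
    return EXIT_SUCCESS;
}
```

PAPI 6.0 also ships a new high-level API (`PAPI_hl_region_begin`/`PAPI_hl_region_end`), but the event-set calls above give the closest drop-in for the removed functions.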
Pranavug has joined #ste||ar
<Pranavug> gnikunj: Thanks for informing
weilewei has quit [Ping timeout: 245 seconds]
<Pranavug> Hey, why are there so many Jenkins processes on rostam cluster?
<gonidelis[m]> When I create a ranges-algo overload the last argument is ` Proj&& proj = Proj()`
<gonidelis[m]> But when I call this ranges-algo in a test, this argument might be missing
<gonidelis[m]> Does the compiler automatically pass `projection_identity` for this argument, or is it a no-match case so the compiler fails?
Pranavug has quit [Quit: Leaving]
<rori> it will use whatever `Proj`'s default constructor produces, I think
<gonidelis[m]> So I need to specify projection_identity by myself as a default arg
<gonidelis[m]> so, it has a def-arg
<gonidelis[m]> and I don't need to pass it
<rori> 👍️
<gonidelis[m]> rori_[m]: thanks a lot!
<rori> it depends on how your Proj is created
<gonidelis[m]> ??
<rori> I'm not sure where `Proj` is defined but you should have your answer there ;)
diehlpk_work has quit [Ping timeout: 256 seconds]
hkaiser has joined #ste||ar
<hkaiser> hey ms[m]
<gonidelis[m]> hkaiser: I pushed the ranges::transform CPO ;)
<hkaiser> gonidelis[m]: \o/
<gonidelis[m]> hkaiser: I think it is safe to proceed on the BinaryOperation overloads :)
<gonidelis[m]> hkaiser: reminder: meeting in 1 hour ;)
<hkaiser> gonidelis[m]: if you feel comfortable to do that - sure!
<hkaiser> yes, I will be there
<hkaiser> need coffee first, though
<gonidelis[m]> hkaiser: Member's must indicate their personal coffee mug in order to be accepted in the meeting anyways
<gonidelis[m]> Memebers^^
<hkaiser> ok, deal ;-)
<ms[m]> hkaiser: hey! sorry didn't see your message yesterday in time
<hkaiser> np, it was late
<hkaiser> ms[m]: would you have time for a short(-ish) chat about hpx-kokkos later today or tomorrow?
<ms[m]> yeah, sure
<ms[m]> either is fine
<ms[m]> including right now
<hkaiser> right now doesn't work, sorry - need coffee
<ms[m]> np :P
akheir has joined #ste||ar
<hkaiser> would 10am/17.00 work? or tomorrow 9am/16.00?
<ms[m]> let's do it tomorrow morning (for you) then
<hkaiser> ok
<hkaiser> thanks a lot
<ms[m]> 👍️
<hkaiser> Katie will send a zoom link
<ms[m]> all right, thanks
<rori> may I join the hpx-kokkos meeting ? :D
diehlpk_work has joined #ste||ar
<hkaiser> rori: sure
<rori> gonidelis: meeting?
<gonidelis[m]> I am logging in right now
<rori> 👍️
Yorlik has joined #ste||ar
<hkaiser> hey Yorlik, welcome back!
<Yorlik> Heyo!
<Yorlik> Never been away - just lurking.
weilewei has joined #ste||ar
nanmiao11 has joined #ste||ar
bita has joined #ste||ar
<diehlpk_work> ms[m], Can we change rostam to a working cluster again?
<diehlpk_work> Currently, it is a build cluster and I am not sure this is what we want
<diehlpk_work> One thing we could do is make sure jenkins can not use all of our GPU nodes
<hkaiser> diehlpk_work: not sure what you mean by 'working cluster'? isn't it 'working'?
<diehlpk_work> hkaiser, It is working, but only for jenkins
<diehlpk_work> not for me because jenkins uses all nodes
<diehlpk_work> Just wanted to debug octotiger on rostam, but jenkins uses all cuda nodes
<hkaiser> diehlpk_work: akheir is working on making the jenkins jobs low priority, so that everybody should be able to quickly get access to nodes
<diehlpk_work> We might keep geev for us solely and jenkins can use bahram
<diehlpk_work> and we might keep at least one marvin and two medusa nodes for us that jenkins can not use
<diehlpk_work> So we could use these nodes for debugging and would not have to wait until jenkins is finished
<hkaiser> diehlpk_work: sure, we can do that - we talked about this with akheir last meeting, I believe
<diehlpk_work> I can mention it again tomorrow
<hkaiser> pls do
<diehlpk_work> At least having some nodes available that jenkins can not allocate would be a first step
<ms[m]> diehlpk_work: yes, it's not meant to make life horrible for interactive users, this is just the initial configuration
<ms[m]> let's try what akheir has in mind first, and if that isn't enough we can try to change it further
<diehlpk_work> Ok, I hope it will become better
<diehlpk_work> At least I can apply for QB and run my code there
<diehlpk_work> hkaiser, How do I apply for QB?
<ms[m]> yeah, indeed, please remind us if things don't improve
<hkaiser> diehlpk_work: apply for a loni account, put me in as the sponsor
<diehlpk_work> Ok, I will do that
<hkaiser> then either use Dominic's allocation or apply for one yourself
<hkaiser> startup allocations are easy to apply for and will be approved immediately, I think
<diehlpk_work> I think I will apply without Dominic, since I'd like to have some time to run the peridynamic code at a large scale
<diehlpk_work> So we have time for octotiger and my code
<akheir> ms[m]: There is a problem with some Jenkins runs which I haven't figured out yet. The jobs hang and slurm cannot release the node.
<diehlpk_work> akheir, Can we exclude geev from the jenkins runs?
<diehlpk_work> So we have at least one cuda node available?
<akheir> yes, I will do that today
<diehlpk_work> Same for one marvin node and two or three medusa nodes?
<hkaiser> let's create special partitions on rostam to be used by jenkins
<ms[m]> is it possible to have separate partitions for jenkins (cpu-only and gpu) that are a subset of the other partitions?
<hkaiser> nod
<hkaiser> akheir: ^^
<diehlpk_work> Can we have a debug queue as well? So we can get a small allocation (15 minutes) with a higher priority?
<akheir> diehlpk_work: That would be too much fragmentation of the partitions. Jenkins jobs won't take that long; lower priority should solve the problem
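The overlapping lower-priority partition discussed above could look roughly like this in `slurm.conf` (a sketch only: node lists are made up from the machine names mentioned in the channel, and the exact priority scheme is up to the admin):

```
# Interactive partitions keep the higher PriorityTier.
PartitionName=cuda    Nodes=geev,bahram[01-04]  PriorityTier=10
PartitionName=medusa  Nodes=medusa[01-16]       PriorityTier=10

# Jenkins partitions reuse a subset of the same nodes at lower priority,
# so CI jobs yield to interactive allocations.
PartitionName=jenkins-gpu  Nodes=bahram[01-04]   PriorityTier=1
PartitionName=jenkins-cpu  Nodes=medusa[03-16]   PriorityTier=1
```

With preemption disabled this only affects scheduling order, not running jobs, which matches the "low priority, no fragmentation" approach akheir describes.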
<hkaiser> ok, cool
<ms[m]> 👍️
<diehlpk_work> akheir, Ok, anything that improves the current situation will be appreciated
<ms[m]> akheir: not sure what I can do about the hung jobs, the jenkins interface doesn't show anything until jobs are completed
<akheir> we don't have enough nodes for increasing the number of queues to make an impact.
<ms[m]> if you have any ideas on what I could look at I'm all ears (or if you have access to logs for those jobs that you'd like me to have a look at)
<akheir> ms[m]: I have to investigate; slurm complains about open I/O files and cannot release the node, and the only way to cancel the job is to reboot the node. I think this is the main reason for the complaints
<akheir> I have to find a fix for this problem first
<ms[m]> hmm, ok
<ms[m]> I'll try to dig around and see if there's anything that looks like it could cause that
<tiagofg[m]> hkaiser Hello! Regarding the inheritance issue, I would like to know if the problem has a solution or not, I'm finishing my master's thesis and I really needed that information to know if I have to rewrite the code in another way or not
<tiagofg[m]> For you guys it must be a simple thing, I guess.
karame_ has joined #ste||ar
<hkaiser> tiagofg[m]: you wanted to create a small example I could look at
<karame_> hkaiser Could you please send me the zoom ID.
<akheir> ms[m]: I didn't see your comment about the special partition for Jenkins. Yes, that's the way I have to configure the nodes. In slurm the queue and partition are the same; in order to have a lower priority queue we have to create new partitions
shahrzad has joined #ste||ar
<tiagofg[m]> hkaiser: sure
<hkaiser> tiagofg[m]: will look
<ms[m]> akheir: 👍️
<tiagofg[m]> hkaiser: the main function is on the same repository. Thanks!
shahrzad has left #ste||ar [#ste||ar]
akheir has quit [Quit: Leaving]
akheir has joined #ste||ar
shahrzad has joined #ste||ar
shahrzad has quit [Client Quit]
shahrzad has joined #ste||ar
<weilewei> hkaiser see DM, please
<hkaiser> weilewei: sec
<weilewei> ok
shahrzad has quit [Remote host closed the connection]
shahrzad has joined #ste||ar
<hkaiser> tiagofg[m]: yt?
<K-ballo> is the inheritance puzzle solved?
bita has quit [Read error: Connection reset by peer]
<hkaiser> K-ballo: not yet, I need to find out what the puzzle was, first