hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has joined #ste||ar
<diehlpk> hkaiser, Can you add sth for the performance counters?
<hkaiser> sure, what?
<diehlpk> Can you add in the description a remark why we need them
<diehlpk> Something like: measuring performance within HPX is not so easy, so we need our own performance counters
<hkaiser> I can try, would tomorrow be ok?
<diehlpk> So we avoid any stupid reviewer asking why we don't use an existing solution
<diehlpk> Sure tomorrow is sufficient
<diehlpk> Just read your text this evening, took some time to set up my computer at home
<diehlpk> And can you reference the listing with the application performance counters and elaborate on them as well?
<hkaiser> k
diehlpk has quit [Ping timeout: 246 seconds]
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: bye]
weilewei has quit [Ping timeout: 240 seconds]
Pranavug has joined #ste||ar
<Pranavug> Hello All, as a part of my GSoC proposal I am planning to measure timing improvements on a distributed system. However, I don't have access to any cluster. Is it possible to get reasonably accurate results without having a cluster set up? Thanks
<zao> Pranavug: You should probably wait for an answer from someone formally involved with HPX, but I believe they still have a small development cluster.
<Pranavug> Zao : Thank you. Sure.
<zao> I'm curious, which project in the list is this?
<Pranavug> I'm planning to work on "Domain decomposition and load balancing for crack and fracture mechanics"
<Pranavug> It may not seem necessary, however I thought I'd showcase the timing results to illustrate the improvement as well
<zao> Hard to demonstrate an improvement without measuring the improvement :D
<zao> You've got the mentor here on IRC as diehlpk, btw.
<Pranavug> Thanks, yes I am aware. I was hoping a few contributors would have faced such a situation recently though
<jbjnr> Pranavug: in the past we have been able to give some gsoc students access to larger machines/clusters
<Pranavug> jbjnr : OK Thanks for letting me know.
K-ballo has joined #ste||ar
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
<simbergm> Pranavug: ping
<Pranavug> simbergm : You mentioned a simple HPX program that I have to submit
<Pranavug> To whom, and by when, do I submit it?
<simbergm> about the small program, I suggest you put it on github and put a link in your application
<simbergm> we've typically had students submit their applications as google documents
<Pranavug> Sure, will do so
<Pranavug> Thanks
<simbergm> feel free to ask questions about building and understanding HPX, but we just want to make sure that applicants don't start from zero when the actual project begins
mdiers_ has joined #ste||ar
iti has joined #ste||ar
<Hashmi> Hello, everyone. Can someone help me with cmake install error for hpx?
<heller1> Hashmi: hey!
<Hashmi> Things I have tried:
<heller1> this is happening in the install step, I guess?
<Hashmi> Using sudo for make and make install. Changing cmake and putting cmake in path
<Hashmi> Hey @heller1
<Hashmi> Yes it is
<heller1> it looks like the user that's trying to install can't write to /hpx
<Hashmi> Why might that be when I am the admin?
<heller1> sudo for make shouldn't be needed, at all, ever
<Hashmi> That’s what I originally thought but since nothing was resolving the error I went ahead and sudoed it
<heller1> what does `sudo mkdir -p /hpx/install/path/bin` say?
<Hashmi> Read only file system. So the problem is I am not getting admin privileges at all.
<heller1> no, you have admin privileges, but can't write to '/' since it appears to be a read only file system
<heller1> can you post the output of `mount`? Which environment is this?
<heller1> brb
<Hashmi> Macos
Pranavug has quit [Quit: Leaving]
<Hashmi> Np. Take your time.
<simbergm> Hashmi: btw, is `/hpx/install/path` from the documentation or a tutorial or something like that? if yes, I might change it
<simbergm> it's most likely just a placeholder for a real path that you should choose yourself
<simbergm> and you can for example try to install to `$HOME` as well (or some suitable subdirectory)
<simbergm> you might also want to clean your build directory and not specify `CMAKE_INSTALL_PREFIX`
<simbergm> cmake may pick a more suitable location for you by default (one that you actually can write to)
<Hashmi> Oh riiiiiight. That makes sense. Let me try installing it somewhere else
<zao> macOS has some system protection in place that restricts regular superuser from hecking the machine up.
<zao> Install into somewhere you have control, like your home directory or /usr/local.
<zao> Hashmi: It's kind of meant as a `/path/where/you/want/hpx` kind of thing, not a literal path :D
<zao> I like $HOME/opt/hpx-master or something similarly contained, so it's easy to blow away when you need to :D
<Hashmi> Got it! Thank you so much!!
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 246 seconds]
hkaiser has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 246 seconds]
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
diehlpk_work has joined #ste||ar
nikunj97 has joined #ste||ar
iti has quit [Ping timeout: 246 seconds]
weilewei has joined #ste||ar
iti has joined #ste||ar
<diehlpk_work> weilewei, yt?
<weilewei> diehlpk_work yes
<diehlpk_work> Can you run the script there on summit and send me the output?
<diehlpk_work> We'd like to have a configuration here at CCT that is as similar as possible to our Summit nodes
<weilewei> Ok, let me figure out how to run it
<diehlpk_work> Just checkout the repo and submit a job on one of the compute nodes
<diehlpk_work> and send me the pbs output
<diehlpk_work> Sai will have a look and will try to install similar stuff
Pranavug has joined #ste||ar
Pranavug has quit [Client Quit]
<hkaiser> diehlpk_work: meeting at 11 today? (I'm still confused as the email says 11CST, which would be 10am)...
<diehlpk_work> hkaiser, Are we not in CST?
<hkaiser> we're currently in CDT (Central Daylight-saving Time)
<hkaiser> CST == Central Standard Time
<diehlpk_work> Ok, I missed that
<diehlpk_work> I will send an email
<diehlpk_work> have you seen my pm?
<hkaiser> so it's 11am today?
<diehlpk_work> Apex results look strange
<diehlpk_work> Yes
<weilewei> diehlpk_work sent the result file to your telegram
<diehlpk_work> Thanks, I will forward it to Sai
<diehlpk_work> I will start to work on Summit after the SC paper
<weilewei> Great!
Pranavug has joined #ste||ar
Pranavug has quit [Client Quit]
iti has quit [Ping timeout: 256 seconds]
diehlpk has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk_work has quit [Ping timeout: 256 seconds]
diehlpk_work has joined #ste||ar
iti has joined #ste||ar
diehlpk has quit [Ping timeout: 246 seconds]
nan111 has joined #ste||ar
iti has quit [Ping timeout: 246 seconds]
rtohid has joined #ste||ar
<simbergm> ping to the irc side
<weilewei> ping as well
gonidelis has joined #ste||ar
KatieB has joined #ste||ar
akheir has joined #ste||ar
gonidelis has quit [Remote host closed the connection]
KatieB has quit [Ping timeout: 240 seconds]
gdaiss[m] has joined #ste||ar
<gdaiss[m]> ms: , freifrau_von_bleifrei So I guess we are moving to the HPX channel now for all Kokkos-related stuff! Speaking of which: Do you two want to do another call soon to sync up our current progress with the executors and allocators? It's been a while since we've had one
freifrau_von_ble has joined #ste||ar
<freifrau_von_ble> <gdaiss[m] "ms: , freifrau_von_bleifrei S"> yup, sounds reasonable!
<simbergm> gdaiss: freifrau_von_bleifrei : yep, let's do that this week
<simbergm> thursday is meeting day for me anyway so maybe then? at 3? other days and times work too...
<gdaiss[m]> Thursday at 3 works for me!
<freifrau_von_ble> 👍️
<Hashmi> @diehlpk_work: is it okay to pm you regarding gsoc proposal?
<hkaiser> gdaiss[m]: please loop me into this call, if possible - I'm very interested in getting involved
<diehlpk_work> Hashmi, we can discuss here in public
akheir has quit [Ping timeout: 256 seconds]
<Hashmi> Oh yes, if it's okay..
<diehlpk_work> So all can give their feedback
<Hashmi> It's regarding the pip package for phylanx
<diehlpk_work> Sure
<Hashmi> So I have been researching PyPI packages
<hkaiser> simbergm: do my messages make it through to matrix?
<Hashmi> I think the current cmake build can be triggered through setup.py
<Hashmi> That is, if phylanx starts passing the build tests
<Hashmi> I checked out older stable commits but it doesn't build
<diehlpk_work> Hashmi, The challenge for this project is to ship the dependencies
<gdaiss[m]> <hkaiser "ms: do my message make it throug"> hkaiser: Yes, at least I can them read them now. About the call: sure, we just need to decide on a conference call software! ms what did we use last time? was it Matrix for the call as well?
<diehlpk_work> Hashmi, What does not build?
<hkaiser> gdaiss[m]: zoom works well enough these days
<diehlpk_work> Can you post the error on pastebin or as a gist on github
<Hashmi> Yes.. if we package hpx, blaze and blaze tensor before we publish a package for phylanx then it should work
<diehlpk_work> For phylanx building issues hkaiser or rtohid are good starting points
<Hashmi> Oh sure a min
<hkaiser> gdaiss[m]: both simbergm as well as I can schedule zoom meetings easily any time
parsaamini[m] has joined #ste||ar
karame78 has joined #ste||ar
<diehlpk_work> Hashmi, I do not think we can package all packages for ubuntu and fedora
<gdaiss[m]> then let's use Zoom! I was about to suggest the DFN conference call software that we usually use in Stuttgart, but naturally that one stopped functioning on Monday when everybody started working from home
<diehlpk_work> Getting hpx into Fedora took several months
<Hashmi> @diehlpk_work: oh may I ask why?
<hkaiser> gdaiss[m]: ok, let me know if you need me to schedule things
<hkaiser> I think it even works ad hoc, if needed
<diehlpk_work> Getting a package to a major distribution is not easy. You have to write the spec file for Fedora, get it reviewed, and approved
parsaamini[m] is now known as parsa[m]
<diehlpk_work> Another thing is that the packages are not good with respect to performance, because they are not built on the hardware and optimizations are missing
bita has joined #ste||ar
<diehlpk_work> Since we use phylanx on HPC systems it would be slow compared to a native build
<diehlpk_work> Hashmi, I think a good starting point would be to build hpx and phylanx
<diehlpk_work> Have you done this?
<Hashmi> I have been able to build hpx and other dependencies
<Hashmi> Phylanx doesn't build
<Hashmi> Sorry, the error you asked for is coming through
<diehlpk_work> Ok, next step would be to get phylanx to build
<simbergm> hkaiser: yep, I see the messages as well
<hkaiser> good, thanks
<Hashmi> This is using an older build
<simbergm> we can also combine the kokkos/hpx call with the usual hpx call
<Hashmi> With verified commits and a passing build
akheir has joined #ste||ar
<simbergm> gdaiss: freifrau_von_bleifrei that would be at 4 (which is why I suggested 3 at first)
<jbjnr> gregordaiss[m]: ms[m] may I also join the kokkos call please?
<hkaiser> simbergm: not sure if we should do that - there is plenty of HPX specific (organizational) things we need to talk about
<diehlpk_work> Hashmi, I have not built phylanx in a long time, and someone else might be able to help you
<simbergm> or we start at 3:30 and just continue with the hpx call afterwards
<jbjnr> something weird happened to my names in the message above
<simbergm> yeah, that's true
<simbergm> jbjnr: yt? have you had time to look at the governance PR? should I go clean that up?
<hkaiser> simbergm : that means I have to be up at 8:30am ;-) I'll see what I can do
<Hashmi> @diehlpk_work: I am going to try a docker build next just to see if it works. But for the project I would need to use cmake.
<diehlpk_work> Hashmi, First work on getting phylanx to compile
<jbjnr> ms[m]: I'll look at the governance stuff tomorrow. Might need to be reminded. I keep forgetting
<hkaiser> jbjnr: we have our PMC call this week Thursday
<hkaiser> would be nice to have it done by then
<jbjnr> ok. then tomorrow I'd better look at it.
<diehlpk_work> Another step would be to investigate how to ship C++ dependencies in a pip package
<hkaiser> jaafar: much appreciated
<gdaiss[m]> @ms Well, 3.30 would work for me as well! Most other meetings and appointments around here are canceled anyway!
<simbergm> jbjnr: no problem, I'll remind you again tomorrow
<Hashmi> @diehlpk_work: what other C++ dependencies? I know about pybind11
<diehlpk_work> blaze, hwloc, boost, jemalloc, and so on?
<hkaiser> jaafar: sorry - I did it again
<Hashmi> I will keep working on compiling phylanx
<jbjnr> when is the gsoc deadline?
akheir has quit [Read error: Connection reset by peer]
<jbjnr> (for applications I mean)
<Hashmi> diehlpk_work: i see! I
akheir has joined #ste||ar
<jbjnr> gregordaiss[m]: ms[m] trying again to ask - may I also join the kokkos meeting please?
<jbjnr> it did it again
<Hashmi> @diehlpk_work: blaze will need to be shipped as a package itself. Boost has a pip package. And I haven't found a way to get hwloc, tcmalloc, etc.
<simbergm> hkaiser, freifrau_von_bleifrei , gdaiss all right, let's do 3 pm CET in case we need more time? should be 9 am in baton rouge I think? does that work for everyone?
<simbergm> also jbjnr ^
<hkaiser> simbergm: ahh yes, this week it's still 9am, fine for me
<diehlpk_work> hkaiser, Can you look into the plots Kevin sent around?
<simbergm> jbjnr: student application deadline is 31st I think
<hkaiser> and we roll over into the hpx stuff afterwards
<diehlpk_work> We need to decide which one of these should be added to the paper
<simbergm> yep
<hkaiser> diehlpk_work: doing it as we speak
<diehlpk_work> I like the subgrids per Joule
<simbergm> I still have to check if/how I can host a zoom meeting not being in the office, but I'll let you all know
<hkaiser> yah, even that's not too favourable ;-)
<diehlpk_work> simbergm, hkaiser can do that
<diehlpk_work> I was hosting a zoom meeting from home this morning
parsa|| has joined #ste||ar
parsa|| has quit [Client Quit]
<simbergm> diehlpk_work: all right, good
<diehlpk_work> hkaiser, We have to keep in mind that we have not done scaling runs due to the convergence test
<diehlpk_work> All the problems are too small
<simbergm> I want to check for my own purposes anyway
ahkeir1 has joined #ste||ar
akheir has quit [Read error: Connection reset by peer]
<diehlpk_work> For 64 nodes we have 20 subgrids per node
<jbjnr> are you huys talking about thursday? or Wednesday?
<diehlpk_work> Level 11 is even worse with 5 subgrids per node on 1024 nodes
<simbergm> jbjnr: thursday
<hkaiser> uhh, that's fewer subgrids than cores!
<simbergm> hpx call at 4, kokkos/hpx call at 3
<jbjnr> Thursday is a bank holiday here and although it makes no difference now that we're all home all the time ... I was planning on cycling through the woods
<diehlpk_work> Yes, this is the trade-off for the convergence test
<jbjnr> ok. So I'd best be back before 3pm
<diehlpk_work> We could not afford to run higher levels for such a long time
<jbjnr> gosh - this matrix is so slow. I send a message and it takes 30 seconds to appear
<simbergm> oooh right...
<diehlpk_work> jbjnr, same here. matrix irc bridge is slowww
<simbergm> were you planning on cycling all day?
<diehlpk_work> Level 12 has 100 subgrids on 1900 nodes
<gdaiss[m]> ms: I am not dead set on Thursday, especially if it's a holiday for you guys! Friday is fine by me too
<jbjnr> no, but normally I would do it in the afternoon rather than the morning. Let the sun warm the place up a bit before going out
<hkaiser> diehlpk_work: that does not make any sense
<hkaiser> 100 subgrids overall on 1900 nodes?
<diehlpk_work> No per node
<hkaiser> that's fewer subgrids than nodes!
<hkaiser> ahh
<hkaiser> how many cores do we have?
<hkaiser> per node
<diehlpk_work> 32
<jbjnr> thursday will be ok, worst case scenario, I don't join, but I'll try to be here
<hkaiser> 3 subgrids per core
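For reference, hkaiser's figure follows directly from the numbers quoted above (level 12, 100 subgrids per node, 32 cores per node), assuming subgrids are spread evenly over the cores of a node:

$$\frac{100\ \text{subgrids/node}}{32\ \text{cores/node}} \approx 3\ \text{subgrids per core}$$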
<hkaiser> jbjnr: we would miss you!
<diehlpk_work> hkaiser, We have the same issue as last year, IO is so bad that we cannot really do higher levels
<hkaiser> k
<diehlpk_work> Loading level 12 on 1024 nodes takes 2 hours
<jbjnr> you need the libfabric PP for regridding and IO :(
<hkaiser> diehlpk_work: well, Dominic said that he's working on the regridding tool
<diehlpk_work> Yes, this will help
<diehlpk_work> Without that tool I cannot do any runs on Summit
shahrzad has joined #ste||ar
<diehlpk_work> hkaiser, We should not talk about scaling at all in this paper
<jbjnr> The startup time for LF was reduced from hours to seconds when I ran last year. Such a shame it doesn't work any more. I wish I knew what was broken
<diehlpk_work> That was the best we could do with the IO and node hours we had
<diehlpk_work> jbjnr, Ok, cool
<diehlpk_work> Yes libfabric would be nice
<diehlpk_work> jbjnr, I will try to get Octotiger running on Summit next month
<diehlpk_work> we have 20k node hours for some scaling runs
<diehlpk_work> Having libfabric there would be awesome
<diehlpk_work> hkaiser, I do not have any node seconds left on Cori and cannot try to collect the new data
<diehlpk_work> Waiting for Alice to add me to the new project
<diehlpk_work> jbjnr, Would you have time to work on libfabric on Summit?
<jbjnr> maybe. Not really sure what I'll be doing. I'm on another project altogether now - putting libfabric into a different non-HPX code. I would like to work on the senders/receivers stuff; we need that for the ISO C++ feedback. I'm supposed to be helping with DCA++ but haven't managed to get stuck into that either.
<hkaiser> diehlpk_work: then leave out the 16 node run
<diehlpk_work> hkaiser, let us wait for more node hours
<hkaiser> k
<diehlpk_work> jbjnr, Ok, I will start with the MPI runs first
<jbjnr> why are you running octotiger on summit anyway - the SC deadline will be long past by next month won't it?
<diehlpk_work> jbjnr, For the INCITE proposal
<diehlpk_work> Last year they complained that we never showed scaling runs on Summit
<diehlpk_work> So this year we will add them to the INCITE proposal and they will find something new to reject us for
<jbjnr> lack of any real scientific merit?
<diehlpk_work> We are working on that
<diehlpk_work> fingers crossed
<diehlpk_work> Does anyone know how to connect to the LSU library to access papers behind a paywall?
<heller1> vpn?
<hkaiser> diehlpk_work: yah, use vpn, then you'll have access
<parsa[m]> diehlpk_work: use ezproxy
<diehlpk_work> parsa[m], How do I do this?
<parsa[m]> it will have you sign in with your LSU ID
<diehlpk_work> Ok, lol, the new solution from LSU does not work on Linux
<zao> :D
<diehlpk_work> There is no download of a linux client after login
<parsa[m]> diehlpk_work: you can also use this bookmarklet I made to automatically convert regular URLs to LSU ezproxy URLs: `javascript:window.location='http://libezp.lib.lsu.edu/login?url='+document.location.href`
<zao> Heh, we use the same kind of system - `javascript:void(location.href='http://proxy.ub.umu.se/login?url='+location.href)`
<parsa[m]> click on the bookmarklet from the journal page you want to open, it will re-open the page in lsu's ezproxy
<parsa[m]> zao: sure. it's good enough. isn't it?
<zao> Sometimes ours considers sites out of scope and refuses to even try, but otherwise fine.
<parsa[m]> for those just search the article's title in an ezproxied google scholar page. probably would open fine
<weilewei> hkaiser ah, I think I have found the GPUDirect error. The purpose of my work is to send the G2 matrix generated on each rank around the MPI world. However, the DCA++ code uses multiple threads, and each thread has one or more G2s at the same time. My MPI_Isend stuff only handles one G2 per rank
<weilewei> so I need something like multi-threaded MPI_Isend stuff
shahrzad has quit [Ping timeout: 246 seconds]
<jbjnr> use a tag to identify different G2's?
KatieB has joined #ste||ar
<weilewei> jbjnr ah, gotcha! That's a possible solution
<weilewei> jbjnr then I need to somehow insert/bind thread id info into the tag for different G2's
akheir1_ has joined #ste||ar
<jbjnr> you must have something you can use to identify the same G2 on different nodes I guess. If not, you'll have to give them some kind of label based on the operation that's common across nodes.
ahkeir1 has quit [Ping timeout: 246 seconds]
<weilewei> jbjnr indeed
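A minimal sketch of the tag-based approach jbjnr suggests. The helper names `send_g2`/`recv_g2` and the `g2_id` label are hypothetical; it assumes each G2 is a contiguous buffer of complex doubles, that sender and receiver agree on the same `g2_id` for the same G2, and that MPI was initialized with `MPI_THREAD_MULTIPLE` so several threads can post sends concurrently:

```cpp
#include <mpi.h>

#include <complex>
#include <vector>

// Encode which G2 a message carries into the MPI tag so that concurrent
// sends issued by different threads can be matched unambiguously.
MPI_Request send_g2(std::vector<std::complex<double>> const& g2,
                    int dest_rank, int g2_id, MPI_Comm comm)
{
    MPI_Request req;
    MPI_Isend(g2.data(), static_cast<int>(g2.size()), MPI_CXX_DOUBLE_COMPLEX,
              dest_rank, /* tag = */ g2_id, comm, &req);
    return req;
}

// The receiving rank posts a matching receive with the same g2_id as the tag.
MPI_Request recv_g2(std::vector<std::complex<double>>& g2,
                    int src_rank, int g2_id, MPI_Comm comm)
{
    MPI_Request req;
    MPI_Irecv(g2.data(), static_cast<int>(g2.size()), MPI_CXX_DOUBLE_COMPLEX,
              src_rank, /* tag = */ g2_id, comm, &req);
    return req;
}
```

The requests still need to be completed with MPI_Wait/MPI_Waitall; if G2s are not naturally numbered, a tag could be derived from a thread id plus a per-thread counter, as long as both sides compute it the same way.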
nikunj97 has quit [Remote host closed the connection]
nk__ has joined #ste||ar
nikunj97 has joined #ste||ar
KatieB has quit [Ping timeout: 240 seconds]
nk__ has quit [Ping timeout: 256 seconds]
ahkeir1 has joined #ste||ar
akheir1_ has quit [Ping timeout: 246 seconds]
ahkeir1 has quit [Quit: Leaving]
weilewei has quit [Remote host closed the connection]
nk__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 264 seconds]
<jaafar> diehlpk_work: as someone who recently dealt with C++ deps in a Python package, strongly suggest you look at conda instead of pip
<jaafar> If that's an option
nk__ has quit [Quit: Leaving]
<zao> ugh :D
<hkaiser> jaafar: doesn't this pull in a whole set of infrastructure libraries?
<zao> Not to mention a "compiler".
<jaafar> Well, the problem is dependencies
<jaafar> pip etc. doesn't really contemplate C++ dependencies at all
<jaafar> So if you use e.g. numpy and something else (graph_tool, in my case) and they use different versions of Boost, say
<jaafar> kaboom
<jaafar> conda, whatever its flaws, actually has some concept of the problem
<jaafar> maybe you have better control over your environment and it's not an issue
<zao> Which is why any serious HPC deployment uses modules, which have multiple side-by-side versions of the things you need.
<zao> Instead of "whatever conda happens to have today, better hope they don't change something or have too old stuff".
<zao> s/serious/large-scale/
<jaafar> you can specify package versions
<jaafar> pip AFAIK is like "what is C++"
<zao> We've got a lot of untangling to do whenever a user comes with a conda-based workflow that "works on their laptop".
<jaafar> what are dynamic libs
<jaafar> etc.
<jaafar> Maybe it's not an issue for you all
<jaafar> Imagine ODR violations, for example
<zao> pip (on pypi) has problems in that it doesn't understand dynamic libraries outside of manylinux{1,2010} wheels.
<jaafar> yep
<zao> conda attempts to solve it in a distro-like way, but you're going to have a mess of environments trying to get it right, and they're quite hard to lift between systems.
<jaafar> It is difficult.
<zao> And the binaries it ships are often so out of phase with what the OS has that they tend to break system stuff.
<jaafar> Ah!
<zao> And most importantly, it interfaces _very_ poorly with site-specific MPI deployments.
<jaafar> So, I'm not talking about just grabbing stuff from online
<jaafar> I'm suggesting using conda-build
<jaafar> so you control the binary versions alongside the Python code
<zao> If someone comes to us with something using conda, multi-node codes are pretty much completely off the table as the MPI will be unusable.
<jaafar> Why would that be?
<zao> So conda-build lets you do what, make something that's in between a miniconda and a full honking conda env?
<zao> It's going to have a whole bunch of fundamental libraries that are diverging in versions from what the OS has, to the extent that regular tools you run don't work.
<zao> So... MPI.
<zao> On our site, in order for OpenMPI to work in batch jobs it needs to be built against PMIx, PMI2, UCX, and SLURM.
<zao> Otherwise it's quite likely that srun will have issues, or not work much at all.
<jaafar> Can you specify architecture and package version for those deps?
<zao> The OS's OpenMPI is pretty much unusable, already, let alone something shipping with conda.
<jaafar> For conda-build you can lock them
<zao> It's not just versions, you need the same shared build of them.
<jaafar> I'm confused about "shipping with conda" I guess
<zao> Whatever ends up in your conda world when building a conda env, however you do it, will not be the versions we have installed site-wide, which need to be used.
<zao> CUDA needs to be installed with awareness of your MPI implementation.
<jaafar> But can't you just specify your system's implementation and version?
<zao> It's not just "OpenMPI 4.0.1". It's "OpenMPI 4.0.1 built with UCX 1.8.0, PMIx x.y.z and SLURM libraries in location <x>".
<zao> It's a very intricate web of runtimes that need to be linked up for the batch to work well.
<jaafar> When you say "location <x>" do you mean "path on my system"?
<zao> Yes.
<jaafar> And there's no way to reproduce that independently?
<zao> Shared parallel file system with those pre-built, and our modules for OpenMPI etc. reference those when built.
<zao> If you have so granular control over however you build an OpenMPI to stick into a conda environment, sure, it'd work.
<jaafar> Yeah I think that's sort of what I had in mind
<zao> My impression of conda is that you pull quite stock binaries from some upstream forge.
<jaafar> Maybe it's not workable
<jaafar> ah yes so I'm suggesting you make a forge, basically :)
<jaafar> The conda advantage is mainly that it understands binary package versions too
<jaafar> so you can specify both dependencies
<jaafar> In unusual cases (like yours) you might have to make your own forge with those deps
<jaafar> but at least it's represented
<zao> So say one sets up a forge for a particular site's idea of libraries, so you get all the deps duplicated and right, you still can't share that with anyone.
<jaafar> and binary version conflicts are noted
<zao> (anyone on any other system)
<jaafar> you can make a public forge
<zao> Share as in that it won't be usable.
<zao> The setup with UCX and PMIx and everything? That's how my site does it. Another set of filthy hacks is likely in place on the next site over.
<jaafar> I'm sure I don't really understand your constraints :)
<zao> For conda to be meaningfully used outside of single-node jobs that have no dependencies on anything external, it needs to mesh in with whatever modules are in place on the site already.
<jaafar> OK, so just recompile on install? I guess that works
<jaafar> binary deps are just what's on the system
<zao> I have yet to see any software packaged with conda that declares anything about how it's built.
<jaafar> really
<zao> It's just "point at this forge and install", where the non-conda instructions are rotted and there's rarely any of whatever recipes you use to build a conda forge.
<zao> At least from the bio junk I've seen.
<zao> requirements.txt is pretty much always incomplete or out of date, somehow.
* jaafar looks at dependencies
karame78 has quit [Remote host closed the connection]
<zao> Now, if you've got a ste||ar hat on, sure, there might be some value in providing phylanx or HPX in a forge, but it's most probably not overly usable at scale.
<jaafar> So these all have the OS, architecture, and Python version embedded in their names
<jaafar> If you unpack one of them the index.json file lists dependencies
<jaafar> some of which are like "vs2015_runtime" :)
<jaafar> AFAICT requirements.txt is not a thing Conda uses but I suspect they generate one, or can?
<jaafar> But I've probably belabored this
<jaafar> And your situation is not mine
<jaafar> It was a nice writeup and persuaded me to invest in conda
<zao> No-one in the conda world had any interest in understanding the problems that come with large-scale computing, beyond the "scientist can install this on their laptop" scenario.
<zao> So excuse me if I'm quite sour on this, but I'm quite sour on this.
<zao> And believe me, we've tried.
<jaafar> I believe you! Thanks for listening
<zao> There's software that we've plain not been able to install site-wide as it would be too costly to reverse-engineer how it'd be deployed.
<zao> Our current setup has modules for libraries, Python, and common tricky packages that require source builds, and users may use virtual environments to install the remainder with pip.
<zao> A bit janky, but the least horrible we can manage.
<jaafar> packaging and installation is the worst, I sympathize
<zao> Compute Canada does something similar but has a wheelhouse with binary wheels for multiple microarchitectures.
<zao> Kind of cool, but is a bit tied to their underlying OS.
shahrzad has joined #ste||ar
gonidelis has joined #ste||ar
<diehlpk_work> jaafar, sounds interesting. However, our GSoC applicants would have to work on that
<diehlpk_work> nan111, Did the GSoC student ever come back to you?
<diehlpk_work> bita, What about you?
avah has joined #ste||ar
avah has quit [Ping timeout: 240 seconds]
shahrzad has quit [Ping timeout: 246 seconds]
nan111 has quit [Ping timeout: 240 seconds]
<gonidelis> As I have left aside the API familiarization (I am not going to be an end-user after all... I want to develop the stuff! :D) I have started reading some of your libraries, particularly your range-based algorithm implementations, as I am going to imitate the way they are written for the 'Range based Parallel Algorithms' project. A question that pops
<gonidelis> up as I am reading your libs is: "In the comment sections, what is the use of '\a' [backslash + 'a']?"
<heller1> Against which API do you want to develop?
<gonidelis> What do you mean?
<heller1> What's the project you want to do?
<heller1> As in: how familiar are you with the APIs you're supposed to work with over the summer?
<gonidelis> I would like to contribute to implementing missing parallel algorithms from here (https://github.com/STEllAR-GROUP/hpx/issues/1141) but for the moment I am trying
<gonidelis> to contribute to the range-based ones (https://github.com/STEllAR-GROUP/hpx/issues/1668) in order to form a stronger proposal
<gonidelis> As for the familiarization, I have been studying the basics of HPX for the last couple of weeks, but I need to dig in a little more specifically as the proposal deadline is in two weeks
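To illustrate the difference the range-based project is about, here is a rough sketch; header and namespace names are approximate and may differ between HPX versions, and the `hpx::ranges::all_of` overload shown in the comment is an assumption about the kind of API the project would add:

```cpp
#include <hpx/hpx_main.hpp>
#include <hpx/include/parallel_algorithm.hpp>

#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3, 4};
    auto positive = [](int i) { return i > 0; };

    // Existing iterator-pair overload of a parallel algorithm:
    bool all_positive = hpx::parallel::all_of(
        hpx::parallel::execution::par, v.begin(), v.end(), positive);

    // A range-based overload would accept the container (or any range)
    // directly, e.g. something along the lines of:
    //   hpx::ranges::all_of(hpx::parallel::execution::par, v, positive);

    return all_positive ? 0 : 1;
}
```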
<zao> Which piece of documentation or code are you talking about up there when you say "comment section"?
<gonidelis> Well I happened to read the header all_any_none.hpp
<zao> The triple /// means it's a documentation comment.
<zao> \a is some sort of markup in that documentation language.
<gonidelis> Are these comments exposed somewhere else (maybe in some markup file) besides the source code?
<zao> The documentation is generated from standalone files together with the contents of the doc-comments.
maxwellr96 has joined #ste||ar
<zao> This is built with Sphinx.
<zao> (also doxygen, it seems)
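For context: `\a` is Doxygen markup that typesets the following word in italics, conventionally used when the text refers to a function argument. An illustrative doc comment (not copied from HPX) might look like this:

```cpp
/// Returns true if the unary predicate \a f evaluates to true for every
/// element in the range [\a first, \a last), and false otherwise.
///
/// \param first   Iterator to the beginning of the range.
/// \param last    Iterator to the end of the range.
/// \param f       Unary predicate invoked for each element.
template <typename InIter, typename F>
bool all_of(InIter first, InIter last, F&& f);
```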
<gonidelis> Thank you so much! I didn't know that documentation could be written inside the main code. I understand what the point is now... (wow! I am learning so much...!)
<gonidelis> Where could I find out how the tests work (in the case of all_of.cpp, for example)? What is their purpose, and when and how are they executed? Thank you.
gonidelis has quit [Ping timeout: 240 seconds]
diehlpk_work has quit [Remote host closed the connection]
weilewei has joined #ste||ar