hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has joined #ste||ar
<diehlpk> hkaiser, Can you add sth for the performance counters?
<hkaiser> sure, what?
<diehlpk> Can you add in the description a remark why we need them
<diehlpk> Something like: measuring performance within HPX is not so easy, so we need our own performance counters
<hkaiser> I can try, would tomorrow be ok?
<diehlpk> So we avoid any stupid reviewer asking why we don't use an existing solution
<diehlpk> Sure tomorrow is sufficient
<diehlpk> Just read your text this evening, took some time to set up my computer at home
<diehlpk> And can you reference the listing with the application performance counters and elaborate on them as well?
<hkaiser> k
diehlpk has quit [Ping timeout: 246 seconds]
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: bye]
weilewei has quit [Ping timeout: 240 seconds]
Pranavug has joined #ste||ar
<Pranavug> Hello All, as a part of my GSoC proposal I am planning to measure timing improvements on a distributed system. However, I don't have access to any cluster. Is it possible to get reasonably accurate results without having a cluster set up? Thanks
<zao> Pranavug: You should probably wait for an answer from someone formally involved with HPX, but I believe they still have a small development cluster.
<Pranavug> Zao : Thank you. Sure.
<zao> I'm curious, which project in the list is this?
<Pranavug> I'm planning to work on "Domain decomposition and load balancing for crack and fracture mechanics"
<Pranavug> It may not seem necessary, however I thought I'd showcase the timing results to illustrate the improvement as well
<zao> Hard to demonstrate an improvement without measuring the improvement :D
<zao> You've got the mentor here on IRC as diehlpk, btw.
<Pranavug> Thanks, yes I am aware. I was hoping a few contributors would have faced such a situation recently though
<jbjnr> Pranavug: in the past we have been able to give some gsoc students access to larger machines/clusters
<Pranavug> jbjnr : OK Thanks for letting me know.
K-ballo has joined #ste||ar
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
<simbergm> Pranavug: ping
<Pranavug> simbergm : You mentioned a simple HPX program that I have to submit
<Pranavug> To whom, and by when, do I submit it?
<simbergm> about the small program, I suggest you put it on github and put a link in your application
<simbergm> we've typically had students submit their applications as google documents
<Pranavug> Sure, will do so
<Pranavug> Thanks
<simbergm> feel free to ask questions about building and understanding HPX, but we just want to make sure that applicants don't start from zero when the actual project begins
mdiers_ has joined #ste||ar
iti has joined #ste||ar
<Hashmi> Hello, everyone. Can someone help me with cmake install error for hpx?
<heller1> Hashmi: hey!
<Hashmi> Things I have tried:
<heller1> this is happening in the install step, I guess?
<Hashmi> Using sudo for make and make install. Changing cmake and putting cmake in path
<Hashmi> Hey @heller1
<Hashmi> Yes it is
<heller1> it looks like the user that's trying to install can't write to /hpx
<Hashmi> Why might that be when I am the admin?
<heller1> sudo for make shouldn't be needed, at all, ever
<Hashmi> That’s what I originally thought but since nothing was resolving the error I went ahead and sudoed it
<heller1> what does `sudo mkdir -p /hpx/install/path/bin` say?
<Hashmi> Read only file system. So the problem is I am not getting admin privileges at all.
<heller1> no, you have admin privileges, but can't write to '/' since it appears to be a read only file system
<heller1> can you post the output of `mount`? Which environment is this?
<heller1> brb
<Hashmi> Macos
Pranavug has quit [Quit: Leaving]
<Hashmi> Np. Take your time.
<simbergm> Hashmi: btw, is `/hpx/install/path` from the documentation or a tutorial or something like that? if yes, I might change it
<simbergm> it's most likely just a placeholder for a real path that you should choose yourself
<simbergm> and you can for example try to install to `$HOME` as well (or some suitable subdirectory)
<simbergm> you might also want to clean your build directory and not specify `CMAKE_INSTALL_PREFIX`
<simbergm> cmake may pick a more suitable location for you by default (one that you actually can write to)
<Hashmi> Oh riiiiiight. That makes sense. Let me try installing it somewhere else
<zao> macOS has some system protection in place that restricts regular superuser from hecking the machine up.
<zao> Install into somewhere you have control, like your home directory or /usr/local.
<zao> Hashmi: It's kind of meant as a `/path/where/you/want/hpx` kind of thing, not a literal path :D
<zao> I like $HOME/opt/hpx-master or something similarly contained, so it's easy to blow away when you need to :D
<Hashmi> Got it! Thank you so much!!
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 246 seconds]
hkaiser has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 246 seconds]
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
diehlpk_work has joined #ste||ar
nikunj97 has joined #ste||ar
iti has quit [Ping timeout: 246 seconds]
weilewei has joined #ste||ar
iti has joined #ste||ar
<diehlpk_work> weilewei, yt?
<weilewei> diehlpk_work yes
<diehlpk_work> Can you run the script there on summit and send me the output?
<diehlpk_work> We'd like to have a configuration here at CCT that is as similar as possible to our Summit nodes
<weilewei> Ok, let me figure out how to run it
<diehlpk_work> Just checkout the repo and submit a job on one of the compute nodes
<diehlpk_work> and send me the pbs output
<diehlpk_work> Sai will have a look and will try to install similar stuff
Pranavug has joined #ste||ar
Pranavug has quit [Client Quit]
<hkaiser> diehlpk_work: meeting at 11 today? (I'm still confused as the email says 11CST, which would be 10am)...
<diehlpk_work> hkaiser, Are we not in CST?
<hkaiser> we're currently in CDT (Central Daylight-saving Time)
<hkaiser> CST == Central Standard Time
<diehlpk_work> Ok, I missed that
<diehlpk_work> I will send an email
<diehlpk_work> have you seen my pm?
<hkaiser> so it's 11am today?
<diehlpk_work> Apex results look strange
<diehlpk_work> Yes
<weilewei> diehlpk_work sent the result file to your telegram
<diehlpk_work> Thanks, I will forward it to Sai
<diehlpk_work> I will start to work on Summit after the SC paper
<weilewei> Great!
Pranavug has joined #ste||ar
Pranavug has quit [Client Quit]
iti has quit [Ping timeout: 256 seconds]
diehlpk has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk_work has quit [Ping timeout: 256 seconds]
diehlpk_work has joined #ste||ar
iti has joined #ste||ar
diehlpk has quit [Ping timeout: 246 seconds]
nan111 has joined #ste||ar
iti has quit [Ping timeout: 246 seconds]
rtohid has joined #ste||ar
<simbergm> ping to the irc side
<weilewei> ping as well
gonidelis has joined #ste||ar
KatieB has joined #ste||ar
akheir has joined #ste||ar
gonidelis has quit [Remote host closed the connection]
KatieB has quit [Ping timeout: 240 seconds]
gdaiss[m] has joined #ste||ar
<gdaiss[m]> ms: , freifrau_von_bleifrei So I guess we are moving to the HPX channel now for all Kokkos-related stuff! Speaking of which: Do you two want to do another call soon to sync up our current progress with the executors and allocators? It's been a while since we've had one
freifrau_von_ble has joined #ste||ar
<freifrau_von_ble> <gdaiss[m] "ms: , freifrau_von_bleifrei S"> yup, sounds reasonable!
<simbergm> gdaiss: freifrau_von_bleifrei : yep, let's do that this week
<simbergm> thursday is meeting day for me anyway so maybe then? at 3? other days and times work too...
<gdaiss[m]> Thursday at 3 works for me!
<freifrau_von_ble> 👍️
<Hashmi> @diehlpk_work: is it okay to pm you regarding gsoc proposal?
<hkaiser> gdaiss[m]: please loop me into this call, if possible - I'm very interested in getting involved
<diehlpk_work> Hashmi, we can discuss here in public
akheir has quit [Ping timeout: 256 seconds]
<Hashmi> Oh yes, if it's okay..
<diehlpk_work> So all can give their feedback
<Hashmi> It's regarding the pip package for phylanx
<diehlpk_work> Sure
<Hashmi> So I have been researching PyPI packages
<hkaiser> simbergm: do my messages make it through to matrix?
<Hashmi> I think the current cmake build can be triggered through setup.py
<Hashmi> That is, if phylanx starts passing the build tests
<Hashmi> I checked out older stable commits but it doesn't build
<diehlpk_work> Hashmi, The challenge for this project is to ship the dependencies
<gdaiss[m]> <hkaiser "ms: do my message make it throug"> hkaiser: Yes, at least I can them read them now. About the call: sure, we just need to decide on a conference call software! ms what did we use last time? was it Matrix for the call as well?
<diehlpk_work> Hashmi, What does not build?
<hkaiser> gdaiss[m]: zoom works well enough these days
<diehlpk_work> Can you post the error on pastebin or as a gist on github
<Hashmi> Yes.. if we package hpx, blaze and blaze tensor before we publish a package for phylanx then it should work
<diehlpk_work> For phylanx building issues hkaiser or rtohid are good starting points
<Hashmi> Oh sure a min
<hkaiser> gdaiss[m]: both simbergm as well as I can schedule zoom meetings easily any time
parsaamini[m] has joined #ste||ar
karame78 has joined #ste||ar
<diehlpk_work> Hashmi, I do not think we can package all packages for ubuntu and fedora
<gdaiss[m]> then let's use Zoom! I was about to suggest the DFN conference call software that we usually use in Stuttgart, but naturally that one stopped functioning on Monday when everybody started working from home
<diehlpk_work> Getting hpx into Fedora took several months
<Hashmi> @diehlpk_work: oh may I ask why?
<hkaiser> gdaiss[m]: ok, let me know if you need me to schedule things
<hkaiser> I think it even works ad hoc, if needed
<diehlpk_work> Getting a package to a major distribution is not easy. You have to write the spec file for Fedora, get it reviewed, and approved
parsaamini[m] is now known as parsa[m]
<diehlpk_work> Another thing is that the packages are not good with respect to performance, because they are not built on the hardware and optimizations are missing
bita has joined #ste||ar
<diehlpk_work> Since we use phylanx on HPC systems it would be slow compared to a native build
<diehlpk_work> Hashmi, I think a good starting point would be to build hpx and phylanx
<diehlpk_work> Have you done this?
<Hashmi> I have been able to build hpx and other dependencies
<Hashmi> Phylanx doesn't build
<Hashmi> Sorry, the error you asked for is coming through
<diehlpk_work> Ok, next step would be to get phylanx to build
<simbergm> hkaiser: yep, I see the messages as well
<hkaiser> good, thanks
<Hashmi> This is using an older build
<simbergm> we can also combine the kokkos/hpx call with the usual hpx call
<Hashmi> With verified commits and a passing build
akheir has joined #ste||ar
<simbergm> gdaiss: freifrau_von_bleifrei that would be at 4 (which is why I suggested 3 at first)
<jbjnr> gregordaiss[m]: ms[m] may I also join the kokkos call please?
<hkaiser> simbergm: not sure if we should do that - there is plenty of HPX specific (organizational) things we need to talk about
<diehlpk_work> Hashmi, I have not built phylanx in a long time, and someone else might be able to help you
<simbergm> or we start at 3:30 and just continue with the hpx call afterwards
<jbjnr> something weird happened to my names in the message above
<simbergm> yeah, that's true
<simbergm> jbjnr: yt? have you had time to look at the governance PR? should I go clean that up?
<hkaiser> simbergm : that means I have to be up at 8:30am ;-) I'll see what I can do
<Hashmi> @diehlpk_work: I am going to try a docker build next just to see if it works. But for the project I would need to use cmake.
<diehlpk_work> Hashmi, First work on getting phylanx to compile
<jbjnr> ms[m]: I'll look at the governance stuff tomorrow. Might need to be reminded. I keep forgetting
<hkaiser> jbjnr: we have our PMC call this week Thursday
<hkaiser> would be nice to have it done by then
<jbjnr> ok. then tomorrow I'd better look at it.
<diehlpk_work> Another step would be to investigate how to ship C++ dependencies in a pip package
<hkaiser> jaafar: much appreciated
<gdaiss[m]> @ms Well, 3.30 would work for me as well! Most other meetings and appointments around here are canceled anyway!
<simbergm> jbjnr: no problem, I'll remind you again tomorrow
<Hashmi> @diehlpk_work: what other C++ dependencies? I know about pybind11
<diehlpk_work> blaze, hwloc, boost, jemalloc, and so on?
<hkaiser> jaafar: sorry - I did it again
<Hashmi> I will keep working on compiling phylanx
<jbjnr> when is the gsoc deadline?
akheir has quit [Read error: Connection reset by peer]
<jbjnr> (for applications I mean)
<Hashmi> diehlpk_work: i see! I
akheir has joined #ste||ar
<jbjnr> gregordaiss[m]: ms[m] trying again to ask - may I also join the kokkos meeting please?
<jbjnr> it did it again
<Hashmi> @diehlpk_work: blaze will need to be shipped as a package itself. Boost has a pip package. And I haven't found a way to get hwloc, tcmalloc, etc.
<simbergm> hkaiser, freifrau_von_bleifrei , gdaiss all right, let's do 3 pm CET in case we need more time? should be 9 am in baton rouge I think? does that work for everyone?
<simbergm> also jbjnr ^
<hkaiser> simbergm: ahh yes, this week it's still 9am, fine for me
<diehlpk_work> hkaiser, Can you look into the plots Kevin sent around?
<simbergm> jbjnr: student application deadline is 31st I think
<hkaiser> and we roll over into the hpx stuff afterwards
<diehlpk_work> We need to decide which one of these should be added to the paper
<simbergm> yep
<hkaiser> diehlpk_work: doing it as we speak
<diehlpk_work> I like the subgrids per Joule
<simbergm> I still have to check if/how I can host a zoom meeting not being in the office, but I'll let you all know
<hkaiser> yah, even that's not too favourable ;-)
<diehlpk_work> simbergm, hkaiser can do that
<diehlpk_work> I was hosting a zoom meeting from home this morning
parsa|| has joined #ste||ar
parsa|| has quit [Client Quit]
<simbergm> diehlpk_work: all right, good
<diehlpk_work> hkaiser, We have to keep in mind that we have not done scaling runs due to the convergence test
<diehlpk_work> All the problems are too small
<simbergm> I want to check for my own purposes anyway
ahkeir1 has joined #ste||ar
akheir has quit [Read error: Connection reset by peer]
<diehlpk_work> For 64 nodes we have 20 subgrids per node
<jbjnr> are you huys talking about thursday? or Wednesday?
<diehlpk_work> Level 11 is even worse with 5 subgrids per node on 1024 nodes
<simbergm> jbjnr: thursday
<hkaiser> uhh, that's fewer subgrids than cores!
<simbergm> hpx call at 4, kokkos/hpx call at 3
<jbjnr> Thursday is a bank holiday here and although it makes no difference now that we're all home all the time ... I was planning on cycling through the woods
<diehlpk_work> Yes, this is the trade-off for the convergence test
<jbjnr> ok. So I'd best be back before 3pm
<diehlpk_work> We could not afford to run higher levels for such a long time
<jbjnr> gosh - this matrix is so slow. I send a message and it takes 30 seconds to appear
<simbergm> oooh right...
<diehlpk_work> jbjnr, same here. matrix irc bridge is slowww
<simbergm> were you planning on cycling all day?
<diehlpk_work> Level 12 has 100 subgrids on 1900 nodes
<gdaiss[m]> ms: I am not dead set on Thursday, especially if it's a holiday for you guys! Friday is fine by me too
<jbjnr> no, but normally I would do it in the afternoon rather than the morning. Let the sun warm the place up a bit before going out
<hkaiser> diehlpk_work: that does not make any sense
<hkaiser> 100 subgrids overall on 1900 nodes?
<diehlpk_work> No per node
<hkaiser> that's fewer subgrids than nodes!
<hkaiser> ahh
<hkaiser> how many cores do we have?
<hkaiser> per node
<diehlpk_work> 32
<jbjnr> thursday will be ok, worst case scenario, I don't join, but I'll try to be here
<hkaiser> 3 subgrids per core
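For reference, hkaiser's figure follows directly from the numbers quoted above (level 12, 100 subgrids per node, 32 cores per node), assuming subgrids are spread evenly over the cores of a node:

$$\frac{100\ \text{subgrids/node}}{32\ \text{cores/node}} \approx 3\ \text{subgrids per core}$$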
<hkaiser> jbjnr: we would miss you!
<diehlpk_work> hkaiser, We have the same issue as last year, IO is so bad that we cannot really do higher levels
<hkaiser> k
<diehlpk_work> Loading level 12 on 1024 nodes takes 2 hours
<jbjnr> you need the libfabric PP for regridding and IO :(
<hkaiser> diehlpk_work: well, Dominic said that he's working on the regridding tool
<diehlpk_work> Yes, this will help
<diehlpk_work> Without that tool I cannot do any runs on Summit
shahrzad has joined #ste||ar
<diehlpk_work> hkaiser, We should not talk about scaling at all in this paper
<jbjnr> The startup time for LF was reduced from hours to seconds when I ran last year. Such a shame it doesn't work any more. I wish I knew what was broken
<diehlpk_work> That was the best we could do with the IO and node hours we had
<diehlpk_work> jbjnr, Ok, cool
<diehlpk_work> Yes libfabric would be nice
<diehlpk_work> jbjnr, I will try to get Octotiger running on Summit next month
<diehlpk_work> we have 20k node hours for some scaling runs
<diehlpk_work> Having libfabric there would be awesome
<diehlpk_work> hkaiser, I do not have any node seconds left on Cori and cannot try to collect the new data
<diehlpk_work> Waiting for Alice to add me to the new project
<diehlpk_work> jbjnr, Would you have time to work on libfabric on Summit?
<jbjnr> maybe. Not really sure what I'll be doing. I'm on another project altogether now - putting libfabric into a different non-HPX code. I would like to work on the senders/receivers stuff; we need that for the ISO C++ feedback. I'm supposed to be helping with DCA++ but haven't managed to get stuck into that either.
<hkaiser> diehlpk_work: then leave out the 16 node run
<diehlpk_work> hkaiser, let us wait for more node hours
<hkaiser> k
<diehlpk_work> jbjnr, Ok, I will start with the MPI runs first
<jbjnr> why are you running octotiger on summit anyway - the SC deadline will be long past by next month won't it?
<diehlpk_work> jbjnr, For the INCITE proposal
<diehlpk_work> Last year they complained that we never showed scaling runs on Summit
<diehlpk_work> So this year we will add them to the INCITE proposal and they will find something new to reject us for
<jbjnr> lack of any real scientific merit?
<diehlpk_work> We are working on that
<diehlpk_work> fingers crossed
<diehlpk_work> Does anyone know how to connect to the LSU library to access papers behind a paywall?
<heller1> vpn?
<hkaiser> diehlpk_work: yah, use vpn, then you'll have access
<parsa[m]> diehlpk_work: use ezproxy
<diehlpk_work> parsa[m], How do I do this?
<parsa[m]> it will have you sign in with your LSU ID
<diehlpk_work> Ok, lol, the new solution from LSU does not work on Linux
<zao> :D
<diehlpk_work> There is no download of a linux client after login
<parsa[m]> diehlpk_work: you can also use this bookmarklet I made to automatically convert regular URLs to LSU ezproxy URLs: `javascript:window.location='http://libezp.lib.lsu.edu/login?url='+document.location.href`
<zao> Heh, we use the same kind of system - `javascript:void(location.href='http://proxy.ub.umu.se/login?url='+location.href)`
<parsa[m]> click on the bookmarklet from the journal page you want to open, it will re-open the page in lsu's ezproxy
<parsa[m]> zao: sure. it's good enough. isn't it?
<zao> Sometimes ours considers sites out of scope and refuses to even try, but otherwise fine.
<parsa[m]> for those just search the article's title in an ezproxied google scholar page. probably would open fine
<weilewei> hkaiser ah, I think I have found the GPUDirect error. The purpose of my work is to send the G2 matrix generated on each rank around the MPI world. However, the DCA++ code uses multiple threads, and each thread has one or more G2s at the same time. My MPI_Isend stuff only handles one G2 per rank
<weilewei> so I need something like multi-threaded MPI_Isend stuff
shahrzad has quit [Ping timeout: 246 seconds]
<jbjnr> use a tag to identify different G2's?
KatieB has joined #ste||ar
<weilewei> jbjnr ah, gotcha! That's a possible solution
<weilewei> jbjnr then I need to somehow insert/bind thread id info into the tag for different G2's
akheir1_ has joined #ste||ar
<jbjnr> you must have something you can use to identify the same G2 on different nodes I guess. If not, you'll have to give them some kind of label based on the operation that's common across nodes.
ahkeir1 has quit [Ping timeout: 246 seconds]
<weilewei> jbjnr indeed
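A minimal sketch of the tag-based approach jbjnr suggests. The helper names `send_g2`/`recv_g2` and the `g2_id` label are hypothetical; it assumes each G2 is a contiguous buffer of complex doubles, that sender and receiver agree on the same `g2_id` for the same G2, and that MPI was initialized with `MPI_THREAD_MULTIPLE` so several threads can post sends concurrently:

```cpp
#include <mpi.h>

#include <complex>
#include <vector>

// Encode which G2 a message carries into the MPI tag so that concurrent
// sends issued by different threads can be matched unambiguously.
MPI_Request send_g2(std::vector<std::complex<double>> const& g2,
                    int dest_rank, int g2_id, MPI_Comm comm)
{
    MPI_Request req;
    MPI_Isend(g2.data(), static_cast<int>(g2.size()), MPI_CXX_DOUBLE_COMPLEX,
              dest_rank, /* tag = */ g2_id, comm, &req);
    return req;
}

// The receiving rank posts a matching receive with the same g2_id as the tag.
MPI_Request recv_g2(std::vector<std::complex<double>>& g2,
                    int src_rank, int g2_id, MPI_Comm comm)
{
    MPI_Request req;
    MPI_Irecv(g2.data(), static_cast<int>(g2.size()), MPI_CXX_DOUBLE_COMPLEX,
              src_rank, /* tag = */ g2_id, comm, &req);
    return req;
}
```

The requests still need to be completed with MPI_Wait/MPI_Waitall; if G2s are not naturally numbered, a tag could be derived from a thread id plus a per-thread counter, as long as both sides compute it the same way.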
nikunj97 has quit [Remote host closed the connection]
nk__ has joined #ste||ar
nikunj97 has joined #ste||ar
KatieB has quit [Ping timeout: 240 seconds]
nk__ has quit [Ping timeout: 256 seconds]
ahkeir1 has joined #ste||ar
akheir1_ has quit [Ping timeout: 246 seconds]
ahkeir1 has quit [Quit: Leaving]
weilewei has quit [Remote host closed the connection]
nk__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 264 seconds]
<jaafar> diehlpk_work: as someone who recently dealt with C++ deps in a Python package, strongly suggest you look at conda instead of pip
<jaafar> If that's an option
nk__ has quit [Quit: Leaving]
<zao> ugh :D
<hkaiser> jaafar: doesn't this pull in a whole set of infrastructure libraries?
<zao> Not to mention a "compiler".
<jaafar> Well, the problem is dependencies
<jaafar> pip etc. doesn't really contemplate C++ dependencies at all
<jaafar> So if you use e.g. numpy and something else (graph_tool, in my case) and they use different versions of Boost, say
<jaafar> kaboom
<jaafar> conda, whatever its flaws, actually has some concept of the problem
<jaafar> maybe you have better control over your environment and it's not an issue
<zao> Which is why any serious HPC deployment uses modules, which have multiple side-by-side versions of the things you need.
<zao> Instead of "whatever conda happens to have today, better hope they don't change something or have too old stuff".
<zao> s/serious/large-scale/
<jaafar> you can specify package versions
<jaafar> pip AFAIK is like "what is C++"
<zao> We've got a lot of untangling to do whenever a user comes with a conda-based workflow that "works on their laptop".
<jaafar> what are dynamic libs
<jaafar> etc.
<jaafar> Maybe it's not an issue for you all
<jaafar> Imagine ODR violations, for example
<zao> pip (on pypi) has problems in that it doesn't understand dynamic libraries outside of manylinux{1,2010} wheels.
<jaafar> yep
<zao> conda attempts to solve it in a distro-like way, but you're going to have a mess of environments trying to get it right, and they're quite hard to lift between systems.
<jaafar> It is difficult.
<zao> And the binaries it ships are often so out of phase with what the OS has that they tend to break system stuff.
<jaafar> Ah!
<zao> And most importantly, it interfaces _very_ poorly with site-specific MPI deployments.
<jaafar> So, I'm not talking about just grabbing stuff from online
<jaafar> I'm suggesting using conda-build
<jaafar> so you control the binary versions alongside the Python code
<zao> If someone comes to us with something using conda, multi-node codes are pretty much completely off the table as the MPI will be unusable.
<jaafar> Why would that be?
<zao> So conda-build lets you do what, make something that's in between a miniconda and a full honking conda env?
<zao> It's going to have a whole bunch of fundamental libraries that are diverging in versions from what the OS has, to the extent that regular tools you run don't work.
<zao> So... MPI.
<zao> On our site, in order for OpenMPI to work in batch jobs it needs to be built against PMIx, PMI2, UCX, and SLURM.
<zao> Otherwise it's quite likely that srun will have issues, or not work much at all.
<jaafar> Can you specify architecture and package version for those deps?
<zao> The OS's OpenMPI is pretty much unusable, already, let alone something shipping with conda.
<jaafar> For conda-build you can lock them
<zao> It's not just versions, you need the same shared build of them.
<jaafar> I'm confused about "shipping with conda" I guess
<zao> Whatever ends up in your conda world when building a conda env, however you do it, will not be the versions we have installed site-wide, which need to be used.
<zao> CUDA needs to be installed with awareness of your MPI implementation.
<jaafar> But can't you just specify your system's implementation and version?
<zao> It's not just "OpenMPI 4.0.1". It's "OpenMPI 4.0.1 built with UCX 1.8.0, PMIx x.y.z and SLURM libraries in location <x>".
<zao> It's a very intricate web of runtimes that need to be linked up for the batch to work well.
<jaafar> When you say "location <x>" do you mean "path on my system"?
<zao> Yes.
<jaafar> And there's no way to reproduce that independently?
<zao> Shared parallel file system with those pre-built, and our modules for OpenMPI etc. reference those when built.
<zao> If you have so granular control over however you build an OpenMPI to stick into a conda environment, sure, it'd work.
<jaafar> Yeah I think that's sort of what I had in mind
<zao> My impression of conda is that you pull quite stock binaries from some upstream forge.
<jaafar> Maybe it's not workable
<jaafar> ah yes so I'm suggesting you make a forge, basically :)
<jaafar> The conda advantage is mainly that it understands binary package versions too
<jaafar> so you can specify both dependencies
<jaafar> In unusual cases (like yours) you might have to make your own forge with those deps
<jaafar> but at least it's represented
<zao> So say one sets up a forge for a particular site's idea of libraries, so you get all the deps duplicated and right, you still can't share that with anyone.
<jaafar> and binary version conflicts are noted
<zao> (anyone on any other system)
<jaafar> you can make a public forge
<zao> Share as in that it won't be usable.
<zao> The setup with UCX and PMIx and everything? That's how my site does it. Another set of filthy hacks is likely in place on the next site over.
<jaafar> I'm sure I don't really understand your constraints :)
<zao> For conda to be meaningfully used outside of single-node jobs that have no dependencies on anything external, it needs to mesh in with whatever modules are in place on the site already.
<jaafar> OK, so just recompile on install? I guess that works
<jaafar> binary deps are just what's on the system
<zao> I have yet to see any software packaged with conda that declares anything about how it's built.
<jaafar> really
<zao> It's just "point at this forge and install", where the non-conda instructions are rotted and there's rarely any of whatever recipes you use to build a conda forge.
<zao> At least from the bio junk I've seen.
<zao> requirements.txt is pretty much always incomplete or out of date, somehow.
* jaafar looks at dependencies
karame78 has quit [Remote host closed the connection]
<zao> Now, if you've got a ste||ar hat on, sure, there might be some value in providing phylanx or HPX in a forge, but it's most probably not overly usable at scale.
<jaafar> So these all have the OS, architecture, and Python version embedded in their names
<jaafar> If you unpack one of them the index.json file lists dependencies
<jaafar> some of which are like "vs2015_runtime" :)
<jaafar> AFAICT requirements.txt is not a thing Conda uses but I suspect they generate one, or can?
<jaafar> But I've probably belabored this
<jaafar> And your situation is not mine
<jaafar> It was a nice writeup and persuaded me to invest in conda
<zao> No-one in the conda world had any interest in understanding the problems that come with large-scale computing, beyond the "scientist can install this on their laptop" scenario.
<zao> So excuse me if I'm quite sour on this, but I'm quite sour on this.
<zao> And believe me, we've tried.
<jaafar> I believe you! Thanks for listening
<zao> There's software that we've plain not been able to install site-wide as it would be too costly to reverse-engineer how it'd be deployed.
<zao> Our current setup has modules for libraries, Python, and common tricky packages that require source builds, and users may use virtual environments to install the remainder with pip.
<zao> A bit janky, but the least horrible we can manage.
<jaafar> packaging and installation is the worst, I sympathize
<zao> Compute Canada does something similar but has a wheelhouse with binary wheels for multiple microarchitectures.
<zao> Kind of cool, but is a bit tied to their underlying OS.
shahrzad has joined #ste||ar
gonidelis has joined #ste||ar
<diehlpk_work> jaafar, sounds interesting. However, our GSoC applicants would have to work on that
<diehlpk_work> nan111, Did the GSoC student ever come back to you?
<diehlpk_work> bita, What about you?
avah has joined #ste||ar
avah has quit [Ping timeout: 240 seconds]
shahrzad has quit [Ping timeout: 246 seconds]
nan111 has quit [Ping timeout: 240 seconds]
<gonidelis> As I have left aside the API familiarization (I am not going to be an end-user after all... I want to develop the stuff! :D) I have started reading some of your libraries, particularly your range-based algorithm implementations, as I am going to imitate the way they are written for the 'Range based Parallel Algorithms' project. A question that pops
<gonidelis> up as I am reading your libs is: "In the comment sections, what is the use of '\a' [backslash + 'a']?"
<heller1> Against which API do you want to develop?
<gonidelis> What do you mean?
<heller1> What's the project you want to do?
<heller1> As in: how familiar are you with the APIs you're supposed to work with over the summer?
<gonidelis> I would like to contribute to implementing missing parallel algorithms from here (https://github.com/STEllAR-GROUP/hpx/issues/1141) but for the moment I am trying
<gonidelis> to contribute to the range-based ones (https://github.com/STEllAR-GROUP/hpx/issues/1668) in order to form a stronger proposal
<gonidelis> As for the familiarization, I have been studying the basics of HPX for the last couple of weeks, but I need to dig in a little more specifically as the proposal deadline is in two weeks
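To illustrate the difference the range-based project is about, here is a rough sketch; header and namespace names are approximate and may differ between HPX versions, and the `hpx::ranges::all_of` overload shown in the comment is an assumption about the kind of API the project would add:

```cpp
#include <hpx/hpx_main.hpp>
#include <hpx/include/parallel_algorithm.hpp>

#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3, 4};
    auto positive = [](int i) { return i > 0; };

    // Existing iterator-pair overload of a parallel algorithm:
    bool all_positive = hpx::parallel::all_of(
        hpx::parallel::execution::par, v.begin(), v.end(), positive);

    // A range-based overload would accept the container (or any range)
    // directly, e.g. something along the lines of:
    //   hpx::ranges::all_of(hpx::parallel::execution::par, v, positive);

    return all_positive ? 0 : 1;
}
```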
<zao> Which piece of documentation or code are you talking about up there when you say "comment section"?
<gonidelis> Well I happened to read the header all_any_none.hpp
<zao> The triple /// means it's a documentation comment.
<zao> \a is some sort of markup in that documentation language.
<gonidelis> Are these comments exposed somewhere else (maybe in some markup file) besides the source code?
<zao> The documentation is generated from standalone files together with the contents of the doc-comments.
maxwellr96 has joined #ste||ar
<zao> This is built with Sphinx.
<zao> (also doxygen, it seems)
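For context: `\a` is Doxygen markup that typesets the following word in italics, conventionally used when the text refers to a function argument. An illustrative doc comment (not copied from HPX) might look like this:

```cpp
/// Returns true if the unary predicate \a f evaluates to true for every
/// element in the range [\a first, \a last), and false otherwise.
///
/// \param first   Iterator to the beginning of the range.
/// \param last    Iterator to the end of the range.
/// \param f       Unary predicate invoked for each element.
template <typename InIter, typename F>
bool all_of(InIter first, InIter last, F&& f);
```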
<gonidelis> Thank you so much! I didn't know that documentation could be written inside the main code. I understand what the point is now... (wow! I am learning so much...!)
<gonidelis> Where could I find out how the tests work (in the case of all_of.cpp, for example)? What is their purpose, and when and how are they executed? Thank you.
gonidelis has quit [Ping timeout: 240 seconds]
diehlpk_work has quit [Remote host closed the connection]
weilewei has joined #ste||ar