hkaiser, Can you add something for the performance counters?
sure, what?
Can you add in the description a remark why we need them
Something like: measuring performance within HPX is not so easy, and we need our own performance counters
I can try, would tomorrow be ok?
So we keep any stupid reviewer from asking why we don't use an existing solution
Sure tomorrow is sufficient
Just read your text this evening, took some time to set up my computer at home
And can you reference the listing with the application performance counters and elaborate on them as well?
Hello All, as a part of my GSoC proposal I am planning to measure timing improvements on a distributed system. However, I don't have access to any cluster. Is it possible to get reasonably accurate results without having a cluster set up? Thanks
Pranavug: You should probably wait for an answer from someone formally involved with HPX, but I believe they still have a small development cluster.
Zao : Thank you. Sure.
I'm curious, which project in the list is this?
I'm planning to work on "Domain decomposition and load balancing for crack and fracture mechanics"
It may not seem to be necessary, however I thought to showcase the timing results to illustrate the improvement as well
Hard to demonstrate an improvement without measuring the improvement :D
You've got the mentor here on IRC as diehlpk, btw.
Thanks, yes, I am aware. I was hoping a few contributors would have faced such a situation recently though
Pranavug: in the past we have been able to give some gsoc students access to larger machines/clusters
jbjnr : OK Thanks for letting me know.
Pranavug: ping
simbergm : You mentioned a simple HPX program which I have to submit
To whom and by when do I submit it?
about the small program, I suggest you put it on github and put a link in your application
we've typically had students submit their applications as google documents
Sure, will do so
feel free to ask questions about building and understanding HPX, but we just want to make sure that applicants don't start from zero when the actual project begins
Hello, everyone. Can someone help me with cmake install error for hpx?
Can you run the script there on summit and send me the output?
We would like to have a configuration as similar as possible to the Summit nodes here at CCT
Ok, let me figure out how to run it
Just check out the repo and submit a job on one of the compute nodes
and send me the pbs output
Sai will have a look and will try to install similar stuff
diehlpk_work: meeting at 11 today? (I'm still confused as the email says 11CST, which would be 10am)...
hkaiser, Are we not in CST?
we're currently in CDT (Central Daylight-saving Time)
CST == Central Standard Time
Ok, I missed that
I will send an email
have you seen my pm?
so it's 11am today?
Apex results look strange
diehlpk_work sent the result file to your telegram
Thanks, I will forward it to Sai
I will start to work on Summit after the SC paper
ping to the irc side
ping (as well)
ms: , freifrau_von_bleifrei So I guess we are moving to the HPX channel now for all Kokkos-related stuff! Speaking of which: Do you two want to do another call soon to sync up our current progress with the executors and allocators? It's been a while since we've had one
gdaiss: freifrau_von_bleifrei : yep, let's do that this week
thursday is meeting day for me anyway so maybe then? at 3? other days and times work too...
Thursday at 3 works for me!
@diehlpk_work: is it okay to pm you regarding gsoc proposal?
gdaiss[m]: please loop me into this call, if possible - I'm very interested in getting involved
Hashmi, we can discuss here in public
Oh yes, if it's okay..
So all can give their feedback
It's regarding the pip package for phylanx
So I have been researching about pypi packages
simbergm: do my messages make it through to matrix?
I think the current cmake build can be triggered through setup.py
That is if phylanx starts passing the build tests
I checked out older stable commits, but they don't build
Hashmi, The challenge for this project is to ship the dependencies
hkaiser: Yes, at least I can read them now. About the call: sure, we just need to decide on a conference call software! ms, what did we use last time? Was it Matrix for the call as well?
Hashmi, What does not build?
gdaiss[m]: zoom works well enough these days
Can you post the error on pastebin or as a gist on github
Yes.. if we package hpx, blaze and blaze tensor before we publish package for phylanx then it should work
For phylanx build issues hkaiser or rtohid are good starting points
Oh sure a min
gdaiss[m]: both simbergm as well as I can schedule zoom meetings easily any time
Hashmi, I do not think we can package all packages for ubuntu and fedora
then let's use Zoom! I was about to suggest the DFN conference call software that we usually use in Stuttgart, but naturally that one stopped functioning on Monday when everybody started working from home
Getting hpx into Fedora took several months
@diehlpk_work: oh may I ask why?
gdaiss[m]: ok, let me know if you need me to schedule things
I think it even works ad hoc, if needed
Getting a package to a major distribution is not easy. You have to write the spec file for Fedora, get it reviewed, and approved
Another thing is that the packages are not good with respect to performance, because they are not built on the target hardware and hardware-specific optimization is missing
Since we use phylanx on HPC systems, it would be slow compared to a native build
Hashmi, I think a good starting point would be to build hpx and phylanx
Have you done this?
I have been able to build hpx and other dependencies
Phylanx doesn't build
Sorry, the error you asked for is coming through
Ok, next step would be to get phylanx to build
hkaiser: yep, I see the messages as well
good, thanks
This is using an older build
we can also combine the kokkos/hpx call with the usual hpx call
With verified commits and build pass
gdaiss: freifrau_von_bleifrei that would be at 4 (which is why I suggested 3 at first)
gregordaiss[m]: ms[m] may I also join the kokkos call please?
simbergm: not sure if we should do that - there is plenty of HPX specific (organizational) things we need to talk about
Hashmi, I have not built phylanx in a long time; someone else might be able to help you
or we start at 3:30 and just continue with the hpx call afterwards
something weird happened to my names in the message above
yeah, that's true
jbjnr: yt? have you had time to look at the governance PR? should I go clean that up?
simbergm : that means I have to be up at 8:30am ;-) I'll see what I can do
@diehlpk_work: I am going to try the docker build next just to see if it works. But for the project I would need to use cmake.
Hashmi, First work on getting phylanx to compile
ms[m]: I'll look at the governance stuff tomorrow. Might need to be reminded. I keep forgetting
jbjnr: we have our PMC call this week Thursday
would be nice to have it done by then
ok. then tomorrow I'd better look at it.
Another step would be to investigate how to ship C++ dependencies in a pip package
jaafar: much appreciated
@ms Well, 3.30 would work for me as well! Most other meetings and appointments around here are canceled anyway!
jbjnr: no problem, I'll remind you again tomorrow
@diehlpk_work: what other C++ dependencies? I know about pybind11
blaze, hwloc, boost, jemalloc, and so on?
jaafar: sorry - I did it again
I will keep working on compiling phylanx
when is the gsoc deadline?
diehlpk_work: i see! I
gregordaiss[m]: ms[m] trying again to ask - may I also join the kokkos meeting please.
it did it again
@diehlpk_work: blaze will need to be shipped as a package itself. Boost has a pip package. And I haven't found a way to get hwloc, tcmalloc etc
hkaiser, freifrau_von_bleifrei , gdaiss all right, let's do 3 pm CET in case we need more time? should be 9 am in baton rouge I think? does that work for everyone?
also jbjnr ^
simbergm: ahh yes, this week it's still 9am, fine for me
hkaiser, Can you look into the plots Kevin sent around?
jbjnr: student application deadline is 31st I think
and we roll over into the hpx stuff afterwards
We need to decide which one of these should be added to the paper
diehlpk_work: doing it as we speak
I like the subgrids per Joule
I still have to check if/how I can host a zoom meeting not being in the office, but I'll let you all know
yah, even that's not too favourable ;-)
simbergm, hkaiser can do that
I was hosting a zoom meeting from home this morning
diehlpk_work: all right, good
hkaiser, We have to remember that we have not done scaling runs because of the convergence test
All the problems are too small
I want to check for my own purposes anyway
For 64 nodes we have 20 subgrids per node
are you guys talking about thursday? or Wednesday?
Level 11 is even worse with 5 subgrids per node on 1024 nodes
jbjnr: thursday
uhh, that's less subgrids than cores!
hpx call at 4, kokkos/hpx call at 3
Thursday is a bank holiday here and although it makes no difference now that we're all home all the time ... I was planning on cycling through the woods
Yes, this is the trade-off for the convergence test
ok. So I'd best be back before 3pm
We could not afford to run higher levels for such a long time
gosh - this matrix is so slow. I send a message and it takes 30 seconds to appear
oooh right...
jbjnr, same here. matrix irc bridge is slowww
were you planning on cycling all day?
Level 12 has 100 subgrids on 1900 nodes
ms: I am not dead set on Thursday, especially if it's a holiday for you guys! Friday is fine by me too
no, but normally I would do it in the afternoon rather than the morning. Let the sun warm the place up a bit before going out
diehlpk_work: that does not make any sense
100 subgrids overall on 1900 nodes?
No per node
that's less subgrids than nodes!
how many cores do we have?
per node
thursday will be ok, worst case scenario, I don't join, but I'll try to be here
3 subgrids per core
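The per-node and per-core figures quoted here are related by a plain division. A minimal C++ sketch of that arithmetic, assuming 32 cores per node (the log never states the core count; 32 is only picked because it is consistent with "3 subgrids per core" at 100 subgrids per node), with `subgrids_per_core` being a hypothetical helper name:

```cpp
#include <cassert>

// Average number of subgrids each core has to work on.
// cores_per_node is an assumption; the log does not state it.
double subgrids_per_core(double subgrids_per_node, int cores_per_node)
{
    assert(cores_per_node > 0);
    return subgrids_per_node / cores_per_node;
}
```

Under that assumption, `subgrids_per_core(100, 32)` is about 3.1, while `subgrids_per_core(5, 32)` is well below 1, which is the "fewer subgrids than cores" situation mentioned for level 11.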
jbjnr: we would miss you!
hkaiser, We have the same issue as last year, IO is so bad that we cannot really do higher levels
Loading level 12 on 1024 nodes takes 2 hours
you need the libfabric PP for regridding and IO :(
diehlpk_work: well dominic said that he's working on the regridding tool
Yes, this will help
Without that tool I can not do any runs on summit
hkaiser, We should not talk about scaling at all in this paper
The startup time for LF was reduced from hours to seconds when I ran last year. Such a shame it doesn't work any more. I wish I knew what was broken
That was the best we can do with the IO and node hours we had
jbjnr, Ok, cool
Yes libfabric would be nice
jbjnr, I will try to get octiger running on summit next month
we have 20k node hours for some scaling runs
Having libfabric there would be awesome
hkaiser, I do not have any node second left on Cori and can not try to collect the new data
Waiting for Alice to add me to the new project
jbjnr, Would you have time to work on libfabric on Summit?
maybe. Not really sure what I'll be doing. I'm on another project altogether now - putting libfabric into a different non-hpx code. Would like to work on the senders/receivers stuff. we need that for the iso c++ feedback. supposed to be helping with DCA++ but haven't managed to get stuck into that either.
diehlpk_work: then leave out the 16 node run
hkaiser, let us wait for more node hours
jbjnr, Ok, I will start with the MPI runs first
why are you running octotiger on summit anyway - the SC deadline will be long past by next month won't it?
jbjnr, For the INCITE proposal
Last year they complained that we had never shown scaling runs on Summit
So this year we will add them to the INCITE proposal and they will find something new to reject us
lack of any real scientific merit?
We are working on that
fingers crossed
Does anyone know how to connect to the LSU library to access a paper behind a paywall?
diehlpk_work: yah, use vpn, then you'll have access
it will have you sign in with your LSU ID
Ok, lol, the new solution from LSU does not work on Linux
There is no download of a linux client after login
diehlpk_work: you can also use this bookmarklet I made to automatically convert regular URLs to LSU ezproxy URLs: `javascript:window.location='http://libezp.lib.lsu.edu/login?url='+document.location.href;`
click on the bookmarklet from the journal page you want to open, it will re-open the page in lsu's ezproxy
zao: sure. it's good enough. isn't it?
Sometimes ours considers sites out of scope and refuses to even try, but otherwise fine.
for those just search the article's title in an ezproxied google scholar page. probably would open fine
hkaiser ah, I think I have found the cause of the gpudirect error. The purpose of my work is to send the G2 matrix generated on each rank around the MPI world. However, the DCA++ code uses multiple threads, and each thread has one or more G2s at the same time. My mpi_isend stuff only handles one G2 per rank
so I need something like -> multi-threaded mpi_isend stuff
use a tag to identify different G2's?
jbjnr ah, got you! That's a possible solution
jbjnr then I need to somehow insert/bind thread id info into the tag for different G2's
you must have something you can use to identify the same G2 on different nodes I guess. If not you'll have to give them some kind of label based on the operation that's common across nodes.
jbjnr indeed
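The tag idea discussed above could be sketched roughly as follows. This is only an illustration, not the DCA++ code: `make_tag`, `tag_thread`, `tag_g2`, `tag_is_valid` and the limit `max_g2_per_thread` are hypothetical names and values. It assumes every rank numbers its threads and G2 matrices the same way, so that the pair identifies the same G2 across nodes, and it stays within 32767, the smallest tag upper bound the MPI standard guarantees.

```cpp
#include <cassert>

// Hypothetical limit: how many G2 matrices one thread may have in flight.
constexpr int max_g2_per_thread = 8;

// The MPI standard only guarantees tags up to 32767 (MPI_TAG_UB may be larger).
constexpr int guaranteed_tag_ub = 32767;

// True if the (thread, G2) pair can be packed into a tag without
// colliding with another pair or exceeding the guaranteed tag range.
constexpr bool tag_is_valid(int thread_id, int g2_index)
{
    return thread_id >= 0 && g2_index >= 0 &&
        g2_index < max_g2_per_thread &&
        thread_id <= (guaranteed_tag_ub - g2_index) / max_g2_per_thread;
}

// Pack the thread id and the per-thread G2 index into a single tag.
inline int make_tag(int thread_id, int g2_index)
{
    assert(tag_is_valid(thread_id, g2_index));
    return thread_id * max_g2_per_thread + g2_index;
}

// Recover both parts on the receiving side.
constexpr int tag_thread(int tag) { return tag / max_g2_per_thread; }
constexpr int tag_g2(int tag) { return tag % max_g2_per_thread; }
```

The value returned by `make_tag` would then be passed as the `tag` argument of `MPI_Isend` and of the matching `MPI_Irecv`, so that messages for different G2's on the same pair of ranks are not mixed up.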
diehlpk_work: as someone who recently dealt with C++ deps in a Python package, strongly suggest you look at conda instead of pip
If that's an option
ugh :D
jaafar: doesn't this pull in a whole set of infrastructure libraries?
Not to mention a "compiler".
Well, the problem is dependencies
pip etc. doesn't really contemplate C++ dependencies at all
So if you use e.g. numpy and something else (graph_tool, in my case) and they use different versions of Boost, say
conda, whatever its flaws, actually has some concept of the problem
maybe you have better control over your environment and it's not an issue
Which is why any serious HPC deployment uses modules, which has multiple side-by-side versions of things you need.
Instead of "whatever conda happens to have today, better hope they don't change something or have too old stuff".
you can specify package versions
pip AFAIK is like "what is C++"
We've got a lot of untangling to do whenever a user comes with a conda-based workflow that "works on their laptop".
what are dynamic libs
Maybe it's not an issue for you all
Imagine ODR violations, for example
pip (on pypi) has problems in that it doesn't understand dynamic libraries outside of manylinux{1,2010} wheels.
conda attempts to solve it in a distro-like way, but you're going to have a mess of environments trying to get it right, and they're quite hard to lift between systems.
It is difficult.
And the binaries it ships are often so out of phase with what the OS has that they tend to break system stuff.
And most importantly, it interfaces _very_ poorly with site-specific MPI deployments.
So, I'm not talking about just grabbing stuff from online
I'm suggesting using conda-build
so you control the binary versions alongside the Python code
If someone comes to us with something using conda, multi-node codes are pretty much completely off the table as the MPI will be unusable.
Why would that be?
So conda-build lets you do what, make something that's in between a miniconda and a full honking conda env?
It's going to have a whole bunch of fundamental libraries that are diverging in versions from what the OS has, to the extent that regular tools you run don't work.
So... MPI.
On our site, in order for OpenMPI to work in batch jobs it needs to be built against PMIx, PMI2, UCX, and SLURM.
Otherwise it's quite likely that srun will have issues, or not work much at all.
Can you specify architecture and package version for those deps?
The OS's OpenMPI is pretty much unusable, already, let alone something shipping with conda.
For conda-build you can lock them
It's not just versions, you need the same shared build of them.
I'm confused about "shipping with conda" I guess
Whatever ends up in your conda world when building a conda env, however you do it, will not be the versions we have installed site-wide, which need to be used.
CUDA needs to be installed with awareness of your MPI implementation.
But can't you just specify your system's implementation and version?
It's not just "OpenMPI 4.0.1". It's "OpenMPI 4.0.1 built with UCX 1.8.0, PMIx x.y.z and SLURM libraries in location <x>".
It's a very intricate web of runtimes that need to be linked up for the batch to work well.
When you say "location <x>" do you mean "path on my system"?
And there's no way to reproduce that independently?
Shared parallel file system with those pre-built, and our modules for OpenMPI etc. reference those when built.
If you have so granular control over however you build an OpenMPI to stick into a conda environment, sure, it'd work.
Yeah I think that's sort of what I had in mind
My impression of conda is that you pull quite stock binaries from some upstream forge.
Maybe it's not workable
ah yes so I'm suggesting you make a forge, basically :)
The conda advantage is mainly that it understands binary package versions too
so you can specify both dependencies
In unusual cases (like yours) you might have to make your own forge with those deps
but at least it's represented
So say one sets up a forge for a particular site's idea of libraries, so you get all the deps duplicated and right, you still can't share that with anyone.
and binary version conflicts are noted
(anyone on any other system)
you can make a public forge
Share as in that it won't be usable.
The setup with UCX and PMIx and everything? That's how my site does it. Another set of filthy hacks is likely in place on the next site over.
I'm sure I don't really understand your constraints :)
For conda to be meaningfully used outside of single-node jobs that have no dependencies on anything external, it needs to mesh in with whatever modules are in place on the site already.
OK, so just recompile on install? I guess that works
binary deps are just what's on the system
I have yet to see any software packaged with conda to declare anything about how it's built.
It's just "point at this forge and install", where the non-conda instructions are rotted and there's rarely any of whatever recipes you use to build a conda forge.
At least from the bio junk I've seen.
requirements.txt is pretty much always incomplete or out of date, somehow.
* jaafar looks at dependencies
Now, if you've got a ste||ar hat on, sure, there might be some value in providing phylanx or HPX in a forge, but it's most probably not overly usable at scale.
It was a nice writeup and persuaded me to invest in conda
No-one in the conda world had any interest in understanding the problems outside of the "scientist can install this on their laptop" scenario that comes with large-scale computing.
So excuse me if I'm quite sour on this, but I'm quite sour on this.
And believe me, we've tried.
I believe you! Thanks for listening
There's software that we've plain not been able to install site-wide as it would be too costly to reverse-engineer how it'd be deployed.
Our current setup has modules for libraries, Python, and common tricky packages that require source builds, and users may use virtual environments to install the remainder with pip.
A bit janky, but the least horrible we can manage.
packaging and installation is the worst, I sympathize
Compute Canada does something similar but has a wheelhouse with binary wheels for multiple microarchitectures.
Kind of cool, but is a bit tied to their underlying OS.
jaafar, sounds interesting. However, our GSoC applicants would have to work on that
nan111, Did the GSoC student ever come back to you?
bita, What about you?
As I have left aside the API familiarization (I am not going to be an end user after all... I want to develop the stuff! :D), I have started reading some of your libraries, particularly your range-based algorithm implementations, as I am going to imitate the way they are written for the 'Range based Parallel Algorithms' project. A question that pops up as I am reading your libs is: "In the comment section, what is the use of '\a' [backslash + 'a']?"
Against which API do you want to develop?
What do you mean?
What's the project you want to do?
As in: how familiar are you with the APIs you're supposed to work with over the summer?
As for the familiarization, the last couple of weeks I was studying the basics of HPX, but I need to dig in a little more, specifically as the proposal deadline is in two weeks
Which piece of documentation or code are you talking about up there when you say "comment section"?
Well I happened to read the header all_any_none.hpp
The triple /// mean it's a documentation comment.
\a is some sort of markup in that documentation language.
Are these comments exposed somewhere else (maybe in some markup file) besides the source code?
The documentation is generated from standalone files together with the contents of the doc-comments.
Thank you so much! I didn't know that documentation could be written in the source code itself. I understand what the point is now... (wow! I am learning so much...!)
Where could I find out how the tests work? (in the case of all_of.cpp for example) What is their purpose and when and how are they executed? Thank you,
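To illustrate the `///` and `\a` explanation above: in Doxygen, `\a` marks the following word as a reference to an argument and renders it in italics. The snippet below is a made-up sketch, not taken from all_any_none.hpp; `all_match` is a hypothetical function, not part of HPX.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

/// Checks whether the predicate \a pred returns true for every element
/// of \a values.
///
/// \param values  The elements to inspect.
/// \param pred    Unary predicate applied to each element of \a values.
///
/// \returns true if \a pred holds for all elements, or if \a values
///          is empty.
template <typename T, typename Pred>
bool all_match(std::vector<T> const& values, Pred pred)
{
    return std::all_of(values.begin(), values.end(), pred);
}
```

Running a documentation generator such as Doxygen over a header like this would produce an entry for `all_match` in which `pred` and `values` appear emphasized in the running text.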
