<nikunj97>
hkaiser, is rostam under maintenance? I'm getting connection refused right now
<hkaiser>
nikunj97: not that I know of, as soon as Alireza is around I'll poke him
<nikunj97>
ohh alright, let me check if I'm doing something wrong
<hkaiser>
nikunj97: could you send an email to Alireza, cc'ing myself?
<diehlpk_mobile[m>
Nikunj97 Are you using rostam2?
<diehlpk_mobile[m>
Do you have an account there?
<hkaiser>
ahh, right - that could be it
<diehlpk_mobile[m>
The previous rostam is down
<nikunj97>
diehlpk_mobile[m, we have a rostam2 now?
<nikunj97>
that's why I can't connect to it
<diehlpk_mobile[m>
Yes, rostam2 only.
<nikunj97>
I need an account on rostam2
<hkaiser>
talk to Ali
<nikunj97>
I'll shoot him an email
<diehlpk_mobile[m>
The old one had a grace period, but since last week it has been off
<hkaiser>
he is on telegram as well, just poke him
<nikunj97>
hkaiser, alright!
<nikunj97>
what are the specifications of the new one?
<hkaiser>
same machine, essentially, just a new name and new config
<nikunj97>
nice!
<nikunj97>
hkaiser, btw did I lose all the data on rostam as well?
<nikunj97>
it had my resiliency runs stored in it
<hkaiser>
nikunj97: no, they should still be accessible
<nikunj97>
not sure if I stored them anywhere else
<nikunj97>
hkaiser, great!
<hkaiser>
the old home dirs are mounted somewhere; I don't remember the base path, though
<nikunj97>
alright, thanks!
kale_ has joined #ste||ar
__kale has joined #ste||ar
<__kale>
Hey, diehlpk_mobile[m I researched further on conda vs pip and found that creating a pip package would be better for now. People who use only pip and virtualenv, as well as those who use both conda and pip, can use a pip package. For users who prefer conda for package installation, conda can take a pip package's metadata and generate a recipe (a "skeleton") from it, so the package can be installed through conda as well.
<__kale>
The only problem is updates: the conda package will not be updated automatically when the pip package is. For that, we can convert the pip package to conda on our side and push the update to the conda package repo, which solves the update issue as well.
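A minimal sketch of the pip-to-conda conversion __kale describes, assuming conda-build is installed; the PyPI name "phylanx" is hypothetical, since no package has been published yet:

    # generate a conda recipe ("skeleton") from the pip package's PyPI metadata
    conda skeleton pypi phylanx
    # build an installable conda package from the generated recipe
    conda build phylanx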
kale_ has quit [Ping timeout: 256 seconds]
<zao>
nikunj97: Backups are things that happen to other people :D
<nikunj97>
zao, xD
<zao>
We've got the /pfs/nobackup and /proj/nobackup filesystems. Guess what some of our support questions are about?
<nikunj97>
how do you guys manage backups then?
<zao>
Anything that is declared to have backups has disk+tape backups.
__kale has left #ste||ar ["Leaving"]
__kale has joined #ste||ar
<zao>
The small AFS homedir is backed up, as is anything infrastructure-related.
<nikunj97>
people still use tapes these days!
<zao>
It's not feasible to back up 2 PB of constantly changing research data.
* nikunj97
shaking my head
<nikunj97>
makes sense though
<zao>
Users _should_ be aware that their working set of active data should be backed up elsewhere if it is costly to regenerate or precious.
<hkaiser>
... says the sysadmin ...
<zao>
nikunj97: We have a tape robot in the basement of something like 2500 tapes, with cross-site backups with other universities.
<zao>
hkaiser: It's super nice now that we have deployed dedicated storage projects for compute project PIs. Gives data an actual lifetime and requires a Data Management Plan.
<zao>
In the past they just hoarded as much data as they could forever :D
<hkaiser>
%> kill -9 foo -- the system responds: 'insert tape'
<nikunj97>
zao, I'm interested in knowing how much data 2500 tapes store?
<zao>
15TB uncompressed per tape.
<zao>
360 MiB/s throughput, or so our website says.
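Worked out from zao's numbers: 2500 tapes × 15 TB/tape = 37,500 TB, i.e. roughly 37.5 PB of uncompressed capacity.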
<nikunj97>
that's a crap ton of tape storage
<nikunj97>
isn't it more efficient to use hard disks instead?
<nikunj97>
both in price and performance
<zao>
We offer backup to the departments, and we're also part of a T1 site for LHC, so a ton of ATLAS/ALICE experiment data.
<zao>
Disks need to be in machines that consume power and they tend to fail over time.
<nikunj97>
so there are actual reasons to keep tapes alive these days, impressive!
<zao>
It's a great tier behind more rapid-access technologies like disk.
weilewei has joined #ste||ar
<zao>
Not as much either-or, but as complements to each other.
<nikunj97>
i see, btw what's ATLAS and ALICE?
<zao>
We also offer national storage for researchers as well, still not backup, but for larger datasets they need to share - up to 100T or so.
<hkaiser>
simbergm, freifrau_von_ble, nikunj97, gdaiss[m]: just as a reminder, here is the link for the kokkos meeting: https://lsu.zoom.us/j/3340410194
<zao>
ATLAS and ALICE are experiments at the Large Hadron Collider at CERN.
<nikunj97>
you work at CERN?
<zao>
Nope.
<nikunj97>
hkaiser, thanks for the link
<nikunj97>
zao, I see where you're going
<zao>
They have compute and storage in tiers. CERN itself is T0, then there's a bunch of T1 sites around the world, and T2 and T3 below that where actual researchers sit and compute.
<nikunj97>
aah, so yours is a T1 site
<zao>
We're the Nordic T1, which has Sweden, Denmark, Norway, Finland, and ... Slovenia.
<nikunj97>
nice
<K-ballo>
slovenia? very nordic
<nikunj97>
zao, you're from Sweden right?
<zao>
K-ballo: Politics work in mysterious ways :D
<zao>
They're great at finding infrastructure that fails miserably once you have latencies and slow networks.
nikunj97 has quit [Remote host closed the connection]
diehlpk_work has joined #ste||ar
nikunj97 has joined #ste||ar
weilewei has quit [Ping timeout: 240 seconds]
<heller1>
hkaiser: ms: HPX meeting later today, right?
<hkaiser>
heller1: yah, 16:00 your time
<heller1>
cool. it's way easier for me to join now ;)
weilewei has joined #ste||ar
__kale has quit [Quit: Leaving]
__kale has joined #ste||ar
<diehlpk_work>
hkaiser, I was able to rerun all the bad data points last night
<diehlpk_work>
So we have new results for all of them
<hkaiser>
nice
<diehlpk_work>
However, I still would need 278100.0 NERSC hours to finish the large node runs
akheir has joined #ste||ar
<diehlpk_work>
If all runs are good, I would need 92700.0
akheir has quit [Read error: Connection reset by peer]
<diehlpk_work>
But I asked Alice for three runs per node, so we could tackle bad data or crashed jobs
akheir has joined #ste||ar
<diehlpk_work>
parsa, Any news on the plots?
Abhishek09 has joined #ste||ar
simbergm has joined #ste||ar
akheir has quit [Read error: Connection reset by peer]
<weilewei>
So the HPX meeting happens 20 mins later at 10 AM Central Time? Using the same link as Kokkos?
<simbergm>
currently it's really internal-fences-and-hpx-backend-updates but will be separated into separate branches at some point
<simbergm>
weilewei: yep
akheir has joined #ste||ar
<weilewei>
ok
<__kale>
diehlpk_work, What are your thoughts on my findings?
<diehlpk_work>
kale, I had no time to look into it
akheir has quit [Read error: Connection reset by peer]
<diehlpk_work>
I will have time later today
akheir has joined #ste||ar
__kale has quit [Quit: Leaving]
rtohid has joined #ste||ar
<simbergm>
jbjnr: btw, if you have grander ideas on what the kokkos integration should look like we should probably have a call just for discussing design ideas (and long-term goals, i.e. what do we want this to look like in 5 years)
<freifrau_von_ble>
ms: thx!
akheir has quit [Read error: Connection reset by peer]
akheir has joined #ste||ar
<hkaiser>
simbergm: I'd support that
<diehlpk_work>
Lol, the German government asked Netflix to lower its streaming resolution, so people can still work and low bandwidth does not keep anyone from working
<diehlpk_work>
My provider doubled my speed for free
<zao>
Swedish research council bought a "few" more Zoom licenses now. Went from 20k to 120k :D
akheir has quit [Read error: Connection reset by peer]
<Abhishek09>
rtohid: Are u here?
akheir has joined #ste||ar
<rtohid>
Abhishek09: yes, here
<Abhishek09>
rtohid: How do we handle the C++ libraries: build them manually, or install automatically via dnf?
<nikunj97>
Abhishek09, what do you mean implement?
<nikunj97>
do you mean build?
<rtohid>
and also which library?
<Abhishek09>
C++ libs: jemalloc, gcc, and many more
<Abhishek09>
boost
<Abhishek09>
pybind
<Abhishek09>
nikunj97: we need C++ libraries as deps for building/installing phylanx
<rtohid>
I build blaze, blaze_tensor, hpx, phylanx, pybind11
<rtohid>
the rest through OS packages
<nikunj97>
Abhishek09, as rtohid said, we usually build only our own packages
<nikunj97>
for the dependencies we rely more on os packages
<nikunj97>
but you can certainly build all dependencies yourself. Building gcc, though, would be far-fetched for a pip-package project
K-ballo has quit [Quit: K-ballo]
<nikunj97>
most OSes have all dependencies available as packages. So for a basic phylanx/hpx build, you can simply install the dependencies through the package manager.
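A sketch of the package-manager route nikunj97 describes, assuming a Fedora-style system with dnf; the exact package names vary by distro and are illustrative:

    # typical build dependencies for an hpx/phylanx build
    sudo dnf install gcc-c++ cmake boost-devel hwloc-devel jemalloc-devel python3-devel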
<rtohid>
well, you can skip the compiler and a couple of other dependencies. As diehlpk_work suggested, it's best if you start by building HPX and worry about the rest later on
<nikunj97>
Abhishek09, did you build phylanx btw?
<nikunj97>
iirc diehlpk_work asked you to do so
<Abhishek09>
Yes, on mac
<rtohid>
do all the tests pass on Mac?
bita has joined #ste||ar
Abhishek09 has quit [Remote host closed the connection]
<diehlpk_work>
parsa, Can you send me the python code for the AGAS stuff?
<diehlpk_work>
I will have some time to look into it later today
K-ballo has joined #ste||ar
avah has joined #ste||ar
Abhishek09 has joined #ste||ar
avah has quit [Remote host closed the connection]
<Abhishek09>
Sorry, my network connection dropped
<Abhishek09>
rtohid: let's continue the discussion
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
<rtohid>
Abhishek09 sure
<Abhishek09>
rtohid: Do we build the wheels on CI or in docker on a local machine?
<Abhishek09>
CI as in CircleCI
<rtohid>
Abhishek09 shouldn't it work for all?
<Abhishek09>
What?
<rtohid>
docker, local machine, ...
<Abhishek09>
Then what do we use?
<rtohid>
Let's start with docker.
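One common way to build portable wheels inside docker is the manylinux images; that choice is an assumption on my part, since the chat only settles on "docker", not on a specific image:

    # mount the source tree and build a wheel with the container's Python 3.8
    docker run --rm -v "$PWD":/io quay.io/pypa/manylinux2014_x86_64 \
        /opt/python/cp38-cp38/bin/pip wheel /io -w /io/wheelhouse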
<rtohid>
Abhishek09 btw, did all tests pass on mac?
<Abhishek09>
rtohid: `Abhishek09 shouldn't it work for all?` i didn't understand
<Abhishek09>
rtohid: i did not test on mac
<rtohid>
have you been able to build phylanx at all?
<Abhishek09>
rtohid: Did u talking about docker desktop?
<Abhishek09>
rtohid: Yes, built it, but on mac
<rtohid>
first we need to make sure you can build phylanx
<rtohid>
so, did the tests pass?
<Abhishek09>
i didn't try on ubuntu
<rtohid>
did the tests pass on mac?
<Abhishek09>
but i faced a lot of difficulty building hpx on mac
kale_ has joined #ste||ar
<Abhishek09>
i will share u screenshot soon
<rtohid>
so why don't you start with building hpx on a docker container?
<Abhishek09>
Ok i will try now with ubuntu container
<Abhishek09>
But i also have to craft a proposal, that's why i'm discussing this with u
<rtohid>
Abhishek09 +1
<zao>
diehlpk_work: I was bored and went to see if it would be possible to make an EasyBuild config for Phylanx... do you people not have any releases?
<diehlpk_work>
No, I do not think we have any release yet
<diehlpk_work>
hkaiser would know best
<Abhishek09>
rtohid we use docker desktop for this project ?
<rtohid>
Abhishek09 yes, you would need docker desktop to build and run containers.
<Abhishek09>
rtohid but how can the community see what's happening in this project, given that it's a FOSS project and docker desktop runs on a local machine?
<rtohid>
Abhishek09 we use docker as a reproducible development environment.
Abhishek09 has quit [Remote host closed the connection]
Abhishek09 has joined #ste||ar
nan1 has joined #ste||ar
<kale_>
rtohid, I'm getting failures while testing phylanx in the "tests.unit.modules.algorithms.*" tests.
Abhishek0911 has joined #ste||ar
<rtohid>
kale_ on what platform?
<nikunj97>
kale_, could you provide the complete ctest report?
<kale_>
rtohid, Ubuntu 19.10 x86
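A sketch of how to produce the report nikunj97 asks for, assuming the failing tests match the pattern kale_ quoted:

    # rerun only the failing tests, printing full output for each failure
    ctest --output-on-failure -R tests.unit.modules.algorithms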
Abhishek09 has quit [Ping timeout: 240 seconds]
nikunj97 has quit [Read error: Connection reset by peer]
nikunj97 has joined #ste||ar
<rtohid>
kale_ is this master? it seems master is broken.
<kale_>
rtohid, yup I was working on master.
<nikunj97>
kale_, that's why
<kale_>
rtohid, Which branch/commit was working fine with all the tests?
<nikunj97>
kale_, in any case seems like you're successfully able to build and test phylanx
<nikunj97>
so you've passed one milestone in the project
<nikunj97>
kale_, did you research packaging binaries with pip?
<rtohid>
But I am not sure Steve runs tests before pushing.
<kale_>
nikunj97: Ya I had a look at it. But as diehlpk_work suggested, I looked at it from another perspective: whether phylanx should have a pip package or a conda package.
<rtohid>
kale_ let's focus on pip for now.
<nikunj97>
kale_, I concur with rtohid. Let's focus on pip as of now
<nikunj97>
rtohid, having a few tests failing is pretty normal I guess
<kale_>
rtohid, Yeah, I agree as porting a pip package to conda is easier.
<nikunj97>
kale_, I suggest you work on a proposal now. Highlight your implementation approach in the proposal
<nikunj97>
In case you have any doubts, we'll be more than glad to help you ;)
<rtohid>
+1
<kale_>
nikunj97, rtohid : Alright, I'll start with the proposal now. I'll keep you guys updated.
<nikunj97>
rtohid, see pm please
Abhishek09 has joined #ste||ar
Abhishek0911 has quit [Ping timeout: 240 seconds]
<hkaiser>
bita: so yah, I will be late. sorry for that
<bita>
hkaiser, no worries
kale_ has quit [Quit: Leaving]
<Abhishek09>
rtohid: hpx is taking a very long time to build
<nikunj97>
Abhishek09, that isn't unusual
<nikunj97>
did you pass -j$(nproc) parameter to make?
<Abhishek09>
No
<nikunj97>
then it'll take a really long time to build
<nikunj97>
try using: make -n$(nproc)
<nikunj97>
*make -j$(nproc)
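A quick sketch of sizing a parallel build; the ~2 GB-per-job budget is the rule of thumb nikunj97 gives later in this discussion:

    nproc              # number of available cores
    free -g            # available memory; budget roughly 2 GB per compile job
    make -j$(nproc)    # one compile job per core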
<Abhishek09>
Can i cancel it (at 33%)?
<zao>
Build took around five minutes for me, but I built without tests or examples.
<nikunj97>
sure you can
<rtohid>
Abhishek09 yes, it would be much faster in parallel
<zao>
(five minutes on 12 threads for me)
<nikunj97>
zao, try building it with examples one day ;)
<zao>
Note that parallel builds can be memory-intensive for compilation and linking, particularly of tests.
<zao>
nikunj97: I've done enough of that to last a lifetime, thanks.
<nikunj97>
haha
<nikunj97>
it usually takes 12-15 min for a parallel build with examples on
<Abhishek09>
Without parallel, how much?
<nikunj97>
without parallel, expect around 1h to 1h20min for the build
<nikunj97>
zao, fun fact: I bought more ram during my gsoc period coz building hpx was painfully slow with 8gb ram and 8 threads
<Abhishek09>
then i'll definitely cancel
<Abhishek09>
if i cancel, does it pollute something in the build?
<zao>
It should be fine.
<zao>
Typically when you cancel a compiler or linker it doesn't leave partial artifacts behind, so a continued build will just build the things that are not yet done and carry on.
<Abhishek09>
nikunj97: how long does `make -j$(nproc)` take?
<nikunj97>
depends on the number of threads and ram
<nikunj97>
I suggest a good 2gb ram per thread
<Abhishek09>
docker
<Abhishek09>
n=12?
<nikunj97>
how many cores have you allocated to docker?
<Abhishek09>
2 GB by default
<nikunj97>
good luck with those specifications ;)
<nikunj97>
I usually allocate 4 threads and 10-12gb to my docker setup
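A sketch of that allocation using plain `docker run` flags (Docker Desktop users set the same limits in its preferences instead; the ubuntu image is illustrative):

    # cap the container at 4 CPUs and 12 GB of memory
    docker run --cpus=4 --memory=12g -it ubuntu:19.10 bash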
<Abhishek09>
i have only 8gb ram on my mac
<nikunj97>
use: make -j3
<zao>
Docker isn't like virtual machines tho, memory is shared with the host?
<nikunj97>
zao, yes
<nikunj97>
docker usually takes up all the ram allocated to it
<nikunj97>
eats it up good even if you're doing nothing
<zao>
nom nom
<Abhishek09>
same speed as before by using `make -j3`
<Abhishek09>
nikunj97
<nikunj97>
did you resize your docker allocations?
<nikunj97>
3 threads and 6gb ram?
<Abhishek09>
no
<nikunj97>
well then you can't expect a boost in performance
<nikunj97>
don't get me wrong, it'll literally take a long while to compile hpx on your specifications
<nikunj97>
been there, done that
<Abhishek09>
2 threads and 2 gb ram
<nikunj97>
1 thread requires 2gb ram
<nikunj97>
compiling hpx is memory intensive
<nikunj97>
so if you're keeping 2 threads then keep at least 4gb ram
<zao>
bleh, blazetensor doesn't have releases either :(
<heller1>
Abhishek09: looks like you are running OOM
<Abhishek09>
OOM means?
<heller1>
delete the truncated file and retry
<heller1>
out of memory
<heller1>
ms[m]: hkaiser: it would be really cool to have a proper HPX logo for the site
<nikunj97>
zao, blaze tensor was still in the works last time I was at LSU. bita will have more info on that
<hkaiser>
heller1: sure, do you have one?
<heller1>
no :(
<bita>
zao, is there anything specific you need from blaze_tensor?
<hkaiser>
heller1: right, that's why I used the ste||ar logo for now
<heller1>
figured as much
<zao>
bita: I'm looking at how hard it would be to package Phylanx in EasyBuild, which means I have to figure out what versions of dependencies to use. blaze_tensor (like phylanx) doesn't have any releases so I have to invent a version based on some commit that "looks good".
<nikunj97>
hkaiser, I know a few designers who can design HPX logo if you want ;)
<heller1>
the proportions look off
<zao>
So right now I've got messes like: BlazeTensor-20200216-g4ca90b8-foss-2019b.eb and Phylanx-20200319-ga14da8f-foss-2019b-Python-3.7.4.eb
<heller1>
with the relatively big logo and smaller font
<zao>
The downside of having to invent a version like that is that it's going to conflict with an actual versioning scheme once the software starts having version numbers.
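A sketch of the commit pinning zao describes, assuming GitHub's archive URL scheme and taking the hash from the easyconfig name above:

    # fetch a fixed commit of blaze_tensor and store it under the invented date+hash version
    wget https://github.com/STEllAR-GROUP/blaze_tensor/archive/4ca90b8.tar.gz \
        -O BlazeTensor-20200216-g4ca90b8.tar.gz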
<bita>
zao, I get it. I needed a few higher-dimension array operations and we added them to blaze_tensor. I don't plan to add anything in the foreseeable future. So it may not change much, except when blaze moves headers and changes things that we then have to adapt to. Does that help?
<hkaiser>
heller1: be my guest ;-) I'm incredibly busy with too many things right now - won't have too much time to tweak the website - sorry
<zao>
It's good to know it's stable. As a packager I just wish there were releases but I understand that it's not always possible.
<heller1>
hkaiser: sure, I am just sharing my observations ;)
<zao>
I'll just grab whatever is the tip of trunk and use that as dependency.
<bita>
Actually, I think that should be hkaiser's decision
<hkaiser>
zao, bita: I'm too lazy to create a real release at this point
<hkaiser>
do we really need one?
<zao>
For any form of reproducibility I need to pin the version or commit I use for dependencies in builds. Pointing at "master.tar.gz" wouldn't quite fly.
<zao>
It's way easier when the dependency has actual versioned releases.
Abhishek09 has quit [Remote host closed the connection]
<hkaiser>
rtohid: I'm ready whenever you are
<rtohid>
hkaiser I am ready too.
<zao>
Is Phylanx supposed to work against HPX 1.4.1 or is it targeting trunk?