<lsl88>
simbergm: sorry, saw the message late. Yeah! :)
nikunj has quit [Remote host closed the connection]
nikunj has joined #ste||ar
hkaiser has quit [Quit: bye]
K-ballo has quit [Quit: K-ballo]
nikunj has quit [Remote host closed the connection]
nikunj has joined #ste||ar
nikunj has quit [Remote host closed the connection]
<tarzeau>
heller, simbergm: i think i activated jemalloc now, added the -D flags for mpi and fortran, and the same url is up for testing again (also the dependency from the -dev to the lib package is there now)
<heller>
The missing -dev packages as well?
<tarzeau>
heller: yes debian has a meta mpi package, i think it's mpi-default-dev using the /etc/alternatives system
<heller>
Yes
<tarzeau>
heller: there are just these two for now: libhpx-dev and libhpx1
<tarzeau>
ahhh, the missing headers - boost?
<heller>
Yeah
<heller>
And hwloc, IIRC
<tarzeau>
thanks for the reminder :)
<tarzeau>
but there's already libhwloc-dev and libblas-dev ?
<tarzeau>
why would hpx want to have a copy of them?
<tarzeau>
and there's also libopenblas-dev
<heller>
What'd be nice is to have a procedure we can follow for generating new packages for newer releases
<tarzeau>
take the debian/ directory from my old package, drop it into the new source tree, get the names right (hpx_VERSION.orig.tar.gz) and run debuild
<heller>
We don't need a copy, we need libboost-dev and libhwloc-dev as a dependency of libhpx-dev
<tarzeau>
oh, and i'm missing them, let me fix that
<Yorlik>
Anyone having experience with spdlog?
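For anyone reading along, a minimal spdlog usage sketch; nothing project-specific is assumed, the names are the stock spdlog factories and free functions:

    // Minimal spdlog sketch: one file logger made the default, plus
    // fmt-style formatted logging.
    #include <spdlog/spdlog.h>
    #include <spdlog/sinks/basic_file_sink.h>

    int main()
    {
        auto logger = spdlog::basic_logger_mt("app", "logs/app.log");
        spdlog::set_default_logger(logger);
        spdlog::set_level(spdlog::level::debug);   // log everything from debug up
        spdlog::flush_on(spdlog::level::warn);     // flush immediately on warnings+

        spdlog::info("starting up, {} worker threads", 4);
        spdlog::debug("frame time {:.2f} ms", 16.6);
    }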
<tarzeau>
libhwloc-dev is already in
<tarzeau>
which blas -dev do i need, libblas-dev or libopenblas-dev?
<tarzeau>
and what -D does it need for cmake?
<zao>
heller: Yeah, Ubuntu LTS is doomed, as it's a compile-time flag that might affect ABI and there's no maintainer, while debian has moved to newer packages where they don't pass in that flag anymore.
<zao>
So there's nothing to directly backport, as is common with Ubuntu.
<zao>
#$@#$ toy distro
<tarzeau>
18.04 you mean?
<heller>
tarzeau: no blas, boost
<tarzeau>
ah crap, rebuilding
<tarzeau>
the boost build-dep i already have; removing the blas again
<tarzeau>
although we do use ubuntu at work (100-200 machines), i only do the packaging for debian (and ubuntu just copies my stuff, fine for me as long as they don't break it)
<tarzeau>
i'm on 19.04 still, but should move to 19.10, alongside debian sid, for building/testing things
<heller>
[19:34:01] <simbergm> and the second is that if I install libhpx-dev and libhpx1 I don't actually get any of the boost or hwloc headers installed
<heller>
tarzeau: ^^
<heller>
Are you planning to upstream the package?
simbergm has quit [Remote host closed the connection]
rori has joined #ste||ar
<tarzeau>
heller: what do you mean by upstream the package? i plan to officially maintain it in debian
<tarzeau>
heller: i'll add the depends for hwloc-dev and boost-dev then...
<zao>
tarzeau: Yeah, 18.04 LTS, the current one :)
<tarzeau>
18.04 is released, it won't get it, but feel free to dget the source .dsc and rebuild it there for use
<zao>
I'm just talking about the openmpi messup, no idea what you and the folks here are up to :)
<zao>
Ran into it on our site when I accidentally used the OS openmpi.
<tarzeau>
heller: if you want to retry and confirm: the deps on hwloc+boost are on libhpx-dev now
<tarzeau>
(only the source package; the binaries are still being built)
<tarzeau>
zao: how many machines with ubuntu 18.04 do you run?
<zao>
I don't think we consider btrfs production-ready :)
<tarzeau>
the inventory and sw scripts are also fun
<tarzeau>
we've been doing that for 1-2 years
<zao>
Most jobs don't use local I/O much at all, it's pretty much all against Lustre.
<tarzeau>
we also use eatmydata (for installing via ipxe / fully automatic workstation install); the speed-up is 100%
<tarzeau>
(eatmydata apt-get install ALLTHEPACKAGESWEWANT, about 5000)
<zao>
We use FAI for setting up enough of a node to run Puppet.
<tarzeau>
i was wondering: since debian builds software in a PORTABLE/COMPATIBLE way (the baseline is SSE2 only, and -march is forbidden), it's really not optimized for speed,
<tarzeau>
are you rebuilding part of the packages with optimizations?
<tarzeau>
we use debian-installer preseeding
<zao>
We use as little as possible from the distro installation.
<tarzeau>
and ansible (before it was dphys-config) and aptitude-robot for automatic updates (and xymon.phys.ethz.ch for monitoring)
<zao>
All software is built with our own toolchains; we have our own MPIs, BLAS impls, everything.
<tarzeau>
i see
<zao>
Scientific software, that is.
<tarzeau>
yep, i plan on improving this situation in debian
<zao>
Regular boring distro software comes from the distro, but we try to minimize the amount of deps it pulls in.
<zao>
Kind of need our own builds of SLURM/MPI/PMIX/etc. to work well on our interconnects.
<tarzeau>
we like debian packages. and use reprepro for our own packages
<zao>
reprepro...
<tarzeau>
works like a champion :)
<zao>
Nice software, but has some ideas about having multiple versions of something in the repo at once.
<zao>
Makes it bothersome to selectively roll things out.
<zao>
We mirror all vendor-supplied packages into our own reprepro repos, to have some control over what madness Dell/HP/etc. do.
<tarzeau>
every node exports /scratch* (local disks without backup) and /export/data* (local disks that we do backup) to other machines, so that's very useful
<tarzeau>
we run the swiss official debian and ubuntu mirrors
<zao>
Not pointing at anyone in particular, but `find / -name 'blargh'` isn't a good way to find tools.
<zao>
:D
<tarzeau>
what is most cumbersome is ibm pc bios. are you playing with coreboot/flashrom?
<tarzeau>
(the time lost when booting/rebooting/installing/debugging/adding/removing hw)
<zao>
Stock firmware from whatever vendor there is.
<tarzeau>
i wouldn't mind getting rid of non-free vendor bios, and replacing it with something faster
<zao>
We'd value vendor support more, after all we pay for it :)
<tarzeau>
who is your vendor? intel doesn't give a shit about broken bioses
<tarzeau>
i had to flash 14 ssd disks with broken firmware yesterday (otherwise they suddenly lose all data, haha)
<zao>
We've had IBM, Supermicro in the past, now Lenovo.
<heller>
Depends on how big a customer you are :p
<zao>
They're a bit slow at times; we're one of the first academic customers in this area since they split off from IBM.
<tarzeau>
yeah, supermicro we also have, but all kinds of vendors really. we're too small; eth zurich is not a commercial company, just about 20k people, and about the same number of public ip addresses
<zao>
We have KNL nodes that are very vanilla Intel, and they're horrible.
<tarzeau>
you don't turn off hyperthreading for that intel-microcode stuff?
<zao>
For regular compute nodes, HT has been off since day one.
<tarzeau>
losing so many resources / so much cpu power?
<zao>
For the KNLs, we're currently poking at the vendor for mitigations for the current MDS mess, but do not have high hopes.
<tarzeau>
KNL being?
<tarzeau>
ah found it on your web
<zao>
Knight's Landing, the 270-some hardware-thread (SMT4) things.
<zao>
We bought those, and then Intel discontinued the whole product line and all future development :D
<tarzeau>
haha
<zao>
I'm not sure of the reasoning, but I would guess that we value predictability w.r.t hyperthreading and compressed memory.
<zao>
Allocation-wise, it tends to be memory that's constraining, but the variability in compression means that users can't really ask for less than expected anyway.
<zao>
Not sure how it would interact with cgroups either.
<zao>
Speaking of mirrors, our computer club runs the swedish mirror for ubuntu, debian, and everything else under the sun (ftp.acc.umu.se)
<tarzeau>
how many users do you have?
<tarzeau>
do you know/use nvtop?
<zao>
No idea, thousands?
<zao>
We're all remote, no local user workstations.
<zao>
(our users are all remote)
<tarzeau>
i see
<zao>
CS department runs a setup more similar to yours I'd reckon.
<zao>
End user access via SSH, graphical desktop over ThinLinc, and eventually some sort of JupyterHub maybe.
<zao>
We also have a compute cloud.
<tarzeau>
what will you do about python2?
<zao>
Regarding nvtop, we don't really use it that much, just the odd nvidia-smi.
<zao>
Any user-facing jobs run Python 2 and Python 3 from EasyBuild.
<tarzeau>
we're mainly the physics department, but we also support the geomatic engineering team; they use 3d scanning/processing. packaged: colmap, cloudcompare; working on meshroom/alicevision
<zao>
Whatever the distro carries will not matter much, apart from powering EasyBuild and whatever things users run interactively if they don't load modules.
<zao>
There's probably Python in our services, but that's a bridge to burn when that day comes :)
<tarzeau>
i see. your machines work with submit/login nodes, and are restricted to your users
<tarzeau>
our machines are on the internet, no firewall (restricted just by users, with ssh)
<jbjnr>
tarzeau: are you affiliated with the ETHZ people?
<tarzeau>
jbjnr: i'm employee of ETHZ yes
<tarzeau>
jbjnr: we met at cscs cmake course :) hi john, alex myczko
<tarzeau>
the a5 driver with private parking lot
<tarzeau>
physics department here, on hoenggerberg. anyone's welcome to come meet us, we have a nice view! and free parking (just ask me how ;)
<tarzeau>
zao: having local users is so helpful for everything (solving all kinds of problems), as in, you can observe them and give tips, and knowing your users generally is a big plus
<zao>
Indeed.
<tarzeau>
but we also have remote users (travelers) and users from cern and psi.
<zao>
Some of our users are on campus, just in other departments.
<zao>
We also have the occasional courses where we interact in meatspace as well.
<jbjnr>
tarzeau: aha. great. Mikael is in the HIT building, you should track him down and chat.
<tarzeau>
simbergm: hahaha, he's right nearby: in the building where i support most people (on the J/K floors), he's just on G
<tarzeau>
he could've said so
<simbergm>
tarzeau: oops, I assumed you were at eth but didn't realize you'd be that close
<simbergm>
come by for a coffee at some point :D (or the other way around)
<tarzeau>
simbergm: you know nicolas deutschmann? he was also at the course and is next to your office
<simbergm>
hmm, no, doesn't sound familiar
<tarzeau>
he's one of the form developers, the one reimplementing form (called reform) in a strange programming language
<tarzeau>
we're in HPT H 7, but i can visit too, just not right now, maybe after lunch?
<simbergm>
tarzeau: yep, sounds good (most days are fine actually, you can come by next week as well)
<tarzeau>
btw, model zoo, for gpu-interested people: i just met schawinski (ex prof) who went on to develop autolearning, so you can skip training a cnn and go right ahead and use it (i met two of the developers because i was in contact with them for software support)
<tarzeau>
i was wondering if anyone is using distcc, or oclint ?
<heller>
simbergm: ping
<heller>
simbergm: does the login work for you now?
<simbergm>
heller: nope, still not
<simbergm>
for you?
<heller>
simbergm: works for me (tm). I can give you an account on my jumphost
<heller>
looks like a Debug vs Release build mismatch
<simbergm>
jbjnr: nope
<jbjnr>
release/debug mismatch. bad me
<jbjnr>
must add a warning to the cmake if there isn't one already
<jbjnr>
are we using the same tutorial examples as before, or should we use simbergm's new stuff?
<heller>
i thought we were using simbergm's on day 1 and the stencil on day 2
<jbjnr>
should we add simbergm's stuff to the tutorial repo to keep everything in one place and make building everything in one go easier?
<heller>
sounds good to me
<jbjnr>
at one point we discussed a 'distributed hpx' section, but then forgot about it. Do we cover this in any significant way? Does the stencil stuff run distributed?
<heller>
yes
<jbjnr>
ah yes. I remember now
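For readers of the log, a rough idea of what "running distributed" means in HPX terms: a plain action invoked on every locality. This is only a generic sketch (assuming the usual HPX_PLAIN_ACTION / hpx::find_all_localities / hpx::async-on-a-locality API), not the tutorial's stencil code:

    // Sketch: run the same function remotely on every locality and collect
    // the results as futures.
    #include <hpx/hpx_main.hpp>
    #include <hpx/hpx.hpp>

    #include <cstdint>
    #include <iostream>
    #include <vector>

    std::uint32_t where()
    {
        // Reports the id of the locality this invocation actually ran on.
        return hpx::get_locality_id();
    }
    HPX_PLAIN_ACTION(where, where_action);

    int main()
    {
        std::vector<hpx::id_type> localities = hpx::find_all_localities();

        std::vector<hpx::future<std::uint32_t>> results;
        for (hpx::id_type const& loc : localities)
            results.push_back(hpx::async(where_action{}, loc));

        for (auto& f : results)
            std::cout << "ran on locality " << f.get() << "\n";

        return 0;
    }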
<jbjnr>
it's just saxpy_parallel_cuda that fails now. Are we going to cover the hpx::compute stuff, or do we ditch it and instead throw in the cuda futures and cublas examples?
<jbjnr>
I'll ditch it for now and add the cuda/cublas examples
K-ballo has joined #ste||ar
<simbergm>
heller: thanks, login works
<simbergm>
jbjnr: I'll add my exercises to the tutorials repo
<heller>
jbjnr: ditch it indeed in favor of the cuda future example
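The "cuda future" idea, in a nutshell, is turning stream completion into something awaitable. Below is a bare-bones sketch of that pattern using only the plain CUDA runtime and std::future; HPX's actual example integrates this with hpx::future and its scheduler, so treat this purely as an illustration:

    // Sketch: make "everything queued on this stream so far" awaitable.
    #include <cuda_runtime.h>
    #include <future>

    namespace {
        void CUDART_CB notify(cudaStream_t, cudaError_t status, void* data)
        {
            auto* p = static_cast<std::promise<cudaError_t>*>(data);
            p->set_value(status);    // mark the future ready with the stream status
            delete p;
        }
    }

    std::future<cudaError_t> stream_future(cudaStream_t stream)
    {
        auto* p = new std::promise<cudaError_t>();
        std::future<cudaError_t> f = p->get_future();
        // The callback fires once all previously enqueued work on `stream` is done.
        cudaStreamAddCallback(stream, notify, p, 0);
        return f;
    }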
hkaiser has joined #ste||ar
<jbjnr>
heller: just a thought. You probably do more debugging with the gdb command line than I do, so please be ready to step in with help during the debugging session if I'm doing it.
<jbjnr>
(I'm using QtCreator these days for my desktop development)
<heller>
Noted
K-ballo has quit [Quit: K-ballo]
K-ballo has joined #ste||ar
<simbergm>
hkaiser: yt?
<hkaiser>
here
<simbergm>
thanks for fixing my mess on your branch...
<hkaiser>
np
<simbergm>
but don't merge it yet, please
<hkaiser>
ok
<simbergm>
turns out the header tests are actually called tests.headers.headers.blah
<hkaiser>
ahh, I messed that up in the yml file
<hkaiser>
thought this was a typo
<simbergm>
auriane's PR actually changes the names to be tests.headers.blah but doesn't change the circleci config
<simbergm>
so I've asked her to cherry pick that change to her branch
<simbergm>
and then we can remove almost all of that commit from your PR
<simbergm>
it was kind of unrelated anyway
<hkaiser>
if it's separable, sure
<hkaiser>
yes
<simbergm>
yeah, np, I thought it was a typo as well, hence the approval
<hkaiser>
just adds a new target to circle
<simbergm>
yeah, that one commit comes out pretty cleanly
<hkaiser>
ok, let me know once it's fine to merge the all2all, I really need it for Phylanx
<simbergm>
ok, I'll clean it up asap
<simbergm>
ok to force push?
<hkaiser>
thanks a lot!
<hkaiser>
if you don't overwrite my latest changes, sure
<hkaiser>
;-)
<simbergm>
I'll try not to
<simbergm>
hkaiser: I ended up just changing the headers.headers part back, you had a bunch of other useful changes there that I left in
<hkaiser>
ok
<hkaiser>
thanks
<simbergm>
do you mind waiting for circleci before merging?
<hkaiser>
sure, let's wait
<K-ballo>
heh
<hkaiser>
K-ballo: btw, Windows install should be fixed now
<K-ballo>
the location of the libs and the dlls? I'll give it a try
<simbergm>
thanks, let's hope that's the last problem for this round
<hkaiser>
yah
<diehlpk_work>
simbergm, I just got the e-mail from Google. They will review the applications first and will send me the links of the accepted proposals from their side
<diehlpk_work>
We need to decide which ones we accept by Jul 23
<diehlpk_work>
hkaiser, Parsa and I implemented the major compiler version check when building hpx as binary packages
<hkaiser>
diehlpk_work: cool!
<hkaiser>
thanks
<hkaiser>
thanks parsa as well!
<heller>
simbergm: jbjnr: vampir is available
<heller>
simbergm: jbjnr: we did provide a preinstalled HPX last time
<heller>
simbergm: jbjnr: we can test without a reservation as well *phew*
nikunj97 has quit [Remote host closed the connection]
diehlpk_mobile has joined #ste||ar
diehlpk_mobile has quit [Read error: Connection reset by peer]
<Yorlik>
Do you guys have suggestions for instrumentation? I know HPX offers performance counters, just wondering what to use to collect and present data - not only HPX specifics. I'm currently looking at https://prometheus.io/ and https://github.com/jupp0r/prometheus-cpp as a possible instrumentation solution. Ideas? Suggestions?
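One way to go about it, sketched below: prometheus-cpp serves a /metrics HTTP endpoint that a Prometheus server scrapes, and you push whatever values you like into gauges/counters. The Exposer/Registry/BuildGauge names are the real prometheus-cpp API; read_hpx_counter() is a hypothetical placeholder for however you would query an HPX performance counter:

    // Sketch: export a value as a Prometheus gauge via prometheus-cpp.
    // read_hpx_counter() is a placeholder, not an HPX API.
    #include <prometheus/exposer.h>
    #include <prometheus/registry.h>
    #include <prometheus/gauge.h>

    #include <chrono>
    #include <memory>
    #include <thread>

    static double read_hpx_counter()
    {
        // Placeholder: query an HPX performance counter (or any other metric
        // source) here and return its current value.
        return 42.0;
    }

    int main()
    {
        prometheus::Exposer exposer{"0.0.0.0:9100"};   // serves /metrics
        auto registry = std::make_shared<prometheus::Registry>();

        auto& family = prometheus::BuildGauge()
                           .Name("app_hpx_counter_value")
                           .Help("Example metric mirrored from an HPX counter")
                           .Register(*registry);
        auto& gauge = family.Add({{"counter", "threads_count_cumulative"}});

        exposer.RegisterCollectable(registry);

        for (;;)    // update loop; Prometheus scrapes whenever it likes
        {
            gauge.Set(read_hpx_counter());
            std::this_thread::sleep_for(std::chrono::seconds(5));
        }
    }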