hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
jaafar has joined #ste||ar
hkaiser has quit [Quit: bye]
quaz0r has quit [Ping timeout: 245 seconds]
quaz0r has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
jaafar has quit [Ping timeout: 250 seconds]
jbjnr_ has quit [Ping timeout: 240 seconds]
nikunj has joined #ste||ar
Yorlik has joined #ste||ar
nikunj has quit [Ping timeout: 246 seconds]
nikunj has joined #ste||ar
jaafar has joined #ste||ar
nikunj has quit [Remote host closed the connection]
nikunj has joined #ste||ar
jaafar has quit [Ping timeout: 250 seconds]
nikunj has quit [Ping timeout: 255 seconds]
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 245 seconds]
nikunj has joined #ste||ar
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
aserio has joined #ste||ar
hkaiser has joined #ste||ar
aserio has quit [Ping timeout: 250 seconds]
aserio has joined #ste||ar
jaafar has joined #ste||ar
aserio has quit [Ping timeout: 250 seconds]
<diehlpk_work> simbergm, let me know when you have the first rc for 1.3 and I will run it on Fedora
aserio has joined #ste||ar
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
nikunj has quit [Remote host closed the connection]
khuck has joined #ste||ar
<khuck> heller: you there?
<khuck> are there any libfabric experts out there? or something similar?
<aserio> khuck: What are you looking for?
<khuck> help building HPX on cori knl
<khuck> either it's all broken, or I don't know how to do it. the latter is more likely
<aserio> I believe daissgr has got it built
<khuck> I am emailing with him
<parsa> diehlpk_work: see ^
<khuck> I don't think he built HPX with the MPI parcel port.
<khuck> that's my current problem. HPX / CMake is ignoring the toolchain file (see https://github.com/STEllAR-GROUP/hpx/issues/3772)
<khuck> so I have to do everything by hand instead
<khuck> it's not pleasant
<khuck> I'd like to take kitware out behind the woodshed
<parsa> i heard he built a two year old octotiger with a custom version of Vc without MPI
<aserio> parsa: can you find that email
<khuck> parsa: I heard the same, but that doesn't exactly help us for the SC paper
<parsa> aserio: which one?
<aserio> I know he just sent it, but I can't find it
<khuck> I have instructions to do that, but that only uses an old version of Vc that requires a branch of octotiger
<khuck> I haven't gotten that far
<khuck> I am still fighting cmake/hpx
<aserio> parsa: the one where daissgr was describing what he was doing on Cori
<khuck> aserio: he built with the TCP parcel port, not the MPI one. That's the main difference.
<parsa> aserio: i haven't received anything about Cori from daissgr
<aserio> :/
<aserio> khuck: give me a sec
<parsa> Vc has dropped support for Xeon Phi, and its API has changed. octotiger as we have it now uses the new API
<parsa> diehlpk_work said daissgr tried using the current Octotiger and Vc (no MPI) but the performance was terrible
<diehlpk_work> khuck, @Kevin: I managed to get two versions of Octo-Tiger to run on Cori. Since I ran into quite a lot of problems with Cray compiler, I used gcc instead for these builds. The first version I built simply uses the TCP (?) backend without mpi, the other one uses the parcelport mpi backend with openmpi.
<parsa> aserio: i found the email. forwarded.
<diehlpk_work> khuck, I think he built it with MPI, see the forwarded e-mail from parsa
<aserio> parsa: thanks!
<khuck> diehlpk_work: that shouldn't be necessary - the Cray MPICH should work fine
<diehlpk_work> khuck, daissgr will be here soon
<diehlpk_work> he has more information about the build process on Cori
<daissgr> alright I am here now! I actually had IRC still open on a completely different machine here - give me a second
<daissgr> yeah - I built it with the MPI parcelport once. I loaded the openmpi package and compiled everything with mpicxx. It built just fine but the scaling was rather bad for that one even with just two nodes
<daissgr> So something went wrong there
<daissgr> parsa: I did not build a two-year-old Octo-Tiger. I used a two-year-old Vc version that actually supports AVX512
<daissgr> and adapted the current Octo-Tiger to work with it - which was a pain
<daissgr> without it we lose most of the performance though
<daissgr> parsa: The tests I ran were still without your counters - sorry about that. I just tested the TCP variant anyway
<daissgr> khuck: It would be great if you get it to work with the Cray MPICH. Originally I wanted to build everything with the Cray compiler to do just that - which proved to be rather troublesome, hence me testing the TCP parcel port
<khuck> daissgr: that's the goal
<parsa> daissgr: the TCP variant is not a factor for those performance counters. those counters measure what AGAS does on each locality.
<parsa> :s/TCP variant/parcelport
<daissgr> parsa: I see. I meant to add them for any actual runs, but I was only quickly testing the TCP variant and moved on to trying to build an MPI variant, so I never got around to doing it
<daissgr> btw the current Octo-Tiger that works with the xeon phis is in the branch knl-build
<daissgr> Though, it's not just the Xeon Phis - it's also a massive performance boost on other AVX512 machines
<khuck> daissgr: ok, I fixed the CrayKNL.cmake file - it wasn't getting ignored, it was just out of date
<khuck> and/or broken
aserio has quit [Ping timeout: 250 seconds]
<khuck> ok, I've got octotiger built with the Cray MPI wrapper compilers. Is there a good test script?
<diehlpk_work> khuck, Run the ctest
<khuck> ?
<diehlpk_work> make tests
<diehlpk_work> parsa, added some ctests
<diehlpk_work> The same ones as we use for circle-ci
<parsa> what ctests do you want me to add
<parsa> ?
<diehlpk_work> parsa, None, I just told Kevin you added them to the latest version of octotiger
<khuck> diehlpk_work: I ran : srun -N 1 -n 1 -c 68 ./build/octotiger/build/octotiger --config_file=`pwd`/src/octotiger/test_problems/sod/sod.ini --hpx:threads=68
<khuck> and while I got some silo errors, it appeared to run
<khuck> I get a ton of these errors:
<khuck> DBWrite: File was closed or never opened/created.: X.0.silo
<khuck> DBPutQuadvar1: File was closed or never opened/created.: X.0.silo
<khuck> but the file is there - just empty.
<khuck> however, disabling output seems to take care of that issue
<khuck> I am rebuilding now with APEX, just for giggles
<diehlpk_work> khuck, Can you run the rotating star?
<diehlpk_work> Do you have the files to run the same as I did on daint?
<khuck> I do not
<khuck> i am currently rebuilding with the "old" version of Vc
<diehlpk_work> I will provide them to you soon
maxwellr96 has joined #ste||ar
<maxwellr96> Does anyone have suggestions about why I would be getting this error? assertion 'detail::power2(log2credits) == credits' failed: HPX(assertion_failure)
<diehlpk_work> maxwellr96, Can you provide more context?
<maxwellr96> Well, the issue is that it's not happening super consistently. Only about 50% of the time
<maxwellr96> Sometimes the program finishes executing just fine
<maxwellr96> I just thought maybe someone would have a general idea of why a gid splitting its credits would break
<maxwellr96> But also only do so inconsistently
<diehlpk_work> Are you using futures or dataflow?
<maxwellr96> This is the code, I'm working on the locality_lists branch
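[Editor's note] The assertion text quoted above, 'detail::power2(log2credits) == credits', reads as a check that a credit count is an exact power of two. The sketch below only illustrates that reading. It assumes that credits are tracked by their base-2 exponent and that a split hands half of the credits to the new copy. The functions power2, log2_floor, is_representable and split are hypothetical stand-ins written for this note, not HPX's implementation, and the sketch does not explain why the failure is intermittent.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in: 2 raised to the given exponent.
inline std::int64_t power2(std::int16_t log2credits)
{
    return std::int64_t(1) << log2credits;
}

// Hypothetical stand-in: largest n such that 2^n <= credits.
inline std::int16_t log2_floor(std::int64_t credits)
{
    assert(credits > 0);
    std::int16_t n = 0;
    while (credits >>= 1)
        ++n;
    return n;
}

// The invariant suggested by the assertion text: converting the credit
// count to an exponent and back must give the same count, which only
// holds when the count is an exact power of two.
inline bool is_representable(std::int64_t credits)
{
    return credits > 0 && power2(log2_floor(credits)) == credits;
}

// Assumed split rule: keep half, give half away. Both halves stay
// powers of two as long as the input was one and was greater than 1.
inline std::int64_t split(std::int64_t& credits)
{
    assert(is_representable(credits) && credits > 1);
    credits /= 2;
    return credits;
}
```

Under these assumptions a count such as 6 would trip the check, while 8 splits cleanly into 4 and 4.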
aserio has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 268 seconds]
aserio1 is now known as aserio
daissgr1 has joined #ste||ar
hkaiser has quit [Quit: bye]
<aserio> khuck: still struggling with your favorite software?
<aserio> daissgr daissgr1: will you be joining us now?
<daissgr1> on our way
<daissgr1> webex is loading forever
Vir has quit [Ping timeout: 246 seconds]
hkaiser has joined #ste||ar
hkaiser has quit [Read error: Network is unreachable]
<khuck> aserio: webex is gar-bage
<aserio> khuck: :( Is it a Mac problem?
<khuck> aserio: sure, whatever
<aserio> lol
Vir has joined #ste||ar
<khuck> aserio: bluejeans has no problems
<aserio> What I mean is, is it the machine which is having compatibility issues or is it a connectivity one?
<khuck> mac pro
<aserio> what is bluejeans?
<khuck> desktop
<khuck> bluejeans is real videoconference software that DOE uses
<khuck> silly name, but it works
<aserio> I will steal Hartmut's saying: Videoconferencing software is like a toothbrush
<K-ballo> change every three months?
hkaiser has joined #ste||ar
<aserio> Everyone has one and no one wants to share
Vir has quit [Ping timeout: 264 seconds]
hkaiser has quit [Read error: Connection reset by peer]
Vir has joined #ste||ar
hkaiser has joined #ste||ar
maxwellr96 has quit [Read error: Connection reset by peer]
hkaiser has quit [Quit: bye]
aserio has quit [Ping timeout: 245 seconds]
aserio has joined #ste||ar
Vir has quit [Ping timeout: 246 seconds]
Vir has joined #ste||ar
<diehlpk_work> simbergm, parsa https://github.com/STEllAR-GROUP/hpx/wiki/GSoD-2019-Project-Ideas Please add your project ideas for Google Season of Documentation here
Vir has quit [Ping timeout: 246 seconds]
<diehlpk_work> simbergm, parsa https://github.com/STEllAR-GROUP/hpx/wiki/GSoD-2019-Organization-Application I added the questions for the organization application here and answered the easy ones already.
<diehlpk_work> Only two are missing
<parsa> diehlpk_work: will do. thanks a lot
daissgr1 has quit [Ping timeout: 240 seconds]
aserio has quit [Quit: aserio]
hkaiser has joined #ste||ar
<parsa> hkaiser: is there anything like std::transform that we can use to iterate over three std::vectors?
<hkaiser> zipiterator
<parsa> boost?
<hkaiser> parsa: hpx::util
<parsa> ah, perfect! thanks!
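[Editor's note] The exchange above points to the zip iterator in hpx::util for walking three vectors in lockstep. Its exact interface is not shown in this log, so the sketch below uses only the standard library to illustrate the same idea: a three-input analogue of std::transform. The helper transform3 is hypothetical, written for this note, and requires C++14.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical helper (not part of HPX or the standard library):
// applies f element-wise to three equally sized vectors and returns
// the results in a new vector.
template <typename A, typename B, typename C, typename F>
auto transform3(const std::vector<A>& a, const std::vector<B>& b,
                const std::vector<C>& c, F f)
{
    // All three inputs must have the same length.
    assert(a.size() == b.size() && b.size() == c.size());

    // Result element type is whatever f returns, stripped of references.
    using R = std::decay_t<decltype(f(a[0], b[0], c[0]))>;

    std::vector<R> out;
    out.reserve(a.size());
    for (std::size_t i = 0; i != a.size(); ++i)
        out.push_back(f(a[i], b[i], c[i]));
    return out;
}
```

For example, transform3(x, y, z, [](int p, int q, double w) { return (p + q) * w; }) combines the i-th elements of x, y and z into one output element. A zip iterator would let the same loop be expressed through existing algorithms instead of a hand-written helper.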
K-ballo has quit [Quit: K-ballo]
K-ballo has joined #ste||ar