hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
jaafar has joined #ste||ar
hkaiser has quit [Quit: bye]
quaz0r has quit [Ping timeout: 245 seconds]
quaz0r has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
jaafar has quit [Ping timeout: 250 seconds]
jbjnr_ has quit [Ping timeout: 240 seconds]
nikunj has joined #ste||ar
Yorlik has joined #ste||ar
nikunj has quit [Ping timeout: 246 seconds]
nikunj has joined #ste||ar
jaafar has joined #ste||ar
nikunj has quit [Remote host closed the connection]
nikunj has joined #ste||ar
jaafar has quit [Ping timeout: 250 seconds]
nikunj has quit [Ping timeout: 255 seconds]
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 245 seconds]
nikunj has joined #ste||ar
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
aserio has joined #ste||ar
hkaiser has joined #ste||ar
aserio has quit [Ping timeout: 250 seconds]
aserio has joined #ste||ar
jaafar has joined #ste||ar
aserio has quit [Ping timeout: 250 seconds]
<diehlpk_work>
simbergm, let me know when you have the first rc for 1.3 and I will run it on Fedora
aserio has joined #ste||ar
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
nikunj has quit [Remote host closed the connection]
khuck has joined #ste||ar
<khuck>
heller: you there?
<khuck>
are there any libfabric experts out there? or something similar?
<aserio>
khuck: What are you looking for?
<khuck>
help building HPX on cori knl
<khuck>
either it's all broken, or I don't know how to do it. the latter is more likely
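A minimal sketch of the environment setup usually needed before configuring HPX on Cori's KNL partition; the module names are assumptions based on Cori's typical defaults, not taken from this conversation:
    module swap craype-haswell craype-mic-knl   # target the KNL nodes instead of the Haswell default
    module swap PrgEnv-intel PrgEnv-gnu         # GNU programming environment instead of the Intel default
    module load cmake boost                     # recent CMake and a Boost build for HPX
    export CRAYPE_LINK_TYPE=dynamic             # dynamic linking so HPX's shared libraries work under the Cray wrappers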
<aserio>
I believe daissgr has got it built
<khuck>
I am emailing with him
<parsa>
diehlpk_work: see ^
<khuck>
I don't think he built HPX with the MPI parcel port.
<khuck>
I'd like to take kitware out behind the woodshed
<parsa>
i heard he built a two-year-old octotiger with a custom version of Vc, without MPI
<aserio>
parsa: can you find that email
<khuck>
parsa: I heard the same, but that doesn't exactly help us for the SC paper
<parsa>
aserio: which one?
<aserio>
I know he just sent it, but I can't find it
<khuck>
I have instructions to do that, but they only use an old version of Vc that requires a branch of octotiger
<khuck>
I haven't gotten that far
<khuck>
I am still fighting cmake/hpx
<aserio>
parsa: the one where daissgr was describing what he was doing on Cori
<khuck>
aserio: he built with the TCP parcel port, not the MPI one. That's the main difference.
<parsa>
aserio: i haven't received anything about Cori from daissgr
<aserio>
:/
<aserio>
khuck: give me a sec
<parsa>
Vc has dropped support for Xeon Phi, and its API has changed. octotiger as we have it now uses the new API
<parsa>
diehlpk_work said daissgr tried using the current Octotiger and Vc (no MPI) but the performance was terrible
<diehlpk_work>
khuck, @Kevin: I managed to get two versions of Octo-Tiger to run on Cori. Since I ran into quite a lot of problems with the Cray compiler, I used gcc instead for these builds. The first version I built simply uses the TCP (?) backend without MPI, the other one uses the MPI parcelport backend with OpenMPI.
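A rough sketch of the two configurations described above, using HPX's MPI parcelport switch in CMake; paths and any other options are placeholders:
    # variant 1: TCP parcelport only, no MPI dependency
    cmake -DCMAKE_BUILD_TYPE=Release -DHPX_WITH_PARCELPORT_MPI=OFF /path/to/hpx
    # variant 2: MPI parcelport built against the loaded OpenMPI
    cmake -DCMAKE_BUILD_TYPE=Release -DHPX_WITH_PARCELPORT_MPI=ON -DCMAKE_CXX_COMPILER=mpicxx /path/to/hpx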
<parsa>
aserio: i found the email. forwarded.
<diehlpk_work>
khuck, I think he built it with MPI, see the forwarded e-mail from parsa
<aserio>
parsa: thanks!
<khuck>
diehlpk_work: that shouldn't be necessary - the Cray MPICH should work fine
<diehlpk_work>
khuck, daissgr will be here soon
<diehlpk_work>
he has more information about the build process on Cori
<daissgr>
alright I am here now! I actually had IRC still open on a completely different machine here - give me a second
<daissgr>
yeah - I built it with the MPI parcelport once. I loaded the openmpi package and compiled everything with mpicxx. It built just fine, but the scaling was rather bad for that one even with just two nodes
<daissgr>
So something went wrong there
<daissgr>
parsa: I did not build a two-year-old Octo-Tiger. I used a two-year-old Vc version that actually supports AVX512
<daissgr>
and adapted the current Octo-Tiger to work with it - which was a pain
<daissgr>
without it we lose most of the performance though
<daissgr>
parsa: The tests I ran were still without your counters - sorry about that. I just tested the TCP variant anyway
<daissgr>
khuck: It would be great if you get it to work with the Cray MPICH. Originally I wanted to build everything with the Cray compiler to do just that - which proved to be rather troublesome, hence me testing the TCP parcelport
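For comparison, a sketch of what building against Cray MPICH through the compiler wrappers would look like, assuming a GNU programming environment is loaded; untested here:
    module swap PrgEnv-intel PrgEnv-gnu   # Cray wrappers with a GNU backend
    # the Cray wrappers cc/CC pull in Cray MPICH automatically, so no separate MPI module is needed
    cmake -DCMAKE_C_COMPILER=cc -DCMAKE_CXX_COMPILER=CC -DHPX_WITH_PARCELPORT_MPI=ON /path/to/hpx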
<khuck>
daissgr: that's the goal
<parsa>
daissgr: the TCP variant is not a factor for those performance counters. those counters measure what AGAS does on each locality.
<parsa>
:s/TCP variant/parcelport
<daissgr>
parsa: I see. I meant to add them for any actual runs, but I was only quickly testing the TCP variant and moved on to trying to build an MPI variant, so I never got around to doing it
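For reference, AGAS counters of the kind parsa mentions are normally requested on the HPX command line; the counter name below is only illustrative, the real names should be taken from --hpx:list-counters:
    ./octotiger --hpx:list-counters                                          # show the counters this build exposes
    ./octotiger ... --hpx:print-counter=/agas{locality#*/total}/count/bind   # print one AGAS counter per locality at shutdown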
<daissgr>
btw the current Octo-Tiger that works with the Xeon Phis is in the knl-build branch
<daissgr>
Though, it's not just the Xeon Phis - it's also a massive performance boost on other AVX512 machines
<khuck>
daissgr: ok, I fixed the CrayKNL.cmake file - it wasn't getting ignored, it was just out of date
<khuck>
and/or broken
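For reference, a toolchain file such as the CrayKNL.cmake khuck fixed is passed to CMake when configuring HPX; the paths are placeholders:
    cmake -DCMAKE_TOOLCHAIN_FILE=/path/to/hpx/cmake/toolchains/CrayKNL.cmake -DCMAKE_BUILD_TYPE=Release /path/to/hpx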
aserio has quit [Ping timeout: 250 seconds]
<khuck>
ok, I've got octotiger built with the Cray MPI wrapper compilers. Is there a good test script?
<diehlpk_work>
khuck, Run the ctests
<khuck>
?
<diehlpk_work>
make tests
<diehlpk_work>
parsa added some ctests
<diehlpk_work>
The same ones as we use for circle-ci
<parsa>
what ctests do you want me to add?
<diehlpk_work>
parsa, None, I just told Kevin you added them to the latest version of octotiger
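A minimal sketch of the test workflow being described, assuming a standard CMake/CTest setup in the Octo-Tiger build directory:
    make -j tests              # build the test executables diehlpk_work mentioned
    ctest --output-on-failure  # run them, printing output only for failing tests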
<khuck>
diehlpk_work: I ran : srun -N 1 -n 1 -c 68 ./build/octotiger/build/octotiger --config_file=`pwd`/src/octotiger/test_problems/sod/sod.ini --hpx:threads=68
<khuck>
and while I got some silo errors, it appeared to run
<khuck>
I get a ton of these errors:
<khuck>
DBWrite: File was closed or never opened/created.: X.0.silo
<khuck>
DBPutQuadvar1: File was closed or never opened/created.: X.0.silo
<khuck>
but the file is there - just empty.
<khuck>
however, disabling output seems to take care of that issue
<khuck>
I am rebuilding now with APEX, just for giggles
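A sketch of what the APEX-enabled rebuild usually involves; the runtime environment variable is an assumption to be checked against the APEX documentation:
    cmake -DHPX_WITH_APEX=ON <other options as before> /path/to/hpx   # rebuild HPX with APEX support
    export APEX_SCREEN_OUTPUT=1                                       # assumed APEX switch for a run-time summary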
<diehlpk_work>
khuck, Can you run the rotating star?
<diehlpk_work>
Do you have the files to run the same test as I did on daint?
<khuck>
I do not
<khuck>
i am currently rebuilding with the "old" version of Vc
<diehlpk_work>
I will provide them to you soon
maxwellr96 has joined #ste||ar
<maxwellr96>
Does anyone have suggestions about why I would be getting this error? assertion 'detail::power2(log2credits) == credits' failed: HPX(assertion_failure)
<diehlpk_work>
maxwellr96, Can you provide more context?
<maxwellr96>
Well, the issue is that it's not happening super consistently. Only about 50% of the time
<maxwellr96>
Sometimes the program finishes executing just fine
<maxwellr96>
I just thought maybe someone would have a general idea of why a gid splitting its credits would break
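One hedged way to chase an intermittent assertion like this, assuming a Debug build of HPX is available (my_app stands in for the actual program):
    cmake -DCMAKE_BUILD_TYPE=Debug /path/to/hpx    # keep HPX assertions and extra internal checks enabled
    ./my_app --hpx:attach-debugger=exception       # pause and wait for a debugger when the exception is raised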