hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
<heller_> I thought you mentioned OpenMPI version 4 the other day (which seems to work fine for me)
mdiers_1 has joined #ste||ar
mdiers_ has quit [Remote host closed the connection]
mdiers_1 is now known as mdiers_
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: bye]
nikunj has joined #ste||ar
david_pfander has joined #ste||ar
<Amy1> which benchmark do you use to test peak memory bandwidth?
<jbjnr__> there should be a tests/performance/local/stream benchmark, but it might not be maintained and need some tweaks
<jbjnr__> we made changes to the way threads are assigned etc. and the test might not have been updated
<Amy1> something else???
<jbjnr__> not sure I understand the question
nikunj has quit [Quit: Leaving]
<Amy1> which benchmark do you use to test peak memory bandwidth?
<Amy1> I want to know a commonly used one.
<jbjnr__> tests/performance/local/stream.cpp
<Amy1> could you give me a link?
<Amy1> github link
<jbjnr__> just click on the subdirs and you'll find it
<heller_> jbjnr__: it works pretty well
<Amy1> heller_: could you give me a github link?
<Amy1> about the memory test tool.
<heller_> it's not testing memory
<Amy1> ...
<Amy1> I want a memory test benchmark.
<heller_> this is the original reference: https://www.cs.virginia.edu/stream/
<Amy1> thanks
<Amy1> But I am confused by the result of the stream benchmark.
<heller_> well, this benchmark is trying to determine the achievable bandwidth you can get on a given machine
<Amy1> My machine has 8 channels at 1600 MHz, which should give ~100 GB/s, but the stream result is 300 GB/s.
<Amy1> I think the stream result is wrong.
<Amy1> and MLC also reports 100 GB/s
nikunj has joined #ste||ar
<heller_> good for you then ;)
<heller_> what machine are we looking at, how *exactly* did you run the benchmark, what's the exact output?
<Amy1> 2683 v4
<Amy1> gcc -O3 -fopenmp stream.c
<Amy1> ./a.out output is
<Amy1> Function    Rate (MB/s)   Avg time   Min time   Max time
<Amy1> Copy:       305040.2909   0.0001     0.0001     0.0001
<heller_> well, the default is 10 megabyte
<Amy1> ?
<heller_> you have quite some cache in your machine
<heller_> NB: peak memory bandwidth for your processor is 76.8 GB/s
<Amy1> ??? how do you calculate 76.8GB/s???
<heller_> it says so in the specs
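
For reference, the 76.8 GB/s figure follows directly from the memory spec: Intel lists the E5-2683 v4 with 4 channels of DDR4-2400, and peak bandwidth is channels × transfer rate × bus width:

    4 channels × 2400 MT/s × 8 bytes/transfer = 76.8 GB/s
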
<Amy1> ...
<Amy1> I want to know why stream shows 300GB/s
<heller_> I don't know. I can't reproduce your results
<jbjnr__> amy, try increasing the problem size and rerunning just to make sure
<heller_> perf stat -e cache-misses,cache-references ./a.out
<heller_> gives you a good idea
<heller_> for my machine here, for example: https://gist.github.com/sithhell/fc304c6e9e8e84bd63a4d54b4f61e0a7
<heller_> you see that the number of cache misses decreases and the bandwidth increases?
<heller_> your CPU has twice the amount of cache available
<heller_> Amy1: clear now?
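
The point here: recent versions of the reference stream.c take their array size from the STREAM_ARRAY_SIZE macro, so recompiling with, e.g., gcc -O3 -fopenmp -DSTREAM_ARRAY_SIZE=100000000 stream.c makes the working set far larger than any cache. A minimal serial C++ sketch of the same idea follows; unlike the real STREAM it is single-threaded and untuned, so treat the numbers as illustrative only:

    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main()
    {
        // ~800 MB per array: far larger than the 40 MB L3 of an E5-2683 v4,
        // so the copy below is bound by DRAM, not cache
        std::size_t const n = 100000000;
        std::vector<double> a(n, 1.0), b(n, 2.0);

        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i != n; ++i)
            a[i] = b[i];
        auto t1 = std::chrono::steady_clock::now();

        double secs = std::chrono::duration<double>(t1 - t0).count();
        // a copy moves 2 * n * sizeof(double) bytes (one read + one write)
        std::printf("copy: %.1f GB/s\n",
            2.0 * n * sizeof(double) / secs / 1e9);
    }
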
K-ballo has joined #ste||ar
daissgr has quit [Ping timeout: 264 seconds]
<Amy1> thanks
daissgr has joined #ste||ar
david_pfander1 has joined #ste||ar
quaz0r has quit [Ping timeout: 255 seconds]
david_pfander has quit [Ping timeout: 255 seconds]
Yorlik has quit [Ping timeout: 255 seconds]
david_pfander1 is now known as david_pfander
quaz0r has joined #ste||ar
daissgr has quit [Ping timeout: 250 seconds]
daissgr has joined #ste||ar
<zao> heller_: Tested on one of our login nodes with OpenMPI 2.1.1, which should be a similar version as the one installed in the image. Test passes.
<zao> Using the OpenMPI and compiler from EasyBuild, so GCC 6.4.0 and OpenMPI 2.1.1.
<zao> So there might be variations both in compilers and transports there.
<zao> (also passes with GCC 7.3.0 and OpenMPI 3.1.1)
<heller_> interesting
<heller_> I tested with gcc 7 and clang 8
<heller_> and both fail
<heller_> let me retry with gcc 6
<zao> This is completely without the image, mind you.
<zao> And I don't get any mentions about downgrading to a worse transport.
rohit64 has joined #ste||ar
<zao> We do have some differences in how they're built with UCX and PMI and SLURM and heaven knows what else.
<heller_> I get the feeling that you just shouldn't use the MPI package from your distribution...
<zao> Should try at home I guess, where the env is less cluster-specific.
<heller_> the image is a standard ubuntu 18.04
<heller_> same with gcc6
<zao> Building the same EasyBuild toolchain at home now, gonna take a while as it's building a bajillion variations of FFTW :)
<zao> hwloc/1.11.7 and hwloc/1.11.10 btw in the successful cases
hkaiser has joined #ste||ar
daissgr has quit [Ping timeout: 268 seconds]
rohit64 has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]
daissgr has joined #ste||ar
<heller_> zao: just building openmpi 2.1.1 myself
<heller_> not using easybuild or somesuch
<heller_> self built works as well :/
hkaiser has quit [Ping timeout: 264 seconds]
hkaiser has joined #ste||ar
<hkaiser> simbergm: I think I caught the set_thread_state error red-handed: https://circleci.com/gh/STEllAR-GROUP/hpx/70587
<simbergm> hkaiser: bastard
<simbergm> (not you, the test...)
<simbergm> why would it do that
<simbergm> thanks, that helps a bit
<hkaiser> lol
<hkaiser> simbergm: an assert before setting the state in thread_data may help
aserio has joined #ste||ar
<simbergm> yeah, sounds good
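
A self-contained toy sketch of that suggestion; the state names and the specific transition being asserted are assumptions for illustration, and HPX's real thread_data is considerably more involved:

    #include <atomic>
    #include <cassert>

    enum class thread_state { pending, active, suspended, terminated };

    struct toy_thread_data
    {
        std::atomic<thread_state> state{thread_state::pending};

        void set_state(thread_state new_state)
        {
            // the idea: check the transition is legal before storing, so a
            // bad set_thread_state fails loudly at the source rather than
            // corrupting scheduler state later
            assert(!(state.load() == thread_state::terminated &&
                     new_state == thread_state::pending));
            state.store(new_state);
        }
    };

    int main()
    {
        toy_thread_data t;
        t.set_state(thread_state::active);
        t.set_state(thread_state::terminated);
        // t.set_state(thread_state::pending);  // would trip the assert
    }
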
eschnett_ has joined #ste||ar
<zao> I took a look at the Debian patches for OpenMPI, nothing that's intended to affect x86_64.
<zao> Just arm64, HPPA, Hurd, and some build infra.
<zao> debian/rules has a fair bit of build flag overriding, tho.
bita has quit [Read error: Connection reset by peer]
<zao> I found this which seemed very close in description, but it doesn't change anything for me if I run with other transports (self,tcp or self,sm) or the suggested disable of CMA for the vader btl. - https://github.com/open-mpi/ompi/issues/4948
<zao> I guess that it _could_ be container related still.
<zao> heller_: Did you try your own OpenMPI inside or outside a container?
<zao> Bah, that theory smashed, my EasyBuild-sourced OpenMPI works in the container.
<zao> heller_: Even more interesting, if I just add that self-built OpenMPI to LD_LIBRARY_PATH for the container-built test, it passes.
<zao> So there's something inherently hecked up with the distro's libopenmpi
<zao> Time to get some actual work work done before the end of the day, this has been a fun rabbit hole :D
eschnett_ has quit [Quit: eschnett_]
eschnett_ has joined #ste||ar
akheir has quit [Quit: Konversation terminated!]
akheir has joined #ste||ar
Yorlik has joined #ste||ar
jaafar has quit [Ping timeout: 264 seconds]
eschnett_ has quit [Quit: eschnett_]
eschnett_ has joined #ste||ar
<Yorlik> hkaiser: I shared a google doc folder with you for the persistent id_type requirements. You should have mail.
<hkaiser> Yorlik: gotcha
<Yorlik> :)
<Yorlik> Can you see the google chat ?
<Yorlik> Saw you typing already.
david_pfander has quit [Ping timeout: 268 seconds]
nikunj has quit [Quit: Leaving]
<zao> I’m going to try OpenMPI on an actual machine tomorrow if I get time.
<zao> Via distro
jaafar has joined #ste||ar
eschnett_ has quit [Quit: eschnett_]
eschnett_ has joined #ste||ar
aserio has quit [Ping timeout: 264 seconds]
<heller_> zao: outside a container
<heller_> hkaiser: I am available now
<hkaiser> heller_: sec
aserio has joined #ste||ar
Abhishek09 has joined #ste||ar
<Abhishek09> hello guys
Abhishek09 has quit [Quit: Page closed]
<Yorlik> hkaiser ?
<Yorlik> Relocating an object in a store/load cycle, as we discussed, could actually be done while the system is running if - and only if - the application developer knows with 100% certainty that there are no remaining references that would need updating, or if there were some sort of reference bookkeeping in the application.
<Yorlik> Just fantasizing about restructuring a long running application while it's running.
<zao> Yorlik: I'm patiently waiting for you to find out that saving/loading Lua state is a royal pain in the back :D
<Yorlik> We are not planning to save Lua states
<Yorlik> Our plans for Lua are much simpler.
<zao> Ooh, boring :D
<Yorlik> Actually we will not have any mutable state in the Lua states
<Yorlik> We will make extensive use of Lua working on exposed userdata objects
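
A minimal sketch of that pattern using the plain Lua C API (whether the project uses the raw API or a binding layer is not stated; luaL_setmetatable needs Lua 5.2+). The mutable state lives in the C++ struct, and the script only calls methods on the exposed userdata:

    #include <lua.hpp>
    #include <new>

    struct Entity { double health = 100.0; };

    static int entity_damage(lua_State* L)
    {
        auto* e = static_cast<Entity*>(luaL_checkudata(L, 1, "Entity"));
        e->health -= luaL_checknumber(L, 2);
        return 0;
    }

    static int entity_health(lua_State* L)
    {
        auto* e = static_cast<Entity*>(luaL_checkudata(L, 1, "Entity"));
        lua_pushnumber(L, e->health);
        return 1;
    }

    int main()
    {
        lua_State* L = luaL_newstate();
        luaL_openlibs(L);

        // metatable whose __index points at itself, holding the methods
        luaL_newmetatable(L, "Entity");
        lua_pushvalue(L, -1);
        lua_setfield(L, -2, "__index");
        lua_pushcfunction(L, entity_damage);
        lua_setfield(L, -2, "damage");
        lua_pushcfunction(L, entity_health);
        lua_setfield(L, -2, "health");
        lua_pop(L, 1);

        // one Entity exposed as full userdata; the script holds no state
        // of its own, it just mutates this object through the methods
        new (lua_newuserdata(L, sizeof(Entity))) Entity{};
        luaL_setmetatable(L, "Entity");
        lua_setglobal(L, "player");

        luaL_dostring(L, "player:damage(25) print(player:health())"); // 75
        lua_close(L);
    }
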
aserio has quit [Ping timeout: 264 seconds]
bibek has quit [Quit: Konversation terminated!]
eschnett_ has quit [Quit: eschnett_]
aserio has joined #ste||ar
aserio has quit [Quit: aserio]
<zao> Good news! Boost seems to have gotten GSoC this year.
<K-ballo> "seems to" indeed
<heller_> yay
<K-ballo> how many years have we participated in GSoC? 5? 7?
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<hkaiser> Yorlik: I think this limitation is too strong
<hkaiser> it is sufficient for an object to be 'migratable to storage' as long as there is no active thread scheduled or running
<hkaiser> if other objects decide to schedule work on an object that is currently in storage, it can be 'brought back to life' transparently as long as the application is running
<Yorlik> hkaiser: My line of thought was just about optimizing a running simulation with lots of object migrations. It's a special case ofc.
<hkaiser> Yorlik: I see 'migration to storage' (checkpointing) as a special case of object migration
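
HPX does ship a facility along these lines (component_storage with migrate_to_storage); a rough fragment of the flow hkaiser describes, where the client type and its action are hypothetical placeholders and the signatures are not checked against the current HPX API:

    // sketch only: my_client and do_something() are hypothetical
    hpx::components::component_storage storage(hpx::find_here());

    my_client obj = hpx::new_<my_client>(hpx::find_here());

    // legal once no thread is scheduled or running on the object
    hpx::components::migrate_to_storage(obj, storage).get();

    // scheduling new work on it transparently brings it back to life
    obj.do_something();
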
<Yorlik> What do you think about the remapping, then, which places AGAS responsibility and object locality together again?
<Yorlik> Do you think it could be done live efficiently?
<hkaiser> yah, I think so
<Yorlik> That would be super nice I think.