hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
nikunj has quit [Ping timeout: 252 seconds]
nikunj has joined #ste||ar
hkaiser_ has quit [Quit: bye]
nan11 has joined #ste||ar
weilewei has quit [Remote host closed the connection]
akheir1 has quit [Remote host closed the connection]
akheir1 has joined #ste||ar
nan11 has quit [Remote host closed the connection]
akheir1 has quit [Quit: Leaving]
<simbergm> zao: nice job on slack :D
<zao> :D
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
hkaiser has joined #ste||ar
<K-ballo> which slack is that..?
<zao> I mentioned HPX in #hpc on the C++ one, seems to have hooked at least one person on it.
<tiagofg[m]> hkaiser_: Do you have any news about the channel problem?
<tiagofg[m]> btw, my master's thesis supervisor is also working with HPX; can he come to this Riot chat?
<heller1> tiago.fg: absolutely!
<tiagofg[m]> thanks! how do I do that?
<tiagofg[m]> I will invite
<tiagofg[m]> thanks
<heller1> This is a public channel after all
<hkaiser> tiagofg[m]: I'm not in control of this channel, heller1 and simbergm will have to figure that out
<tiagofg[m]> regarding the hpx::lcos::channel problem?
<hkaiser> tiagofg[m]: ahh, I thought you were referring to the matrix channel
<hkaiser> tiagofg[m]: wrt the hpx::lcos::channel: I have not looked into things yet, sorry
bita has joined #ste||ar
<tiagofg[m]> ok, it's just because it causes me problems when measuring performance.
<hkaiser> does it now?
<hkaiser> if you really run into issues with this, doesn't that mean that you don't have sufficient parallelism in the first place?
<tiagofg[m]> no, I don't think so, but it will be a problem in the future. At the moment I am only developing, but I won't be able to measure performance the way I'd like, at least not with a clear conscience
<tiagofg[m]> I haven't tested parallel efficiency yet
<heller1> tiago.fg: why do you think this is affecting performance?
nan11 has joined #ste||ar
<heller1> tiago.fg: I would not recommend measuring performance in terms of CPU load
<tiagofg[m]> yeah you are right, but it's a good indicator of what's going on
<hkaiser> tiagofg[m]: that's a red herring
<heller1> indeed
<tiagofg[m]> ok, understood
<heller1> it might be critical if you measure energy, and your application is idle most of the time, but other than that, it should not affect anything
<tiagofg[m]> I see
<heller1> if you need an indicator of how busy your system is, you can inspect the idle-rate performance counters (run cmake with -DHPX_WITH_THREAD_IDLE_RATES=On). With that, you see the idle time of the HPX schedulers, which will tell you exactly what you want
<heller1> this can be printed periodically, when the program exits, or at user defined (via API calls) points in your program
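A minimal sketch of what heller1 describes, assuming an HPX source tree at `/path/to/hpx` and an application binary `./my_app` (both hypothetical placeholders). The CMake option is the one named above; `--hpx:print-counter` and `--hpx:print-counter-interval` are HPX command-line options, and `/threads{locality#0/total}/idle-rate` is HPX's idle-rate counter (reported in units of 0.01%). Check the exact names against the HPX version you use.

```shell
# Reconfigure and rebuild HPX with idle-rate tracking enabled.
cmake -S /path/to/hpx -B build -DHPX_WITH_THREAD_IDLE_RATES=On
cmake --build build

# Hypothetical application: print the idle rate of all worker threads on
# locality 0. By default the counter is printed when the program exits.
# The argument is quoted so the shell leaves the braces and '#' alone.
./my_app '--hpx:print-counter=/threads{locality#0/total}/idle-rate'

# Same counter, additionally sampled periodically (interval in milliseconds).
./my_app '--hpx:print-counter=/threads{locality#0/total}/idle-rate' \
         --hpx:print-counter-interval=1000
```

Printing at user-defined points in the program goes through HPX API calls instead of command-line options and is not shown here.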
<tiagofg[m]> ok thank you for the hints!
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
karame_ has joined #ste||ar
weilewei has joined #ste||ar
nikunj97 has joined #ste||ar
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 256 seconds]
gonidelis has joined #ste||ar
gonidelis has quit [Remote host closed the connection]
nikunj97 has joined #ste||ar
karame_ has quit [Remote host closed the connection]
Nikunj__ has quit [Ping timeout: 260 seconds]
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 264 seconds]
nikunj97 has joined #ste||ar
Nikunj__ has quit [Ping timeout: 256 seconds]
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 256 seconds]
nikunj97 has joined #ste||ar
Nikunj__ has quit [Ping timeout: 265 seconds]
<nikunj97> simbergm, --hpx:papi-event-info=all doesn't work when installing HPX with PAPI. It says: runtime_support::load_components: command line processing: unrecognised option '--hpx:papi-event-info=all'
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 256 seconds]
rtohid has joined #ste||ar
<weilewei> nikunj simbergm wow, I was just thinking of using GPTL and/or PAPI to profile my MPI code
<Nikunj__> weilewei, are you facing any similar issues?
<Nikunj__> list-counters won't show papi counters for some reason
<weilewei> Nikunj__ no clue actually, I just started looking at the GPTL page; it's day 1 of knowing this library, I've never used it
<weilewei> I will ask you questions maybe later :)
<Nikunj__> weilewei, I'm not an expert either. Have looked around a bit about APEX and PAPI, that's all ;)
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
<hkaiser> Nikunj__: there was a ticket about that
<hkaiser> I think
nikunj97 has joined #ste||ar
<hkaiser> Nikunj__: is the plugin being built?
<hkaiser> also, has it been loaded? (see logs)
<nikunj97> hkaiser, you mean PAPI?
<nikunj97> PAPI is loaded on the system
<hkaiser> no, the hpx papi plugin
<nikunj97> ohh no, that's not built
<hkaiser> should be in <build>/bin/hpx
<heller1> hkaiser: on linux under lib/hpx
<hkaiser> ok
<hkaiser> thanks
Nikunj__ has quit [Ping timeout: 256 seconds]
<nikunj97> I do see libhpx_papi_counters.so.1.5.0
<nikunj97> under lib64/hpx
<hkaiser> ok
<hkaiser> that's the one
<nikunj97> so it's been built. How do I load it?
<hkaiser> it should be found automatically, see logs
<hkaiser> --hpx:debug-hpx-log=<file> or somesuch
nikunj has quit [Ping timeout: 256 seconds]
<nikunj97> hkaiser, ldd doesn't show libhpx_papi_counters
<heller1> of course
<hkaiser> yes, that's correct
<hkaiser> it's runtime-loaded
<heller1> it is a plugin that will be loaded at runtime
<nikunj97> aah I see
<heller1> hmmm, is the path maybe incorrect when starting from an installed HPX?
<heller1> weilewei: we don't have GPTL support in HPX
<hkaiser> heller1: should look in its relative ./hpx
<hkaiser> why do we need support for GPTL?
<heller1> no idea
<heller1> i thought it was some replacement for PAPI
<nikunj97> hkaiser, alright, I got my debug info into a file. What now?
<weilewei> heller1 I see, I'm trying to profile my MPI code with GPTL, no HPX involved for now
<weilewei> what's the difference between using PAPI and GPTL?
<heller1> two completely different things, as it seems
<hkaiser> nikunj97: look at the file's content?
<hkaiser> grep for papi
<weilewei> the goal is to get statistics on MPI calls, like how much time is spent in MPI_Wait and MPI_Isend
<heller1> GPTL is a library to instrument C, C++, and Fortran codes for performance analysis and profiling.
<heller1> PAPI provides the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors.
<heller1> well, then you should use GPTL, or similar profilers
nikunj has joined #ste||ar
<nikunj97> hkaiser, grep papi shows nothing
<weilewei> ok, thanks heller1, I am trying to use GPTL on Summit
<heller1> weilewei: or whatever MPI profiling they suggest and document
<weilewei> another suggestion was using mpiP
<hkaiser> nikunj97: are there any 'module loaded' messages (or similar) in the log?
<hkaiser> should be right in the beginning
<heller1> weilewei: or APEX, with HPX :P
<hkaiser> heller1: APEX doesn't look at networking
<weilewei> IC
<hkaiser> nikunj97: is that an installed HPX ?
<nikunj97> yes, that's an installed hpx
<heller1> hkaiser: no, but it looks at the tasks
<hkaiser> right
nikunj has quit [Ping timeout: 260 seconds]
<nikunj97> hkaiser, hpx: couldn't find module in global static command line data map
<nikunj97> that's all there is wrt module
<hkaiser> for what module?
<hkaiser> come on, just read the first 100-200 messages
<nikunj97> this module portion came up on line 29595 ;)
<hkaiser> whatever, then read the first 50000 lines ;-)
nikunj has joined #ste||ar
<nikunj97> No plugins found/loaded
<nikunj97> hkaiser, I guess you were looking for this ^^
<hkaiser> nikunj97: it should list the .so files it attempts to load and whether it successfully loaded them
<hkaiser> nikunj97: right, does it give you a prefix (root) directory?
<hkaiser> I don't remember what it actually prints
<hkaiser> what does --hpx:info print?
<hkaiser> it should list a prefix (configured) and a real one
<nikunj97> HPX_PREFIX=/home/jusers/gupta2/juawei/install/arm/hpx_papi
<hkaiser> also, do a ldd on the libhpx_papi module
<hkaiser> is that where the hpx core library is?
<nikunj97> yes, that's my papi build of hpx
<nikunj97> ldd on libhpx_papi shows up all the paths correctly
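The plugin-loading checks suggested so far can be collected into a short shell sketch. `./my_app`, `hpx_debug.log`, and the install prefix are hypothetical placeholders; the options (`--hpx:debug-hpx-log`, `--hpx:info`, `--hpx:list-counters`, `--hpx:papi-event-info=all`) and the plugin file name and location are the ones mentioned in the discussion.

```shell
# Hypothetical install prefix; adjust to your own HPX installation.
HPX_INSTALL=/path/to/hpx/install

# 1. Write HPX's debug log to a file and look for PAPI/plugin messages.
./my_app --hpx:debug-hpx-log=hpx_debug.log
grep -i papi hpx_debug.log

# 2. Show the configured and the actual prefix HPX uses to find plugins.
./my_app --hpx:info

# 3. The plugin is loaded at runtime, so ldd on the application will not
#    list it; check the plugin's own dependencies instead.
ldd "$HPX_INSTALL"/lib64/hpx/libhpx_papi_counters.so*

# 4. If the plugin was loaded, the PAPI counters and events should show up.
./my_app --hpx:list-counters
./my_app --hpx:papi-event-info=all
```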
<hkaiser> ok, that needs better investigation, then
<nikunj97> do you want me to open up a ticket?
<hkaiser> #4123 is still open, I thought that was fixed
<nikunj97> hkaiser, aah! that explains it then
<hkaiser> a couple of additional options to diagnose things
<weilewei> for GPTL: https://github.com/jmrosinski/GPTL, there is no ./configure script; how do I install this tool?
<nikunj97> hkaiser, prefix looks exactly the same as defined by simbergm in his comment
<hkaiser> ok, so it needs some debugging
nikunj has quit [Ping timeout: 256 seconds]
<nikunj97> i guess so
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 256 seconds]
<weilewei> oh, i need to run autoconf first
<heller1> weilewei: autogen
<heller1> well autoreconf, I guess
<weilewei> yes, it turns out to be autoreconf -i
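The fix above amounts to the usual autotools sequence. This is a sketch assuming a Git checkout of the GPTL repository linked above and a hypothetical install prefix. On Summit you will likely need the MPI compiler wrappers or other configure options described in GPTL's own documentation, which are not shown here.

```shell
git clone https://github.com/jmrosinski/GPTL.git
cd GPTL
# No generated ./configure script is checked in; autoreconf -i generates it
# (-i also installs any missing auxiliary files).
autoreconf -i
./configure --prefix="$HOME/opt/gptl"   # hypothetical install prefix
make
make install
```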
<bita> hkaiser, occasionally we see an error on remote_run_test (https://circleci.com/gh/STEllAR-GROUP/phylanx/36376). Do you think it's the same thing as #1120?
<hkaiser> sec, in a meeting
<hkaiser> we've got 5 GSoC students approved
<heller1> wee
<heller1> that's cool
<hkaiser> bita: looks like the values don't check out
<hkaiser> one value in the result array is twice as large (or half the value) as it should be
<bita> I ran the test again and it passed
<hkaiser> darn
nikunj has joined #ste||ar
<hkaiser> bita: is it using the same values each run, or is it using random numbers?
<bita> I have seen this before, too. It might be the same problem as #1120
<bita> I will check that (I have not written that test) give me a sec
<hkaiser> it's a different error, isn't it?
nikunj has quit [Ping timeout: 260 seconds]
<bita> yes, it's a different error
<bita> in the remote_add test we create two random matrices and add them once locally, which creates "expected", and once remotely, which creates "result". It says result and expected don't have the same values
akheir has joined #ste||ar
nikunj has joined #ste||ar
<bita> I just ran it 100 times on Docker and I didn't see any failure. But I have seen that error twice on CircleCI
<hkaiser> bita: we should start printing the used seed to be able to reproduce random (hah! pun intended) errors
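hkaiser's suggestion to print the seed can be sketched in C++. This is a hypothetical stand-alone illustration, not the actual Phylanx remote_add test: the environment variable `TEST_SEED` and the functions `choose_seed` and `random_matrix` are assumptions. The seed is taken from the environment if present (otherwise drawn from `std::random_device`), printed, and used to seed a `std::mt19937`, so a failing CI run can be replayed with identical inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: use the seed from the TEST_SEED environment variable
// if it is set, otherwise draw a fresh one. Either way, print it so that a
// failing CI run can be reproduced by exporting the same TEST_SEED.
std::uint32_t choose_seed()
{
    std::uint32_t seed;
    if (char const* env = std::getenv("TEST_SEED"))
        seed = static_cast<std::uint32_t>(std::stoul(env));
    else
        seed = std::random_device{}();
    std::cout << "random seed: " << seed << std::endl;
    return seed;
}

// Fill a rows x cols matrix (row-major) with values in [0, 1) drawn from
// the given generator; the same seed always yields the same matrix.
std::vector<double> random_matrix(
    std::mt19937& gen, std::size_t rows, std::size_t cols)
{
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::vector<double> m(rows * cols);
    for (auto& x : m)
        x = dist(gen);
    return m;
}
```

In a test, one would call `std::mt19937 gen(choose_seed());` once and build both input matrices from `gen`. Note that `std::mt19937` output is fully specified by the standard, but distribution output is not, so a printed seed is reliably reproducible only with the same standard library implementation.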
<bita> :)) :+1
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 256 seconds]
nikunj97 has joined #ste||ar
weilewei86 has joined #ste||ar
Nikunj__ has quit [Ping timeout: 265 seconds]
nikunj has quit [Ping timeout: 240 seconds]
rtohid has quit [Ping timeout: 240 seconds]
weilewei has quit [Ping timeout: 240 seconds]
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 256 seconds]
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 250 seconds]
nan1138 has joined #ste||ar
nikunj has joined #ste||ar
weilewei86 has quit [Ping timeout: 240 seconds]
nan11 has quit [Ping timeout: 240 seconds]
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 256 seconds]
nikunj has quit [Ping timeout: 256 seconds]
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 240 seconds]
nikunj has joined #ste||ar
weilewei has joined #ste||ar
nikunj has quit [Ping timeout: 264 seconds]
nan1138 has quit [Remote host closed the connection]
nan77 has joined #ste||ar
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 256 seconds]
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 256 seconds]
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 240 seconds]
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 256 seconds]
nikunj has joined #ste||ar
Nikunj__ has quit [Read error: Connection reset by peer]
<weilewei> Ha, after one day of learning GPTL, I finally got it working on dca
<weilewei> next step profile it...