hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
<hkaiser> diehlpk_mobile[m: yt?
<diehlpk_work_> hkaiser, yes
<hkaiser> diehlpk_work_: see pm, pls
Amy1 has quit [Ping timeout: 260 seconds]
hkaiser has quit [Quit: bye]
bita has quit [Quit: Leaving]
hkaiser has joined #ste||ar
nikunj97 has joined #ste||ar
<zao> Timestep 106.00 terminate called after throwing an instance of 'thrust::system::system_error'
<zao> what(): after reduction step 2: cudaErrorInvalidConfiguration: invalid configuration argument
<zao> Don't you love mysterious crashes hours into a run on a cluster? :D
<zao> Hah, had run a zero-size kernel and it blew up a few functions later.
<heller1> that's cuda error handling for you ;)
Vir has quit [Ping timeout: 256 seconds]
Vir has joined #ste||ar
Vir has joined #ste||ar
Vir has quit [Changing host]
karame_ has quit [Remote host closed the connection]
nikunj97 has quit [Ping timeout: 260 seconds]
kale_ has joined #ste||ar
kale_ has quit [Quit: Konversation terminated!]
kale_ has joined #ste||ar
kale_ has quit [Ping timeout: 260 seconds]
kale_ has joined #ste||ar
mcopik has joined #ste||ar
mcopik has quit [Client Quit]
kale_ has quit [Client Quit]
kale_ has joined #ste||ar
kale_ has quit [Ping timeout: 265 seconds]
Amy1 has joined #ste||ar
hkaiser_ has joined #ste||ar
hkaiser has quit [Ping timeout: 260 seconds]
hkaiser_ has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<hkaiser> jaafar_: should we merge wave to master for Boost 1.73?
hkaiser has quit [Ping timeout: 260 seconds]
hkaiser has joined #ste||ar
mcopik has joined #ste||ar
Amy1 has quit [Quit: WeeChat 2.2]
mcopik has quit [Remote host closed the connection]
mcopik has joined #ste||ar
mcopik has quit [Client Quit]
nikunj97 has joined #ste||ar
bita has joined #ste||ar
nan11 has joined #ste||ar
Amy1 has joined #ste||ar
<Amy1> how to optimize this code using simd?
<rori> Hey is anyone using the `HPX_WITH_VIM_YCM` successfully ?
<rori> whereas I verified that the corresponding directories are in the compile_commands.json
<rori> I enabled the option for the configure step, did a `make configure_ycm` to copy the configuration file in the source dir and added the `let g:ycm_extra_conf_globlist = ['<path_to_my_project>/*'] ` to my `.vimrc` but I still have some header not found errors
<hkaiser> rori: I've never used this feature
<rori> ok ^^ thanks !
karame_ has joined #ste||ar
<nan11> Is Avah in irc?
nikunj97 has quit [Quit: Leaving]
gonidelis has joined #ste||ar
<hkaiser> nan11: don't think so
<nan11> Okay
<hkaiser> Amy1: use std::experimental::simd for your delta_pos and pos arrays
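A minimal sketch of the suggestion above, assuming a standard library that ships <experimental/simd> (for example libstdc++ 11 or later, C++17). Amy1's code is not shown in the log, so the update `pos[i] += delta_pos[i]` and the function name `advance` are hypothetical stand-ins; only the array names come from the conversation.

```cpp
#include <cassert>
#include <cstddef>
#include <experimental/simd>
#include <vector>

namespace stdx = std::experimental;

// Hypothetical update loop (assumption): pos[i] += delta_pos[i].
void advance(std::vector<float>& pos, std::vector<float> const& delta_pos)
{
    assert(pos.size() == delta_pos.size());

    using simd_t = stdx::native_simd<float>;
    std::size_t const width = simd_t::size();

    std::size_t i = 0;
    // Process full SIMD-width chunks.
    for (; i + width <= pos.size(); i += width)
    {
        simd_t p(&pos[i], stdx::element_aligned);
        simd_t d(&delta_pos[i], stdx::element_aligned);
        p += d;
        p.copy_to(&pos[i], stdx::element_aligned);
    }
    // Scalar tail for the remaining elements.
    for (; i < pos.size(); ++i)
        pos[i] += delta_pos[i];
}
```

Whether this beats the compiler's auto-vectorization of the plain loop would have to be measured.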
<simbergm> hkaiser, others, sorry about all the additional failures on pycicle
<hkaiser> simbergm: no worries
<simbergm> some are due to me enabling the build unit tests, which is why I merged the pr fixing that
<simbergm> not sure if there's something else going on
<hkaiser> thanks for taking care of things!
<simbergm> if things are ok it should stabilize by tomorrow
nikunj97 has joined #ste||ar
<simbergm> hkaiser: I wasn't, but that looks interesting as well...
weilewei has joined #ste||ar
<hkaiser> could be a c++ standards issue
<simbergm> ah, but it's probably the same issue
<hkaiser> nod
<simbergm> clang defaults to something really low whereas gcc defaults to 14
<hkaiser> right
<simbergm> so yeah, that should go away
weilewei has quit [Remote host closed the connection]
<hkaiser> simbergm: I'll rebase John's PRs
<hkaiser> simbergm: btw, did you ever merge your fix for the -1 index issue in one of the schedulers?
<simbergm> hkaiser: sure, pycicle will automatically pick up the changes on master though (in case you want to save yourself some work)
weilewei has joined #ste||ar
<simbergm> hmm, the one on the exception pr?
<hkaiser> no
<simbergm> or something else?
<hkaiser> the local_thread_num issue
<simbergm> the exception pr is merged
<simbergm> well, there was a fix for the local thread num -1 issue on that pr, but I still have to go and check that all the other schedulers get the local thread num set
<hkaiser> ok, then one of the problems on the APEX PR should go away
<weilewei> hkaiser now the error of the G4 array between GPUDirect and baseline is down to 5e-15, a very acceptable range. So my logic is correct now, and I'm gonna run more experiments to verify my implementation. The bigger error before was due to my ignorance of data processing tools, like hdf5 and the python interface for complex numbers... learned something new
<simbergm> hkaiser: those might require the proper fix
<simbergm> I can have a look in any case
<simbergm> (the exception pr went in yesterday and the apex failures are still there from today)
<hkaiser> simbergm: ok, thanks
<hkaiser> weilewei: nice
<weilewei> hkaiser do we happen to have access to any AMD GPU?
<hkaiser> weilewei: we could ask Adrian ;-)
<weilewei> hkaiser lol, true, just checking
<hkaiser> weilewei: at some point we did have AMD GPUs in rostam, not sure if that's still the case
<weilewei> hkaiser ok, maybe Ali knows
<hkaiser> pls ask him
<Yorlik> hkaiser: YT?
<hkaiser> here
<hkaiser> half-way
<Yorlik> I did some thinking about this specific tree structure of a 64 ary tree
<Yorlik> Problems and chances of it
<Yorlik> Also a possible generalization and why Morton Code actually might be the answer
<Yorlik> I first thought the Morton Codes are only useful to balance a newly constructed tree
<Yorlik> But their power is much more
<Yorlik> If you say N is a number of components you combine in a morton code
<Yorlik> And you want to have this N Dimensional tree type
<Yorlik> You end up with 2^N possible coordinates / child nodes
<Yorlik> So - you enter dimensional explosion very quickly.
<Yorlik> If the bitwise comparison of coordinates in such a tree is used to calculate the number in the array of child nodes, this number encodes all the comparisons you made
<hkaiser> ok
<hkaiser> optimizing again?
<Yorlik> Listen first
<Yorlik> If you write a Morton Code as a number in base 2^N, the digits represent the subsector indices through the tree - the entire path is encoded
<Yorlik> Sry - my typing is horrible.
<Yorlik> If you want to avoid chasing pointers at every level, because you cannot have an array for all coordinates, the Morton Code solves that
<Yorlik> You avoid the dimensional explosion
<hkaiser> k
<simbergm> weilewei: did you get any reaction from john yesterday? he didn't reply to me either when I asked about it, but I can poke him about it again
<Yorlik> If you have like a 64 bit tree the amount of storage explodes - so you need a sparse structure in any case.
<Yorlik> Just storing the coordinates becomes a problem.
<weilewei> simbergm unfortunately no response from John yesterday
<Yorlik> Also - traversing a tree is probably less efficient than quickly calculating a Morton Code using intrinsics/SIMD
<Yorlik> And then looking it up in a skip list
<weilewei> simbergm I guess he is ignoring, never mind then.
<Yorlik> OFC I'd have to measure, but the real problem is the consequences of dimensional explosion as you add coordinates.
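The Morton code idea Yorlik describes can be sketched as follows. This is an illustrative sketch only, not code from the discussion: the function names `morton_encode` and `child_index` are hypothetical, the bit layout (bit b of coordinate d goes to position b * N + d) is one common convention chosen here as an assumption, and a plain loop is used instead of the intrinsics/SIMD mentioned above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Interleave the low `bits` bits of N coordinates into one Morton code.
// Bit b of coordinate d lands at bit position b * N + d.
template <std::size_t N>
std::uint64_t morton_encode(std::array<std::uint32_t, N> const& coords,
                            unsigned bits)
{
    static_assert(N >= 1 && N < 64, "N must be between 1 and 63");
    assert(bits <= 32 && bits * N <= 64);   // code must fit into 64 bits

    std::uint64_t code = 0;
    for (unsigned b = 0; b < bits; ++b)
        for (std::size_t d = 0; d < N; ++d)
            code |= std::uint64_t((coords[d] >> b) & 1u) << (b * N + d);
    return code;
}

// Child index taken at `level` (0 = root) in a tree of depth `bits`.
// This is one base-2^N digit of the code, most significant digit first.
template <std::size_t N>
unsigned child_index(std::uint64_t code, unsigned bits, unsigned level)
{
    static_assert(N >= 1 && N < 64, "N must be between 1 and 63");
    assert(level < bits);

    std::size_t const shift = (bits - 1 - level) * N;
    return unsigned((code >> shift) & ((std::uint64_t(1) << N) - 1));
}
```

For N = 2, x = 3, y = 5 and 3 bits per coordinate, the code is 39 (binary 100111); its base-4 digits 2, 1, 3 are the child indices from the root down, so the whole path is recoverable from the single number without chasing pointers.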
<simbergm> weilewei: I don't think so, he can just be a bit distracted sometimes
<weilewei> simbergm thanks, if convenient, please poke him again. Thanks!
jaafar_ is now known as jaafar
akheir has joined #ste||ar
<nikunj97> simbergm, yt?
<simbergm> nikunj97: here
<nikunj97> simbergm, I built hpx with apex and otf2. I also did export the required (https://github.com/STEllAR-GROUP/tutorials/tree/master/cscs2019/session3#apex-trace-output)
<nikunj97> how do I get the trace output from an executable from here?
<nikunj97> running the executable does not do anything
<simbergm> :(
<nikunj97> is there anything special that I need to do?
<simbergm> what does ldd tell you for your executable?
<simbergm> you're not supposed to need to do anything special
<simbergm> I hope you didn't end up with a commit just when I managed to break apex linking again...
<simbergm> can you give me hpx's commit hash as well
<simbergm> sorry if I've made you build a broken version
<nikunj97> hpx commit: 969833a
<nikunj97> simbergm, :(
<simbergm> yep, no apex there
<simbergm> do you have libhpx_apex in your hpx build directory?
<nikunj97> yes libhpx_apex exists
<simbergm> nikunj97: you'll need this: https://github.com/STEllAR-GROUP/hpx/pull/4510
<simbergm> (I'm mad at cmake for letting me make that mistake, but we also need better testing...)
<simbergm> you can also try linking to `HPX::apex` in your application
<nikunj97> how do I do that?
<nikunj97> I do not want to rebuild everything
<simbergm> I think that should give you the same effect if you don't want to rebuild hpx, but don't rely on it in the future
<simbergm> target_link_libraries(myapp PRIVATE HPX::apex)
<nikunj97> I add that to my CMakeLists.txt on the application I'm working on?
<simbergm> yeah
<nikunj97> let me try it
<simbergm> it's meant to automatically be linked through HPX::hpx's interface link libraries, but it's not because of my mistake up there
<simbergm> nikunj97: mind grepping for "apex" in the hpx install directory/lib/cmake/HPX? potentially lib64
<simbergm> I'm going off memory here so I might get some things wrong
<simbergm> you might end up recompiling anyway :P
<nikunj97> it's APEX::apex everywhere
<nikunj97> should I just replace apex with HPX everywhere?
<hkaiser> nikunj97: not everywhere
<hkaiser> only in one spot
<simbergm> hrm
<hkaiser> am I right?
<nikunj97> let me just recompile it with the PR branch :/
<hkaiser> sorry, if I misunderstand things
<simbergm> sorry, yes, just one spot!
<simbergm> thinking about something else
<nikunj97> where do I make the change then?
<hkaiser> nikunj97: it's in the PR
<simbergm> since apex was built there should be the exported target in HPXModuleTargets.cmake
<simbergm> but if linking to HPX::apex didn't work it's not going to work by changing it there either...
<simbergm> I'll do a bit of digging
<nikunj97> let me just recompile in that case
<simbergm> if you don't mind potentially doing it again... I just want to check that my fix actually is correct (I admit I didn't test it)
<nikunj97> simbergm, sure will do
<nikunj97> will comment on the PR if things work
<simbergm> ah, thanks
<simbergm> (I did actually mean that I'll try it out myself, but if you don't mind trying I'm very happy as well)
<gonidelis> Procedural question: when I have a GitHub project cloned onto my PC (say HPX), compiling it produces new build files in the directory. What happens after I make some changes and want to push them back? Is there a standard procedure for pushing only my new code and not the build files?
<simbergm> nikunj97: even without those changes you should have something like the following in lib64/cmake/HPX/HPXTargets.cmake: https://gist.github.com/msimberg/d8fd7467149175028e2519487e47a62e
<simbergm> mind checking?
<gonidelis> Is it the project's business to produce the compilation files on an external dir or is it my business to keep a copy of the original project ?
<simbergm> gonidelis: no compiled artifacts go into the git repository since they 1) can be rebuilt from the source files, 2) are machine specific, 3) can be big(!), etc.
<simbergm> there are probably hundreds of reasons not to add compiled files
<simbergm> hpx does not even allow building in the source directory which makes it a bit easier to not accidentally check in anything from the build directory
<simbergm> if you have your build directory completely outside of the source directory git won't even let you add them, and if you have a "build" subdirectory in your source directory we have a gitignore rule to ignore any files in that directory
<gonidelis> thank you. You answered all of my questions perfectly. I'll keep your words in mind...
<nikunj97> simbergm, there's no apex in HPXTargets.cmake
<nikunj97> do I add that at the end?
<simbergm> nikunj97: no, that means apex is not (correctly?) enabled
<simbergm> adding it manually is a bad idea
<simbergm> so you're on 428f0ad5f31 (msimberg-patch-5), now?
<nikunj97> I used -DHPX_WITH_APEX=ON -DAPEX_WITH_OTF2
<nikunj97> simbergm, I haven't built it yet
<nikunj97> I thought, you wanted to make changes to the current build
<simbergm> nikunj97: I was hoping you could, but that's not going to work if apex wasn't enabled in the first place
<nikunj97> how do I enable it?
<simbergm> so in your build directory, you definitely have HPX_WITH_APEX=ON (check CMakeCache.txt or ccmake, whatever you prefer)
<nikunj97> did I do something wrong with the build?
<simbergm> `-DHPX_WITH_APEX=ON` is correct, I'm just being thorough :)
<simbergm> don't worry
<nikunj97> HPX_WITH_APEX ON
<nikunj97> from ccmake
<simbergm> good
<simbergm> then, in your build directory in lib/cmake/HPX/HPXTargets.cmake do you have any mention of HPX::hpx?
<simbergm> or did you already check the build directory earlier?
<simbergm> sorry HPX::apex
<simbergm> that looks better
<nikunj97> HPXTargets.cmake does have HPX::apex in it
<simbergm> and did you install that build?
<nikunj97> yes
<simbergm> and there's no HPX::apex in the install directory?
<nikunj97> this was the build I installed
gonidelis has quit [Remote host closed the connection]
<nikunj97> there was HPX::apex in there as well
<simbergm> there was HPX::apex in the install directory?
<nikunj97> yes
<simbergm> and did you check that your benchmark application is pointing to the correct install?
<nikunj97> but it was not showing up in ldd
<nikunj97> yes
<nikunj97> ldd does take in the right libhpx.so
<nikunj97> libhpx.so.1 => /home/jusers/gupta2/juawei/install/arm/hpx_trace/lib64/libhpx.so.1 (0x0000ffff8a8d0000)
<nikunj97> I'm trying out your PR right now if that changes things
<simbergm> can you show your cmakelists.txt where you linked to HPX::apex?
<simbergm> or try out the pr
<nikunj97> it looks like this
gonidelis has joined #ste||ar
<simbergm> thanks
<simbergm> looks correct (although I recommend you don't use the global `include_directories` and friends commands, but that's unrelated; HPX_LIBRARY_DIR and HPX_INCLUDE_DIR are empty nowadays)
<hkaiser> bita: yt?
<simbergm> nikunj97: I'm running out of ideas, something is not using the right paths
<hkaiser> bita: I will be a couple of minutes late for our meeting today
<nikunj97> simbergm, I'm currently building your new PR. Let's see how it works
<simbergm> do you have anything in LD_LIBRARY_PATH that might make it look like it's linking to the correct one, even though it was compiled against another install?
<nikunj97> I only have path to nsimd in LD_LIBRARY_PATH
<simbergm> ok, not that then...
<nikunj97> coz I'm yet to write a findNsimd to link things correctly
<bita> hkaiser, Okay :)
<simbergm> nikunj97: I'll be away for a bit, but ping me if things still don't work with the pr
<nikunj97> simbergm, will do
<gonidelis> When I call hpx::async on an hpx::future, is it correct to say that a thread is invoked? Or is it just sth from a higher level architecture, say 'a future' for example?
<weilewei> gonidelis: I believe it will invoke an HPX user-level thread, and the task represented by the future will be executed in that new thread
<weilewei> gonidelis ^^
gonidelis has quit [Ping timeout: 240 seconds]
<nikunj97> simbergm, your PR worked for me!
<heller1> gonidelis: you don't invoke async on a future
<heller1> async returns a future
<heller1> And yes, think in tasks, not in threads. A future represents an asynchronous result from a task
<heller1> It does not necessarily have to be an OS thread or user level thread that is carrying out the task. Could be an asynchronous copy, where some DMA engine performs the work, or a network request where your network card performs the work, and many more
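To illustrate heller1's point that async returns a future rather than being called on one, here is a small sketch. It uses std::async and std::future from the C++ standard library as a stand-in for hpx::async and hpx::future, on the assumption that the two interfaces look alike at this level; the functions `square` and `run_task` are hypothetical.

```cpp
#include <cassert>
#include <future>

// The task: any callable that produces a result.
int square(int x)
{
    return x * x;
}

int run_task()
{
    // async launches the task and hands back a future for its result.
    std::future<int> f = std::async(std::launch::async, square, 7);
    assert(f.valid());

    // Other work could happen here while the task runs.

    // get() waits until the result is available and returns it.
    return f.get();
}
```

The caller only sees the future. Which resource actually carries out the task is an implementation detail behind it.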
gonidelis has joined #ste||ar
<gonidelis> heller1 Yeah, *task* fits better as a term. Thank you...
<gonidelis> The reason I ask is because I am writing a README on my matrix_multiplication sample program and I would like to be as accurate as possible, as I would like to provide the code in public as a serious example of HPX speed-boost possibilities compared to a sequential version
<heller1> Be careful, matrix matrix multiplication is a well researched topic. Your implementation is most likely memory bound and an O(N^3) algorithm, while the best implementations and algorithms perform way better, even in the sequential version ;)
nikunj97 has quit [Read error: Connection reset by peer]
<hkaiser> yah, David did write a close to optimal mxm a while back, there should be a repository somewhere
<heller1> Cool, I wasn't aware of that! We should promote that...
<gonidelis> I agree. Maybe it would be better if I rephrase : "Dummy MxM multiplication". I am not trying to provide a fast MxM calculator but rather expose the time difference between a straightforward sequential MxM execution and a straightforward parallel MxM one. I would like it to be more of an exhibition example...
<heller1> Sure
<heller1> I really didn't mean to demotivate you... Just wanted to mention points of criticism one might have if you distribute such statements
<gonidelis> no worries! You helped me document the goal of my project in a better manner actually ;)
<heller1> On that note, it's really sad that no one implemented the Strassen algorithm...
<heller1> hkaiser: we should ask the students about that next year... Gives way more insight into task based programming
<gonidelis> heller1 Do you think it would be useful if I try to implement it the next few days?
<gonidelis> useful for the community*
<heller1> If you have some time, go ahead. It's certainly useful for you.
<hkaiser> gonidelis: mostly useful for you, I guess
<gonidelis> Sure. I'll be glad to give it a try.
nikunj97 has joined #ste||ar
<gonidelis> Some thoughts on my current project though: I have implemented a sequential version compared to a parallel one utilizing a simple `hpx::async`. Sequential seems to work better than parallel. My guess is that because I compute each cell of the product matrix with a different task, the overhead dominates... Do you think making a
<gonidelis> row-based-parallelization would improve the performance?
<gonidelis> https://github.com/gonidelis/HPX_matrix_multiplication You can see the project here
<gonidelis> You can find some of my comments on the implementation there too...
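The row-based idea gonidelis asks about can be sketched like this. It is not taken from the linked repository: the names `matrix`, `multiply_row` and `multiply_rows` are hypothetical, and std::async from the standard library stands in for `hpx::async`. With std::launch::async every task may get its own OS thread, so this only shows the task granularity (n tasks instead of n * n), not HPX's scheduling behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

using matrix = std::vector<double>;   // row-major, n x n

// Compute row i of C = A * B.
void multiply_row(matrix const& a, matrix const& b, matrix& c,
                  std::size_t n, std::size_t i)
{
    for (std::size_t j = 0; j < n; ++j)
    {
        double sum = 0.0;
        for (std::size_t k = 0; k < n; ++k)
            sum += a[i * n + k] * b[k * n + j];
        c[i * n + j] = sum;
    }
}

// One task per row instead of one task per cell.
matrix multiply_rows(matrix const& a, matrix const& b, std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n);

    matrix c(n * n, 0.0);
    std::vector<std::future<void>> tasks;
    tasks.reserve(n);

    // Each task writes a disjoint row of c, so no locking is needed.
    for (std::size_t i = 0; i < n; ++i)
        tasks.push_back(std::async(std::launch::async,
            [&a, &b, &c, n, i] { multiply_row(a, b, c, n, i); }));

    for (auto& t : tasks)
        t.get();   // wait for all rows
    return c;
}
```

Coarser tasks amortize the per-task overhead over n inner products each. Whether that is enough to beat the sequential version depends on the matrix size and would need to be measured.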
<simbergm> nikunj97 excellent! thanks for your patience with this :)
<simbergm> I'm still confused why my attempt at a temporary hack didn't work but it was a hack anyway...
<nikunj97> simbergm, idk. I'm glad the PR is working
<simbergm> Yeah, that's the important thing :)
<hkaiser> simbergm: things seem to work for khuck as well - thanks a lot!
<hkaiser> he asks when we will merge things ;-)
<simbergm> hkaiser: good! it was a silly bug in the first place though...
<hkaiser> aren't all bugs silly?
<simbergm> I think we can merge it right away
<hkaiser> nod, pls go ahead
<simbergm> fair enough :) this one was sillier than most bugs...
<simbergm> I'll merge it
<simbergm> done
<simbergm> it might actually be that our apex linking is also broken with pkgconfig... will have to look into that as well
<hkaiser> simbergm: thanks!
<diehlpk_work_> I like the new rostam, because we have really low job numbers
<hkaiser> diehlpk_work_: enjoy it while buildbot is down ;-)
<diehlpk_work_> Yes, all my jobs are so fast running and no queue time
weilewei has quit [Ping timeout: 240 seconds]
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 240 seconds]
nikunj has joined #ste||ar
nikunj97 has quit [Read error: Connection reset by peer]
gonidelis has quit [Ping timeout: 240 seconds]
nan11 has quit [Remote host closed the connection]
wate123_Jun has joined #ste||ar
<Yorlik> hkaiser: YT?
<hkaiser> Yorlik: here
<Yorlik> Heyll!
<Yorlik> Once you can afford a little time I'd like to discuss this quadtree / Z-curve problem again.
<hkaiser> Yorlik: let's not do it today, if you wouldn't mind
<Yorlik> NP - Just generally asking.
<Yorlik> It totally has time.
<hkaiser> sure, over the weekend should be fine