<rori>
Hey, is anyone using the `HPX_WITH_VIM_YCM` option successfully?
<rori>
even though I verified that the corresponding directories are in the compile_commands.json
<rori>
I enabled the option at the configure step, ran `make configure_ycm` to copy the configuration file into the source dir, and added `let g:ycm_extra_conf_globlist = ['<path_to_my_project>/*']` to my `.vimrc`, but I still get some header-not-found errors
<hkaiser>
rori: I've never used this feature
<rori>
ok ^^ thanks !
karame_ has joined #ste||ar
<nan11>
Is Avah in irc?
nikunj97 has quit [Quit: Leaving]
gonidelis has joined #ste||ar
<hkaiser>
nan11: don't think so
<nan11>
Okay
<hkaiser>
Amy1: use std::experimental::simd for your delta_pos and pos arrays
<simbergm>
hkaiser: I wasn't, but that looks interesting as well...
weilewei has joined #ste||ar
<hkaiser>
could be a c++ standards issue
<simbergm>
ah, but it's probably the same issue
<hkaiser>
nod
<simbergm>
clang defaults to something really low whereas gcc defaults to 14
<hkaiser>
right
<simbergm>
so yeah, that should go away
weilewei has quit [Remote host closed the connection]
<hkaiser>
simbergm: I'll rebase John's PRs
<hkaiser>
simbergm: btw, did you ever merge your fix for the -1 index issue in one of the schedulers?
<simbergm>
hkaiser: sure, pycicle will automatically pick up the changes on master though (in case you want to save yourself some work)
weilewei has joined #ste||ar
<simbergm>
hmm, the one on the exception pr?
<hkaiser>
no
<simbergm>
or something else?
<hkaiser>
the local_thread_num issue
<simbergm>
the exception pr is merged
<simbergm>
well, there was a fix for the local thread num -1 issue on that pr, but I still have to go and check that all the other schedulers get the local thread num set
<hkaiser>
ok, then one of the problems on the APEX PR should go away
<weilewei>
hkaiser now the error of the G4 array between GPUDirect and baseline is down to 5e-15, a very acceptable range. So my logic is correct now, and I'm going to run more experiments to verify my implementation. The bigger error before was due to my unfamiliarity with the data processing tools, like hdf5 and the python interface for complex numbers... learned something new
<simbergm>
hkaiser: those might require the proper fix
<simbergm>
I can have a look in any case
<simbergm>
(the exception pr went in yesterday and the apex failures are still there from today)
<hkaiser>
simbergm: ok, thanks
<hkaiser>
weilewei: nice
<weilewei>
hkaiser do we happen to have access to any AMD GPU?
<hkaiser>
weilewei: we could ask Adrian ;-)
<weilewei>
hkaiser lol, true, just checking
<hkaiser>
weilewei: at some point we did have AMD GPUs in rostam, not sure if that's still the case
<weilewei>
hkaiser ok, maybe Ali knows
<hkaiser>
pls ask him
<Yorlik>
hkaiser: YT?
<hkaiser>
here
<hkaiser>
half-way
<Yorlik>
I did some thinking about this specific tree structure, a 64-ary tree
<Yorlik>
Its problems and its potential
<Yorlik>
Also a possible generalization and why Morton Code actually might be the answer
<Yorlik>
I first thought the Morton Codes are only useful to balance a newly constructed tree
<Yorlik>
But their power is much more
<Yorlik>
If you say N is the number of components you combine in a Morton Code
<Yorlik>
And you want to have this N-dimensional tree type
<Yorlik>
You end up with 2^N possible coordinates / child nodes
<Yorlik>
So - you enter dimensional explosion very quickly.
<Yorlik>
If the bitwise comparison of coordinates in such a tree is used to calculate the number in the array of child nodes, this number encodes all the comparisons you made
<hkaiser>
ok
<hkaiser>
optimizing again?
<Yorlik>
Listen first
<Yorlik>
If you write a Morton Code as a number in base 2^N, the digits represent the subsector indices through the tree - the entire path is encoded
<Yorlik>
Sry - my typing is horrible.
<Yorlik>
If you want to avoid chasing pointers at every level, because you cannot have an array for all coordinates, the Morton Code solves that
<Yorlik>
You avoid the dimensional explosion
<hkaiser>
k
<simbergm>
weilewei: did you get any reaction from john yesterday? he didn't reply to me either when I asked about it, but I can poke him about it again
<Yorlik>
If you have, say, a 64-bit tree, the amount of storage explodes - so you need a sparse structure in any case.
<Yorlik>
Just storing the coordinates becomes a problem.
<weilewei>
simbergm unfortunately no response from John yesterday
<Yorlik>
Also - traversing a tree is probably less efficient than quickly calculating a Morton Code using intrinsics/SIMD
<Yorlik>
And then looking it up in a skip list
<weilewei>
simbergm I guess he is ignoring it, never mind then.
<Yorlik>
OFC I'd have to measure, but the real problem is the consequences of dimensional explosion as you add coordinates.
<simbergm>
weilewei: I don't think so, he can just be a bit distracted sometimes
<weilewei>
simbergm thanks, if convenient, please poke him again. Thanks!
<nikunj97>
should I just replace apex with HPX everywhere?
<hkaiser>
nikunj97: not everywhere
<hkaiser>
only in one spot
<simbergm>
hrm
<hkaiser>
am I right?
<nikunj97>
let me just recompile it with the PR branch :/
<hkaiser>
sorry, if I misunderstand things
<simbergm>
sorry, yes, just one spot!
<simbergm>
thinking about something else
<nikunj97>
where do I make the change then?
<hkaiser>
nikunj97: it's in the PR
<simbergm>
since apex was built there should be the exported target in HPXModuleTargets.cmake
<simbergm>
but if linking to HPX::apex didn't work it's not going to work by changing it there either...
<simbergm>
I'll do a bit of digging
<nikunj97>
let me just recompile in that case
<simbergm>
if you don't mind potentially doing it again... I just want to check that my fix actually is correct (I admit I didn't test it)
<nikunj97>
simbergm, sure will do
<nikunj97>
will comment on the PR if things work
<simbergm>
ah, thanks
<simbergm>
(I did actually mean that I'll try it out myself, but if you don't mind trying I'm very happy as well)
<gonidelis>
Procedure-oriented question: when I have a GitHub project cloned onto my PC (say HPX), certain new compilation files are produced in the directory after some compilations. So what happens after I make some changes and want to push them back? What is a standard procedure with which I can push my new code explicitly and not the
<gonidelis>
Is it the project's business to produce the compilation files in an external dir, or is it my business to keep a copy of the original project?
<simbergm>
gonidelis: no compiled artifacts go into the git repository since they can 1) be rebuilt from the source files, 2) they are machine specific, 3) they can be big(!), etc.
<simbergm>
there are probably hundreds of reasons not to add compiled files
<simbergm>
hpx does not even allow building in the source directory which makes it a bit easier to not accidentally check in anything from the build directory
<simbergm>
if you have your build directory completely outside of the source directory git won't even let you add them, and if you have a "build" subdirectory in your source directory we have a gitignore rule to ignore any files in that directory
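The ignore rule simbergm mentions might look roughly like this (a hypothetical sketch; HPX's actual .gitignore may differ in detail):

```gitignore
# Ignore an in-source "build" subdirectory and everything in it
build/
```

The safer habit, as noted above, is an out-of-source build directory entirely outside the repository, so git never sees the generated files at all.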
<gonidelis>
thank you. You answered perfectly to all of my questions. I 'll keep your words in mind...
<nikunj97>
simbergm, there's no apex in HPXTargets.cmake
<nikunj97>
do I add that at the end?
<simbergm>
nikunj97: no, that means apex is not (correctly?) enabled
<simbergm>
adding it manually is a bad idea
<simbergm>
so you're on 428f0ad5f31 (msimberg-patch-5), now?
<nikunj97>
I used -DHPX_WITH_APEX=ON -DAPEX_WITH_OTF2
<nikunj97>
simbergm, I haven't built it yet
<nikunj97>
I thought, you wanted to make changes to the current build
<simbergm>
nikunj97: I was hoping you could, but that's not going to work if apex wasn't enabled in the first place
<nikunj97>
how do I enable it?
<simbergm>
so in your build directory, you definitely have HPX_WITH_APEX=ON (check CMakeCache.txt or ccmake, whatever you prefer)
<nikunj97>
did I do something wrong with the build?
<simbergm>
`-DHPX_WITH_APEX=ON` is correct, I'm just being thorough :)
<simbergm>
don't worry
<nikunj97>
HPX_WITH_APEX ON
<nikunj97>
from ccmake
<simbergm>
good
<simbergm>
then, in your build directory in lib/cmake/HPX/HPXTargets.cmake do you have any mention of HPX::hpx?
<simbergm>
or did you already check the build directory earlier?
<simbergm>
looks correct (although I recommend you don't use the global `include_directories` and friends commands, but that's unrelated; HPX_LIBRARY_DIR and HPX_INCLUDE_DIR are empty nowadays)
<hkaiser>
bita: yt?
<simbergm>
nikunj97: I'm running out of ideas, something is not using the right paths
<hkaiser>
bita: I will be a couple of minutes late for our meeting today
<nikunj97>
simbergm, I'm currently building your new PR. Let's see how it works
<simbergm>
do you have anything in LD_LIBRARY_PATH that might make it look like it's linking to the correct one, even though it was compiled against another install?
<nikunj97>
I only have path to nsimd in LD_LIBRARY_PATH
<simbergm>
ok, not that then...
<nikunj97>
coz I'm yet to write a findNsimd to link things correctly
<bita>
hkaiser, Okay :)
<simbergm>
nikunj97: I'll be away for a bit, but ping me if things still don't work with the pr
<nikunj97>
simbergm, will do
<gonidelis>
When I call hpx::async on an hpx::future, is it correct to say that a thread is invoked? Or is it just something from a higher-level architecture, say 'a future', for example?
<weilewei>
gonidelis I believe it will invoke an HPX user-level thread, and the task represented by the future will be executed in that new thread
<weilewei>
gonidelis ^^
gonidelis has quit [Ping timeout: 240 seconds]
<nikunj97>
simbergm, your PR worked for me!
<heller1>
gonidelis: you don't invoke async on a future
<heller1>
async returns a future
<heller1>
And yes, think in tasks, not in threads. A future represents an asynchronous result from a task
<heller1>
It does not necessarily have to be an OS thread or user-level thread that is carrying out the task. It could be an asynchronous copy, where some DMA engine performs the work, or a network request where your network card performs the work, and many more
gonidelis has joined #ste||ar
<gonidelis>
heller1 Yeah, *task* fits better as a term. Thank you...
<gonidelis>
The reason I ask is because I am writing a README for my matrix_multiplication sample program, and I would like to be as accurate as possible since I want to provide the code in public as a serious example of HPX's speed-boost possibilities compared to a sequential version
<heller1>
Be careful, matrix-matrix multiplication is a well-researched topic. While your implementation is most likely memory bound and an O(N^3) algorithm, the best implementations and algorithms perform way better, even the sequential ones ;)
nikunj97 has quit [Read error: Connection reset by peer]
<hkaiser>
yah, David did write a close to optimal mxm a while back, there should be a repository somewhere
<heller1>
Cool, I wasn't aware of that! We should promote that...
<gonidelis>
I agree. Maybe it would be better if I rephrase : "Dummy MxM multiplication". I am not trying to provide a fast MxM calculator but rather expose the time difference between a straightforward sequential MxM execution and a straightforward parallel MxM one. I would like it to be more of an exhibition example...
<heller1>
Sure
<heller1>
I really didn't mean to demotivate you... Just wanted to mention points of criticism one might have if you distribute such statements
<gonidelis>
no worries! You helped me document the goal of my project in a better manner actually ;)
<heller1>
On that note, it's really sad that no one has implemented the Strassen algorithm...
<heller1>
hkaiser: we should ask the students about that next year... Gives way more insight into task based programming
<gonidelis>
heller1 Do you think it would be useful if I try to implement it the next few days?
<gonidelis>
useful for the community*
<heller1>
If you have some time, go ahead. It's certainly useful for you.
<hkaiser>
gonidelis: mostly useful for you, I guess
<gonidelis>
Sure. I 'll be glad to give it a try.
nikunj97 has joined #ste||ar
<gonidelis>
Some thoughts on my current project though: I have implemented a sequential version compared to a parallel one utilizing a simple `hpx::async`. The sequential version seems to perform better than the parallel one. My guess is that because I compute each cell of the product matrix with a different task, the overhead dominates... Do you think making a
<gonidelis>
row-based parallelization would improve the performance?