hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
eschnett_ has joined #ste||ar
diehlpk_work has quit [Remote host closed the connection]
<Yorlik> Hmm ... with a lot of object migration for load balancing and other memory heavy operations I wonder if this could be interesting when running HPX apps: https://arxiv.org/pdf/1902.04738.pdf
<hkaiser> Yorlik: interesting
nikunj has quit [Ping timeout: 258 seconds]
nikunj has joined #ste||ar
hkaiser has quit [Quit: bye]
nikunj has quit [Remote host closed the connection]
nikunj has joined #ste||ar
nikunj has quit [Ping timeout: 245 seconds]
daissgr has joined #ste||ar
hello has joined #ste||ar
daissgr has quit [Ping timeout: 250 seconds]
daissgr has joined #ste||ar
hkaiser has joined #ste||ar
<hkaiser> heller_: isn't it your birthday today?
<heller_> hkaiser: it is
<hkaiser> Happy Birthday!
<heller_> thanks ;)
pmikolajczyk41 has joined #ste||ar
<heller_> hkaiser: does my cuda commit work on windows?
<hkaiser> heller_: have not tried yet
<heller_> that would be very interesting and was the main reason to commit it at this stage ;)
<pmikolajczyk41> Hello. While working on issue #3442 with a set_operation.hpp (hpx/parallel/algorithms/detail/set_operation.hpp) I got a little bit concerned with two function calls to parallel::util::partitioner<>::call(...) in lines 98 and 178.
<pmikolajczyk41> As I understand (looking at chunk_size.hpp or .*_partitioner.hpp), the 3rd argument should be the number of elements to be partitioned, but these two calls use the number of available cores instead.
<pmikolajczyk41> When I tried substituting *cores* with *len1*, I got some strange runtime errors.
<pmikolajczyk41> I think that it concerns only efficiency, because the code works well; I mean, it gives the correct answer for e.g. set difference or intersection.
<pmikolajczyk41> Of course, I could have misunderstood the code, so I decided to ask whether somebody could have a look at it.
<pmikolajczyk41> exactly
<hkaiser> yes, we use whatever the executor returns as the number of processing units to use
<hkaiser> don't remember why *puzzled*
<hkaiser> pmikolajczyk41: could have been the result of some performance analysis...
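The distinction pmikolajczyk41 raises, between the number of elements and the number of cores, can be sketched roughly as follows. This is not the HPX partitioner and does not reproduce the code in set_operation.hpp; `split_into_chunks` is a hypothetical helper that only illustrates how `count` elements might be divided among `cores` workers, assuming contiguous chunks of nearly equal size.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of HPX: split the index range [0, count)
// into at most `cores` contiguous chunks of nearly equal size.
// Each chunk is returned as a half-open [begin, end) pair.
std::vector<std::pair<std::size_t, std::size_t>>
split_into_chunks(std::size_t count, std::size_t cores)
{
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    if (count == 0 || cores == 0)
        return chunks;

    // Never create more chunks than there are elements.
    std::size_t const n = cores < count ? cores : count;
    std::size_t const base = count / n;     // minimum chunk size
    std::size_t const extra = count % n;    // first `extra` chunks get one more

    std::size_t begin = 0;
    for (std::size_t i = 0; i != n; ++i)
    {
        std::size_t const size = base + (i < extra ? 1 : 0);
        chunks.emplace_back(begin, begin + size);
        begin += size;
    }
    return chunks;
}
```

In this sketch the element count decides how much work exists and the core count decides how many pieces it is cut into; whether the HPX call sites intentionally pass the core count for a similar reason is not settled in the discussion above.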
adityaRakhecha has quit [Ping timeout: 256 seconds]
<Yorlik> o/
daissgr has quit [Ping timeout: 240 seconds]
pmikolajczyk41 has quit [Quit: Page closed]
<heller_> hkaiser: the problem with the cancelable action example is not that it's trying to set the state of an active thread
<heller_> but the state of a thread that doesn't exist anymore...
<hkaiser> ok
<heller_> so, in the case of an interruption, I don't think one needs to set the state again if the thread is active ... or we need to add reference counting back in
hello has quit [Ping timeout: 255 seconds]
<hkaiser> heller_: is the test broken?
<heller_> the test itself is fine, I think
<hkaiser> how do we know the thread was interrupted in order not to set the state?
<heller_> we set the flag, don't we?
<heller_> and since we can only handle interruptions on interruption points...
<hkaiser> we set the flag to cause interruption, but once interrupted (exception being thrown) - how do you know then?
<heller_> we just terminate the thread, no?
<heller_> there's no need to set the state from the outside
<hkaiser> and flag any waiting threads, yes
<mdiers_> heller_: Happy Birthday. Will you be in heidelberg on wednesday and thursday?
<heller_> mdiers_: yes
<heller_> mdiers_: and thanks ;)
<heller_> hkaiser: if the thread is suspended, we need to put it into pending, for sure
<heller_> mdiers_: you as well? I'll arrive tomorrow afternoon
<hkaiser> heller_: when is your talk?
<mdiers_> heller_: yes I will arrive later tomorrow too.
<heller_> 21st at 15:00
<heller_> mdiers_: excellent! btw, did you hear back from BMBF yet?
<heller_> mdiers_: after the speakers dinner on the 19th, I'd be up for a beer!
<hkaiser> heller_: so, do you know what to change in order to fix the cancellable action test?
<heller_> hkaiser: not yet
<mdiers_> heller_: yes, today. unfortunately a cancellation :-( . gerald is just trying to figure out by phone why.
<heller_> ok
hello has joined #ste||ar
<mdiers_> heller_: unfortunately i will arrive at 10:30pm.
<heller_> oh, ok
<hello> My machine has 8 DDR channels and every channel has 12.5 GB/s of bandwidth. Why do I get 100 GB/s and 300 GB/s of bandwidth with different gcc compile parameters for the STREAM benchmark?
<hkaiser> cache effects?
<hello> I am using stream benchmark
<Yorlik> Cache friendliness beats any sophisticated algorithms ;)
<heller_> Debug vs Release?
<heller_> 12.5 times 8 is 100, so the 300 get served from cache
<heller_> The most interesting part: what are the different parameters and how did you run it?
<hello> icc T=56
<hello> 100GB/s
<hello> Icc T=56, -xHost opt=auto
<hello> 398GB/s
<heller_> That doesn't make lots of sense ;)
<heller_> What's T?
heller_ has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
heller_ has joined #ste||ar
<Yorlik> -xHost implies -O2 and uses SIMD instructions ...
<heller_> So my guess is: you are running a small array size?
<Yorlik> That ~should have visible consequences.
<Yorlik> I think if you really want to understand what's going on here you'd have to look at the assembly and compare.
<heller_> Or performance counters
<heller_> Could you please share your full command line to launch the program?
<Yorlik> Ya. It comes down to some hardcore low-level stuff.
hkaiser has quit [Quit: bye]
<hello> Thread
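heller_'s arithmetic above (8 channels times 12.5 GB/s is 100 GB/s, so anything beyond that must come from cache) can be written down as a small sketch. The function names are hypothetical and chosen for illustration only. The working-set formula assumes the standard STREAM layout of three arrays of `double`.

```cpp
#include <cstddef>

// Theoretical DRAM peak: number of channels times per-channel bandwidth (GB/s).
constexpr double peak_bandwidth_gbs(int channels, double per_channel_gbs)
{
    return channels * per_channel_gbs;
}

// A measured figure above the DRAM peak cannot have come from DRAM alone,
// so at least part of the traffic was served from cache.
constexpr bool exceeds_dram_peak(double measured_gbs, double peak_gbs)
{
    return measured_gbs > peak_gbs;
}

// Assumption: STREAM with its three arrays (a, b, c) of double.
// If this is not much larger than the combined last-level cache,
// the benchmark measures cache bandwidth rather than memory bandwidth.
constexpr std::size_t stream_working_set_bytes(std::size_t array_elements)
{
    return 3 * array_elements * sizeof(double);
}
```

With the numbers from the conversation, `peak_bandwidth_gbs(8, 12.5)` gives 100, and the reported 398 GB/s exceeds it, which is consistent with the guess that the array size is small enough to fit in cache.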
daissgr has joined #ste||ar
hkaiser has joined #ste||ar
aserio has joined #ste||ar
daissgr has quit [Ping timeout: 272 seconds]
eschnett_ has quit [Quit: eschnett_]
daissgr has joined #ste||ar
hello has quit [Quit: Going offline, see ya! (www.adiirc.com)]
akheir has quit [Remote host closed the connection]
aserio has quit [Ping timeout: 264 seconds]
akheir has joined #ste||ar
aserio has joined #ste||ar
diehlpk_work has joined #ste||ar
<diehlpk_work> simbergm, Non-standard cmake paths work now for Fedora
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 268 seconds]
aserio1 is now known as aserio
detan has joined #ste||ar
<detan> Does anyone know why the examples/compute/cuda/partitioned_vector only compiles with clang?
<K-ballo> simbergm: when you have a chance could you tell me what the status of the function serialization PR is on pycicle?
<diehlpk_work> detan, Which version of HPX are you using?
mbremer has joined #ste||ar
<detan> diehlpk_work: 1.2.0
<diehlpk_work> Is the error message related to constexpr?
<detan> diehlpk_work: Sorry, I was just looking at the CMakeLists.txt file at the examples/compute/cuda/ folder.
<detan> diehlpk_work: I was just wondering if there was any feature that only works with clang. Like partitioned_vectors...
<diehlpk_work> detan, We had issues with nvcc there, but these should be addressed in current master
<diehlpk_work> I had some issues with constexpr when I switched from clang to nvcc
<detan> diehlpk_work: That is good to know... Thanks!
<diehlpk_work> You could try hpx master and see if it is working there
<detan> diehlpk_work: Will do!
<diehlpk_work> detan, cuda clang handles constexpr differenty from nvcc
<diehlpk_work> *different
<detan> diehlpk_work: How different?
<diehlpk_work> In my case clang was able to compile the constexpr statement and nvcc complained about it
<diehlpk_work> I had one function defined as constexpr and clang could compile it, but nvcc complained
<detan> I see... did you use -cuda-relaxed-constexpr?
<diehlpk_work> Yes
<simbergm> K-ballo: one unused variable warning, some capture/move warnings, and something broken in the guided pool executor (only with gcc...?)
<simbergm> and -Werror is on, hence errors
<simbergm> first link, page 1: unused variable, page 2: guided executor
<K-ballo> thanks
<detan> diehlpk_work: It's hard to work with nvcc sometimes...
<simbergm> second link: move warnings
<K-ballo> had not realized there were pages
<simbergm> K-ballo: last commit on that branch was a week ago, no? (if yes, results should be up to date)
<diehlpk_work> detan, Yes, it is. At least my use case compiled with the master of hpx
<K-ballo> simbergm: yeah, I waited a couple days
<simbergm> you can also set "items per page = all"
<K-ballo> I'll try fixing those issues later today, and ask back later this week
<simbergm> sounds good
<simbergm> K-ballo: scratch that, those snuck into master....
<K-ballo> which ones?
<detan> diehlpk_work: Ok, thanks... I saw that the condition in CMakeLists.txt checking for clang is still there for partitioned_vector on master. Let's see what happens when I remove it...
<simbergm> looks like 3704
<simbergm> master was ok yesterday
<K-ballo> function serialization is not based on debind anymore
<K-ballo> ...is it? it should not be
<hkaiser> what's 'debind'?
<K-ballo> oh, does pycicle rebase/merge?
<K-ballo> hkaiser: the PR for removing/lowering bind and friends
<K-ballo> left some unused using placeholder::_1; in place
<hkaiser> k
<simbergm> K-ballo: yep, it merges master before testing
<simbergm> diehlpk_work: nice work on the cmake
<simbergm> what did you have to do in the end to convince cmake to look in the right places?
<diehlpk_work> Adding a new flag to the cmake and change the installation path
<diehlpk_work> Fedora searches in /usr/lib*/*mpi*/lib/
<simbergm> hmm, ok
<diehlpk_work> simbergm, I just asked in their irc and did what they told me to do
<diehlpk_work> and it worked
<diehlpk_work> I could install the rpm packages and compile the hello world with gcc, mpich, and openmpi
aserio has quit [Ping timeout: 246 seconds]
<K-ballo> simbergm: are the capture errors in master too?
<K-ballo> looks like they are
<hkaiser> K-ballo: I implemented then_execute and bulk_then_execute on your executor, should propagate the launch policy correctly now
<K-ballo> ok, let me know if I have to do anything else on that branch
<K-ballo> hkaiser: what should we do about the C++11 errors on master?
<hkaiser> will look
<hkaiser> wrt the executor - I'd appreciate it if you went over the changed test one more time
<hkaiser> what errors btw? circle is green?
<K-ballo> circle does 14 or 17, right?
<hkaiser> didn't think so, anyways - where can I see those errors?
<K-ballo> my last PR introduced some (bad in 11) uses of the capture macros along with [=] default capture
<hkaiser> darn
<hkaiser> #ifdef it?
<K-ballo> no, it has to be rewritten proper
<K-ballo> manually capturing entities is one possibility
<hkaiser> right
* K-ballo wonders if we could have an inspect check for bad uses of capture macros
<hkaiser> can you manually capture args... in 11?
<K-ballo> uhm, by value I think yes?
<hkaiser> the whole point of that dance you removed was to avoid capturing the template parameter pack
<K-ballo> which dance?
<hkaiser> the wrapping into a tuple which then was captured
<K-ballo> I think we are looking at different errors?
<K-ballo> I only see one and it comes from when_each, no variadics
<hkaiser> ahh, ok then
<hkaiser> I was thinking about a different capture
<K-ballo> hah, and the silliest thing is the [=] doesn't actually capture anything
<hkaiser> lol
<hkaiser> well, then the solution is easy
<K-ballo> trivial...
<hkaiser> heh, some leftover
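The "dance" hkaiser mentions, wrapping arguments into a tuple so that a C++11 lambda captures the tuple instead of the template parameter pack, can be sketched as below. This is not the HPX code; `index_pack`, `make_index_pack`, `sum`, `sum_tuple`, and `make_deferred_sum` are hypothetical names made up for this illustration. The small index helper stands in for `std::index_sequence`, which only arrived in C++14.

```cpp
#include <cstddef>
#include <functional>
#include <tuple>

// Minimal C++11 stand-in for std::index_sequence (hypothetical names).
template <std::size_t... Is>
struct index_pack {};

template <std::size_t N, std::size_t... Is>
struct make_index_pack : make_index_pack<N - 1, N - 1, Is...> {};

template <std::size_t... Is>
struct make_index_pack<0, Is...>
{
    typedef index_pack<Is...> type;
};

// Recursive sum over a list of ints, used as the example payload.
inline int sum() { return 0; }

template <typename T, typename... Ts>
int sum(T v, Ts... vs)
{
    return v + sum(vs...);
}

// Unpack the tuple back into individual arguments.
template <typename Tuple, std::size_t... Is>
int sum_tuple(Tuple const& t, index_pack<Is...>)
{
    return sum(std::get<Is>(t)...);
}

// The lambda captures one ordinary object (the tuple) by value,
// so no parameter pack appears in the capture list or lambda body.
template <typename... Ts>
std::function<int()> make_deferred_sum(Ts... vs)
{
    typedef typename make_index_pack<sizeof...(Ts)>::type indices;
    auto args = std::make_tuple(vs...);
    return [args]() -> int { return sum_tuple(args, indices()); };
}
```

Calling `make_deferred_sum(1, 2, 3)` returns a callable that yields the sum when invoked later. The design choice is that the pack is expanded only in ordinary function templates, while the lambda itself sees a single tuple.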
<Yorlik> I think I'm going to love C++ templates (first time use, just now) as much as macros :)
<K-ballo> that's either wrong or wrong
<Yorlik> Something is always wrong somewhere.
<simbergm> K-ballo: you found all the error messages you need?
<K-ballo> simbergm: maybe... is it possible to stop cdash from reinterpreting `<` as html tag opening? half the templated output is missing
<K-ballo> I think it is all there, I'm just having a hard time deciphering it
<K-ballo> I'm trying to reconstitute it from the DOM tree
<simbergm> K-ballo: ugh, good question
<simbergm> jbjnr_: ?
<simbergm> I don't know if it can be changed (easily)
<K-ballo> additionally, that guided thread pool executor, is it not C++14 only? the code uses several C++14 features
<simbergm> K-ballo: yeah, it should be, might be missing some ifdefs
<simbergm> (also jbjnr_ ^)
<K-ballo> nevermind, I assumed I was still looking at C++11 failures, but I see the report mentions C++17
mdiers_ has quit [Remote host closed the connection]
<heller_> detan: there are a few hiccups with compiling in cuda mode right now...
<heller_> There's a PR which addresses this, but it's not quite ready yet...
aserio has joined #ste||ar
daissgr has quit [Ping timeout: 244 seconds]
hkaiser has quit [Quit: bye]
K-ballo has quit [Ping timeout: 272 seconds]
daissgr has joined #ste||ar
hkaiser has joined #ste||ar
aserio has quit [Quit: aserio]
daissgr has quit [Ping timeout: 255 seconds]
daissgr has joined #ste||ar
daissgr has quit [Ping timeout: 255 seconds]