Coldblackice has quit [Remote host closed the connection]
Coldblackice has joined #ste||ar
Coldblackice has quit [Client Quit]
K-ballo has quit [Quit: K-ballo]
nikunj has joined #ste||ar
hkaiser has quit [Quit: bye]
nikunj has quit [Remote host closed the connection]
<mdiers_>
heller: getting closer to the problem: I had tested with sanitize=leak, but the sanitizer adjustments in HPX are only applied with sanitize=address. Now continuing with sanitize=address, I get an undeclared identifier asan_fake_stack in context_base.hpp:221. Should I use lx::x86_linux_context_impl_base::asan_fake_stack instead of asan_fake_stack, or is something missing?
rori has joined #ste||ar
JClave has joined #ste||ar
JClave has quit [Remote host closed the connection]
JClave has joined #ste||ar
<simbergm>
jbjnr: yt? cdash submissions have been missing for a while and it looks like it started happening after the cdash upgrade
<JClave>
does anyone know of any commercial software projects using HPX?
<simbergm>
do you know if something changed in the format or submission url?
<simbergm>
JClave: I don't think there are any that would say so publicly, at least
<JClave>
because of security reasons?
<mdiers_>
we have one in development, but nothing public yet
<simbergm>
JClave: not necessarily, there might be commercial projects using HPX that just haven't told us
<simbergm>
academic projects are another matter, there are at least a few
<JClave>
would you mind naming some please?
<jbjnr>
simbergm: I'll take a look at it. It seemed to be working when it was upgraded, but must have stopped with new results
<jbjnr>
mdiers_: anything you can share with us about your application?
<simbergm>
JClave: octotiger is the most prominent one I can think of, hpxMP is a small reimplementation of OpenMP with HPX, flecsi apparently has some sort of HPX backend, here at CSCS we're working on a cholesky decomposition with HPX (not public)
<simbergm>
hopefully others can fill in the gaps
<tarzeau>
simbergm: did it work at all? were you able to test anything?
<simbergm>
tarzeau: no time yet, sorry
<tarzeau>
i like the stellar group logo, who created it?
<heller>
CCT
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
<mdiers_>
jbjnr: in short: an application for processing seismic data. a small overview will be posted on our website soon.
<jbjnr>
very nice. an HPC related theme then.
<jbjnr>
Do make sure you send an announcement to the HPX users' list when you write about it, as we'll all be interested in knowing about it
<Yorlik>
hkaiser: Got yesterday's mess cleaned up. It was a typical newbie-doesn't-know-what-he-does thing. Had to clean it up myself. However, the input here still helped me, since it changed the way I was looking at things. Thanks K-ballo and heller too. :)
<Yorlik>
+ zao :)
<mdiers_>
jbjnr: Yes, but it is also HTC related. I will keep the users' list in mind.
<zao>
<3
<Yorlik>
<:3 )~~
<hkaiser>
JClave: we work on a fairly large machine-learning project that uses HPX: Phylanx (github)
<hkaiser>
JClave: also, Yorlik here develops a MMO game using it
<JClave>
Thanks! Keen to start contributing soon, good to verify that this project is key to a lot of production-ready software.
<hkaiser>
JClave: welcome on board, then!
<hkaiser>
JClave: what's your interest?
<jbjnr>
I've got nothing in my calendar for HPX meeting this afternoon, so if anyone has a link to click at the right time, please send it to me (webex or appear.in ?)
<JClave>
anything involving multithreading and synchronisation primitives. Was going to find something that people want done in HPX
<hkaiser>
jbjnr: we'll probably do appear.in
<hkaiser>
JClave: cool
<hkaiser>
JClave: parallel algorithms?
<JClave>
yeah i was looking for some work related to that
<hkaiser>
JClave: here are a couple of related tickets: #1141, #1338, #2235, #1836, #1668
<hkaiser>
there might be more, just look around
<JClave>
hkaiser: thanks! will have a look and comment on ones i wish to pick up. just managed to run HPX examples successfully on windows today so will spend a bit more time getting comfortable first :)
<hkaiser>
:D
<jbjnr>
hkaiser: ta
JClave has quit [Quit: Going offline, see ya! (www.adiirc.com)]
JClave has joined #ste||ar
hkaiser has quit [Quit: bye]
<diehlpk_work>
jbjnr, Where can I find the libfabric branch?
<diehlpk_work>
Bryce is asking around at Nvidia to get support for our next attempt
<jbjnr>
we haven't got our paper into SC yet.
<diehlpk_work>
Yes, but how does this relate to the next attempt?
<jbjnr>
crawl, walk, run
hkaiser has joined #ste||ar
<hkaiser>
heller, simbergm, jbjnr: appear.in?
<simbergm>
hkaiser: yep, sec
Karame has joined #ste||ar
rori has quit [Ping timeout: 245 seconds]
<Yorlik>
Any suggestions on what to read about strategies for when and how to use huge/large memory pages to relieve TLB contention, and especially how to measure whether it makes sense in the first place?
JClave has quit [Remote host closed the connection]
<simbergm>
heller: you have time tomorrow or friday?
Yorlik has quit [Read error: Connection reset by peer]
<heller>
simbergm: tomorrow between 9 and 12 would be good
<simbergm>
heller: fine by me
<simbergm>
jbjnr: good for you?
<heller>
What time do you prefer?
<simbergm>
heller: any time is fine
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
<nikunj97>
hkaiser: yt?
<hkaiser>
here
<nikunj97>
I was running Jackson's code
<nikunj97>
and something is fishy
<hkaiser>
k
<hkaiser>
why am I not surprised?
<nikunj97>
xD
<nikunj97>
so the thing is, with 128 tiles, 16000 doubles/tile and 8192 iterations with 128 steps/iteration they report times of around 5s
<nikunj97>
they -> GaTech
<hkaiser>
k
<nikunj97>
and in their description they say it's Jackson's idea
<nikunj97>
the same code that we run won't finish anywhere close to 5s
<nikunj97>
is it because the many shared futures are bottlenecking performance?
<hkaiser>
well, let's see
<hkaiser>
how many futures do we create for this?
<nikunj97>
let me check
<nikunj97>
128 shared futures per iteration
<nikunj97>
and 8192 iteration in total
<hkaiser>
and they do that without any futures?
<nikunj97>
well, if they use the same code then they do it using promises and futures
<hkaiser>
that's ~1 million futures for us, i.e. about 1-2s of overhead from those
<hkaiser>
what do you mean by 128 steps/iteration?
<diehlpk_work>
hkaiser, I asked about the compiler versions matching because this is an issue for the Fedora packages.
<nikunj97>
so they copy the left and right tiles
<nikunj97>
so that they can do more time steps per iteration
<hkaiser>
and each step requires a future?
Vir has joined #ste||ar
<hkaiser>
diehlpk_work: nod, thought so - can you define the flag?
<hkaiser>
or would that be in the user's responsibility?
<diehlpk_work>
There are two sides to that coin
<nikunj97>
hkaiser: Don't think so
<hkaiser>
nikunj97: do we have a future per timestep or a future per iteration?
<diehlpk_work>
First, if a user uses the Fedora package and compiles their own code, it is their responsibility
<nikunj97>
future per iteration
<nikunj97>
not time step
<hkaiser>
nikunj97: ok
<diehlpk_work>
Second, if one uses our Fedora package in their build system, I do not know
<hkaiser>
how long does one iteration take?
<nikunj97>
I didn't check
<nikunj97>
but 30 min in with the parameters and it was still running
<nikunj97>
it should not take that long
<hkaiser>
diehlpk_work: we can make that check optional to begin with, or limit it to the major version as you suggested
<hkaiser>
nikunj97: so it just hangs?
<hkaiser>
does it make progress at all?
<nikunj97>
that's what I think
<nikunj97>
It's surely making progress, but it's taking too long
<hkaiser>
ok
<diehlpk_work>
What about checking the major version: if it matches, we allow compilation but emit a warning that the minor version does not match and recommend making them match
<diehlpk_work>
and if the major version does not match, we throw an error
<nikunj97>
so doing 4096 as subdomain width, 1024 time steps, and 3 subdomains itself is taking 28s to run
<hkaiser>
ok, do you care enough to have a look into this?
<hkaiser>
nikunj97: in release?
<nikunj97>
yes
<hkaiser>
;-)
<hkaiser>
diehlpk_work: ^^
<nikunj97>
hkaiser: everything is explicitly release now xD
<hkaiser>
I'm not sure the code has been looked at from the standpoint of perf at all
<hkaiser>
ahh, it uses sliding semaphore after all
<nikunj97>
it was Jackson's code, which Adrian then fixed up, and I simply added a function to inject errors
<hkaiser>
nikunj97: sure - but now it calculates a checksum and throws an exception without looking at it
<nikunj97>
yes coz I made it like that
<hkaiser>
and it allocates a buffer for each timestep and partition, etc.
<nikunj97>
the checksums will always come out right
<nikunj97>
so you'll have to inject errors artificially
<hkaiser>
nikunj97: if you can retrofit Jackson's kernel into stencil1d_4, sure - have a try
<hkaiser>
you could have overwritten some of the calculated values and let the checksum tell you whether to fail
<hkaiser>
(but this is irrelevant for perf)
<nikunj97>
that's what I was going to do, but then Adrian said it's fine either way, since it's benchmarking and we need to inject errors
<hkaiser>
nikunj97: sure, changing stencil1d_4 to simply use dataflow_replay with error injection (and the existing kernel) would give us some numbers as well
<nikunj97>
yes
<nikunj97>
and will save me the hassle to analyse and optimize Jackson's code
<nikunj97>
it's much easier for me to add error injections into stencil1d_4
<nikunj97>
hkaiser: just to let you know, the actual example lw_1d_replay throws error for the given parameters
<nikunj97>
I tried running it rn
<nikunj97>
hkaiser: just took a deep look into it. The init function itself does some 128 allocations of a 16001-element vector. Furthermore, it allocates a vector of 48000 elements ~25000 times. No wonder it's taking much longer than usual
<hkaiser>
right
<nikunj97>
hkaiser: should I ask Keita if we can use the existing 1d_stencil that we have?
<nikunj97>
instead of trying to reduce the allocations and optimizing things here and there