hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoD: https://developers.google.com/season-of-docs/
<Yorlik> Is there an issue using multithreaded code on godbolt.org? I did some testing with atomics which worked nice with clang on windows, but this explodes at runtime: https://godbolt.org/z/xW55Hl
<K-ballo> yes, there is, it's a known issue
<Yorlik> So it's not intentional a.k.a sandboxing?
<Yorlik> I have one question concerning the memory orders I used in that code: memory_order_acq_rel for the CAS loop and memory_order_release for the final store. But when testing, relaxed consistently worked too. Was that just random or is that to be expected on x86?
<K-ballo> I want to say neither
<Yorlik> ???
* Yorlik is puzzled, which is probably normal in atomic land.
<K-ballo> it wouldn't be random in the sense that if you keep repeating the test it would keep yielding the same results, but that could change as soon as you change the snippet slightly
<Yorlik> It's a compiler thing?
<Yorlik> BTW - debug build - I should try in release
<K-ballo> yes, the memory model exists in the C++ abstract machine
<Yorlik> and see if relaxed explodes
<K-ballo> you can't make assumptions based on the translation of one snippet from abstract machine to x86, not for other different snippets, not even for the same snippets in different contexts
<Yorlik> IC
<K-ballo> relaxed means non-tearing but unsequenced
<K-ballo> in the abstract machine that means you never get to see the value change
<K-ballo> if it's translated to just the right combination of x86 instructions then of course you eventually will, but you may just as well get a translation in which you don't
<Yorlik> I understand it such that the memory ordering just influences the code generation - it has no direct counterpart in the machine instructions - is that correct?
<Yorlik> Like: What the compiler does to my code.
<Yorlik> I'm trying to understand it, because I am trying to write a disruptor-like structure in C++. So - it's time for me to learn atomics and memory barriers correctly, and I also need to learn what I can consider correct and what not - I'm trying to spot where exactly the danger zone will be for me.
<K-ballo> memory ordering influences code generation, including the machine instructions in that code generation... how could it affect codegen but not instructions?
<Yorlik> I mean it does not influence the core atomic machine instructions
<Yorlik> ofc the rest
<K-ballo> it certainly does
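[Editor's sketch] The godbolt snippet discussed above is not reproduced in this log, so the following is only a minimal illustration of the two orderings Yorlik mentions: a CAS loop using memory_order_acq_rel and a final publishing store using memory_order_release. All names (add_with_cas, run_counter_demo, run_publish_demo) are hypothetical. As K-ballo points out, the guarantees come from the C++ abstract machine; the fact that relaxed happens to work in one x86 build does not generalize.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: add `delta` to `c` with a compare-and-swap loop.
// acq_rel applies when the exchange succeeds; a failed attempt only
// reloads `expected`, so relaxed is sufficient on the failure path.
inline void add_with_cas(std::atomic<int>& c, int delta) {
    int expected = c.load(std::memory_order_relaxed);
    while (!c.compare_exchange_weak(expected, expected + delta,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) {
        // `expected` now holds the value another thread installed; retry.
    }
}

// Several threads increment one counter; returns the final value.
inline int run_counter_demo(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) add_with_cas(counter, 1);
        });
    }
    for (auto& th : pool) th.join();
    return counter.load(std::memory_order_acquire);
}

// Release/acquire publication: the plain write to `payload` happens-before
// the consumer's read because the flag store is a release and the flag
// load is an acquire. With relaxed on both sides this would be a data race.
inline int run_publish_demo() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};
    int seen = -1;
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) std::this_thread::yield();
        seen = payload;
    });
    payload = 42;
    ready.store(true, std::memory_order_release);  // the "final store"
    consumer.join();
    return seen;
}
```

Calling run_counter_demo(4, 10000) should return 40000, and run_publish_demo() should return 42. Both results are guaranteed by the memory model rather than by how one compiler happens to translate the code.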
<K-ballo> is this something you could use an existing lockfree library for?
<Yorlik> Not really
<Yorlik> I am not aware of any lock-free library implementing a disruptor like the one we want.
<Yorlik> I might have missed it. We'll surely look for it some more. It's also a learning exercise for me.
<K-ballo> I don't know what a disruptor is, but making a robust atomic anything takes years and years
<Yorlik> Disruptor is essentially a ring buffer with sequenced access to data using atomic counters, one for each producer and consumer
<Yorlik> So writing to the counters is not contended
<K-ballo> sounds like the kind of thing lockfree libraries provide
<Yorlik> Could be - maybe there are disruptor like implementations which call themselves just queues
<Yorlik> After all to the outside it is very similar - depending on the use case
<Yorlik> Actually I was astonished they used a new name for it - to me it looked just like good design.
<Yorlik> But it came from the Java world, where any preallocated stuff is also a godsend.
<Yorlik> Because it avoids the GC
<Yorlik> What I like about the design is that using it for a pipeline or other sequenced data processing is easy
<Yorlik> Actually most of our current design thoughts are about avoiding any sharing of data as far as humanly possible
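[Editor's sketch] To make the description above concrete, here is a minimal single-producer, single-consumer ring buffer in which each side owns one atomic sequence counter, so writes to a counter are never contended. This is not the Disruptor itself and not Yorlik's code. The class name SpscRing, its members, and the 64-byte alignment are assumptions chosen for illustration, and the sketch is only valid for exactly one producer thread and one consumer thread.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical SPSC ring buffer. head_ is written only by the producer and
// tail_ only by the consumer; each side merely reads the other's counter.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side. Returns false when the buffer is full.
    bool try_push(const T& value) {
        const std::uint64_t head = head_.load(std::memory_order_relaxed);
        const std::uint64_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = value;
        // Release: the slot write above becomes visible before the new head.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns std::nullopt when the buffer is empty.
    std::optional<T> try_pop() {
        const std::uint64_t tail = tail_.load(std::memory_order_relaxed);
        const std::uint64_t head = head_.load(std::memory_order_acquire);
        if (tail == head) return std::nullopt;
        T value = slots_[tail & (Capacity - 1)];
        // Release: the slot read above completes before the slot is reusable.
        tail_.store(tail + 1, std::memory_order_release);
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    // Separate cache lines (64 bytes assumed) to limit false sharing.
    alignas(64) std::atomic<std::uint64_t> head_{0};
    alignas(64) std::atomic<std::uint64_t> tail_{0};
};
```

A pipeline stage would call try_push from the upstream thread and try_pop from the downstream thread. Supporting several producers or consumers needs extra coordination (for example a claim step per slot), which is where most of the difficulty K-ballo warns about lives.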
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: bye]
Yorlik has quit [Ping timeout: 268 seconds]
daissgr has joined #ste||ar
daissgr has quit [Ping timeout: 276 seconds]
rori has joined #ste||ar
lsl88 has quit [Read error: Connection reset by peer]
lsl88 has joined #ste||ar
Yorlik has joined #ste||ar
Yorlik has quit [Read error: Connection reset by peer]
<lsl88> hi! I am trying to merge my pull requests :) could you help me? I have never done it before. I see that they need to be approved before they can be merged, is that right?
Yorlik has joined #ste||ar
david_pfander has quit [Quit: david_pfander]
<heller> lsl88: you don't merge it, we do ;)
<heller> Can you point me to yours?
<lsl88> heller: oh, but I wanted to make a big pull request with all my small pull requests
<lsl88> how can I do that?
<heller> Put your commits on a branch and create the PR
<lsl88> great
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
eschnett has joined #ste||ar
eschnett_ has joined #ste||ar
eschnett has quit [Ping timeout: 245 seconds]
eschnett_ is now known as eschnett
eschnett has quit [Client Quit]
eschnett has joined #ste||ar
Amy_ has quit [Ping timeout: 259 seconds]
Amy_ has joined #ste||ar
eschnett has quit [Quit: eschnett]
rori has quit [Quit: bye]
nikunj has joined #ste||ar
<nikunj> hkaiser: yt?
<nikunj> I went through the slides and the report. There is no resource that I can see that limits the iterations per time step to a certain number. It just shows how 3 tasks can be combined into 1 using Jackson's technique.
<hkaiser> nikunj: the number of iterations per timestep is limited by the width of the ghost zones
<nikunj> yes, so if I create a new partition_data that includes the ghost zones as well, I should be able to get the same results right?
<nikunj> I just want to confirm that
<hkaiser> our partitions don't include the ghost zones, it's a matter of how much data you grab from the neighbor
<hkaiser> but yes, if you get more than one element from the neighbor you'll need to store that as well
<hkaiser> otoh, don't we hold the reference to the neighbor, so we could extract what we need on demand
<nikunj> I see
<nikunj> let me try to do something
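[Editor's sketch] hkaiser's point that the ghost zone width limits the number of iterations per exchange can be illustrated with a small serial 1D heat stencil. This is not the HPX 1d_stencil example or the fault-tolerant variant discussed here. The function names, the periodic boundary, and the coefficient k are assumptions. Each partition grabs `ghost` cells from each neighbor once and then takes `ghost` local steps; every step invalidates one more cell at each end of the local buffer, so after `ghost` steps only the interior is still correct.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Three-point heat update; k is a hypothetical diffusion coefficient.
inline double heat(double left, double mid, double right, double k = 0.25) {
    return mid + k * (left - 2.0 * mid + right);
}

// Reference: `steps` updates of the whole periodic domain.
inline std::vector<double> step_global(std::vector<double> u, std::size_t steps) {
    const std::size_t n = u.size();
    std::vector<double> next(n);
    for (std::size_t s = 0; s < steps; ++s) {
        for (std::size_t i = 0; i < n; ++i)
            next[i] = heat(u[(i + n - 1) % n], u[i], u[(i + 1) % n]);
        u.swap(next);
    }
    return u;
}

// Advance the partition [begin, begin + len) by `ghost` steps after a single
// fetch of `ghost` neighbor cells on each side.
inline std::vector<double> step_partition(const std::vector<double>& u,
                                          std::size_t begin, std::size_t len,
                                          std::size_t ghost) {
    const std::size_t n = u.size();
    assert(len > 0 && begin < n && ghost <= n);
    // Local buffer layout: left ghost cells | interior | right ghost cells.
    std::vector<double> local(len + 2 * ghost);
    for (std::size_t j = 0; j < local.size(); ++j)
        local[j] = u[(begin + n - ghost + j) % n];
    std::vector<double> next(local.size());
    for (std::size_t s = 1; s <= ghost; ++s) {
        // After step s only indices [s, size - s) hold correct values.
        for (std::size_t j = s; j + s < local.size(); ++j)
            next[j] = heat(local[j - 1], local[j], local[j + 1]);
        local.swap(next);
    }
    const auto first = local.begin() + static_cast<std::ptrdiff_t>(ghost);
    return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(len));
}

// Advance every partition by `ghost` steps and reassemble the domain.
inline std::vector<double> step_partitioned(const std::vector<double>& u,
                                            std::size_t parts, std::size_t ghost) {
    const std::size_t n = u.size();
    assert(parts > 0 && n % parts == 0);
    const std::size_t len = n / parts;
    std::vector<double> out;
    out.reserve(n);
    for (std::size_t p = 0; p < parts; ++p) {
        const auto piece = step_partition(u, p * len, len, ghost);
        out.insert(out.end(), piece.begin(), piece.end());
    }
    return out;
}
```

With a ghost width of 3, step_partitioned(u, parts, 3) should agree with step_global(u, 3) up to floating-point rounding. Taking a fourth local step without another exchange would read stale ghost values, which is the limit hkaiser describes.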
nikunj has quit [Remote host closed the connection]
Yorlik has quit [Read error: Connection reset by peer]
Yorlik has joined #ste||ar
Vir has quit [Ping timeout: 264 seconds]
Yorlik has quit [Read error: Connection reset by peer]
Yorlik has joined #ste||ar
<heller> hkaiser: out of curiosity, is there a reason why you didn't choose the stencil from the tutorial examples as a template to implement the fault-tolerant stuff?
<hkaiser> heller: they have come to us with a modified 1d stencil, so we went with that
<heller> alright
<hkaiser> heller: minimizing effort on our end...
<heller> sure
<hkaiser> we have already done way more than they asked (and paid) for
<heller> ;)
<hkaiser> heller: the nice thing, however, is that our implementation has about 5-10 times less overhead than the one based on Habanero-C (they implemented that before)
<heller> nice!
<heller> and the overall runtime ;)?
<hkaiser> much better anyways ;-)
<hkaiser> 10-20% better
<hkaiser> local only however
<heller> very cool