<Yorlik>
Is there an issue using multithreaded code on godbolt.org? I did some testing with atomics which worked nicely with clang on Windows, but this explodes at runtime: https://godbolt.org/z/xW55Hl
<Yorlik>
So it's not intentional, a.k.a. sandboxing?
<Yorlik>
I have one question concerning the memory orders I used in that code: memory_order_acq_rel for the CAS loop and memory_order_release for the final store. But in testing, memory_order_relaxed consistently worked as well. Was that just random, or is that to be expected on x86?
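The snippet behind the godbolt link is not reproduced in this log, so the following is only a minimal sketch of the pattern the question describes: a CAS loop using memory_order_acq_rel, followed by a final store using memory_order_release. The names `locked`, `counter`, `add_n` and `run` are hypothetical, and the spin-flag use case is an assumption.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical spin flag; NOT the code from the godbolt link.
std::atomic<bool> locked{false};
long counter = 0;  // plain, non-atomic data protected by the flag

void add_n(int n) {
    for (int i = 0; i < n; ++i) {
        bool expected = false;
        // CAS loop: acq_rel on success, relaxed on failure.
        while (!locked.compare_exchange_weak(expected, true,
                                             std::memory_order_acq_rel,
                                             std::memory_order_relaxed)) {
            expected = false;  // a failed CAS overwrites it with the observed value
        }
        ++counter;  // critical section
        // Final store: publishes the increment to the next thread that acquires the flag.
        locked.store(false, std::memory_order_release);
    }
}

// Runs `threads` workers that each add `n`; needs thread support at link time
// (for example -pthread with GCC or Clang on Linux).
long run(int threads, int n) {
    counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) pool.emplace_back(add_n, n);
    for (auto& th : pool) th.join();
    return counter;
}
```

With every order replaced by memory_order_relaxed, the unsynchronized access to `counter` becomes a data race in the C++ memory model, even if a particular x86 build happens to produce the expected total.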
<K-ballo>
I want to say neither
<Yorlik>
???
* Yorlik
is puzzled, which is probably normal in atomic land.
<K-ballo>
it wouldn't be random in the sense that if you keep repeating the test it would keep yielding the same results, but that could change as soon as you change the snippet slightly
<Yorlik>
It's a compiler thing?
<Yorlik>
BTW - debug build - I should try in release
<K-ballo>
yes, the memory model exists in the C++ abstract machine
<Yorlik>
and see if relaxed explodes
<K-ballo>
you can't make assumptions based on the translation of one snippet from abstract machine to x86, not for other different snippets, not even for the same snippets in different contexts
<Yorlik>
IC
<K-ballo>
relaxed means non-tearing but unsequenced
<K-ballo>
in the abstract machine that means you never get to see the value change
<K-ballo>
if it's translated to just the right combination of x86 instructions then of course you eventually will, but you may just as well get a translation in which you don't
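A minimal sketch of the kind of ordering that relaxed does not give you, assuming a classic message-passing setup. The names `payload`, `ready`, `producer`, `consumer` and `run_once` are hypothetical. The release store paired with the acquire load is what makes the plain write to `payload` visible to the reader; with relaxed on both sides, reading `payload` would be a data race in the abstract machine, whatever a given x86 translation happens to do.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // plain data handed from one thread to another
std::atomic<bool> ready{false};  // flag that publishes the payload

void producer() {
    payload = 42;
    // Release: the write to payload is ordered before the flag becomes visible.
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // Acquire: once true is observed, the write to payload is visible too.
    while (!ready.load(std::memory_order_acquire)) {
    }
    return payload;
}

// Runs one exchange on two threads; needs thread support at link time.
int run_once() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = 0;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```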
<Yorlik>
My understanding is that the memory ordering just influences the code generation and has no direct counterpart in the machine instructions. Is that correct?
<Yorlik>
Like: What the compiler does to my code.
<Yorlik>
I'm trying to understand it, because I am trying to write a disruptor-like structure in C++. So it's time for me to learn atomics and memory barriers correctly, and I also need to learn what I can consider correct and what not. I'm trying to spot where exactly the danger zone will be for me.
<K-ballo>
memory ordering influences code generation, including the machine instructions in that code generation... how could it affect codegen but not instructions?
<Yorlik>
I mean it does not influence the core atomic machine instructions
<Yorlik>
ofc the rest
<K-ballo>
it certainly does
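A small sketch of how the chosen memory order can change the atomic instruction itself. The functions and the variable `cell` are hypothetical; the instruction names in the comments describe what GCC and Clang typically emit for x86-64 and are not guaranteed for every compiler or version.

```cpp
#include <atomic>

std::atomic<int> cell{0};

// On x86-64 this is typically a plain `mov` to memory.
void store_relaxed(int v) { cell.store(v, std::memory_order_relaxed); }

// On x86-64 this is typically an `xchg` (or a `mov` followed by `mfence`),
// so the store instruction itself differs from the relaxed case.
void store_seq_cst(int v) { cell.store(v, std::memory_order_seq_cst); }

// On x86-64 an acquire load is typically a plain `mov` from memory.
int load_acquire() { return cell.load(std::memory_order_acquire); }
```

Comparing the assembly of the two store functions on godbolt.org is a quick way to check this for a particular compiler and target.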
<K-ballo>
is this something you could use an existing lockfree library for?
<Yorlik>
Not really
<Yorlik>
I am not aware of any lock-free library implementing a disruptor we like.
<Yorlik>
I might have missed it. We'll surely look more for it. It's also a learning exercise for me.
<K-ballo>
I don't know what a disruptor is, but making a robust atomic anything takes years and years
<Yorlik>
A disruptor is essentially a ring buffer with sequenced access to data using atomic counters, one for each producer and consumer
<Yorlik>
So writing to the counters is not contended
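A minimal sketch of the idea just described, reduced to one producer and one consumer. This is not the LMAX Disruptor and not an existing library API; `SpscRing`, `try_push` and `try_pop` are hypothetical names, and the 64-byte alignment is an assumed cache-line size. Each counter is written by exactly one thread, so the writes to the counters are uncontended.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

template <typename T, std::size_t N>
class SpscRing {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

    std::array<T, N> slots_{};                      // preallocated storage
    alignas(64) std::atomic<std::size_t> head_{0};  // written only by the consumer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written only by the producer

public:
    // Producer side: returns false when the ring is full.
    bool try_push(const T& v) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N) return false;
        slots_[t & (N - 1)] = v;
        tail_.store(t + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side: returns false when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return false;
        out = slots_[h & (N - 1)];
        head_.store(h + 1, std::memory_order_release);  // hand the slot back
        return true;
    }
};
```

A pipeline with several stages would add one sequence counter per stage, with each stage waiting on the counter of the stage before it; that extension is not shown here.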
<K-ballo>
sounds like the kind of thing lockfree libraries provide
<Yorlik>
Could be - maybe there are disruptor-like implementations which just call themselves queues
<Yorlik>
After all to the outside it is very similar - depending on the use case
<Yorlik>
Actually I was astonished they used a new name for it - to me it looked just like good design.
<Yorlik>
But it came from the Java world, where anything preallocated is a godsend.
<Yorlik>
Because there is no GC involved
<Yorlik>
What I like about the design is, that using it for a pipeline or other sequenced data processing is easy
<Yorlik>
Actually most of our current design thoughts are about avoiding any sharing of data as far as humanly possible
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: bye]
Yorlik has quit [Ping timeout: 268 seconds]
daissgr has joined #ste||ar
daissgr has quit [Ping timeout: 276 seconds]
rori has joined #ste||ar
lsl88 has quit [Read error: Connection reset by peer]
lsl88 has joined #ste||ar
Yorlik has joined #ste||ar
Yorlik has quit [Read error: Connection reset by peer]
<lsl88>
hi! I am trying to merge my pull requests :) could you help me? I have never done it before. I see that to merge them I need them to be approved, is that right?
Yorlik has joined #ste||ar
david_pfander has quit [Quit: david_pfander]
<heller>
lsl88: you don't merge it, we do ;)
<heller>
Can you point me to yours?
<lsl88>
heller: oh, but I wanted to make a big pull request with all my small pull requests
<lsl88>
how can I do that?
<heller>
Put your commits on a branch and create the PR
<lsl88>
great
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
eschnett has joined #ste||ar
eschnett_ has joined #ste||ar
eschnett has quit [Ping timeout: 245 seconds]
eschnett_ is now known as eschnett
eschnett has quit [Client Quit]
eschnett has joined #ste||ar
Amy_ has quit [Ping timeout: 259 seconds]
Amy_ has joined #ste||ar
eschnett has quit [Quit: eschnett]
rori has quit [Quit: bye]
nikunj has joined #ste||ar
<nikunj>
hkaiser: yt?
<nikunj>
I went through the slides and the report. There is no resource that I can see that limits the iterations per time step to a certain number. It just shows how 3 tasks can be combined into 1 using Jackson's technique.
<hkaiser>
nikunj: the number of iterations per timestep is limited by the width of the ghost zones
<nikunj>
yes, so if I create a new partition_data that includes the ghost zones as well, I should be able to get the same results right?
<nikunj>
I just want to confirm that
<hkaiser>
our partitions don't include the ghost zones, it's a matter of how much data you grab from the neighbor
<hkaiser>
but yes, if you get more than one element from the neighbor you'll need to store that as well
<hkaiser>
otoh, don't we hold the reference to the neighbor? so we could extract what we need on demand
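A minimal sketch of why the ghost-zone width bounds the number of iterations per exchange, written in plain C++ rather than with HPX. The function names, the 3-point heat update and the coefficient 0.25 are assumptions for illustration, and this is not the partition_data type from the HPX examples. With `w` ghost cells on each side, each local step invalidates one more cell at each end, so at most `w` steps can be taken before new neighbor data is needed.

```cpp
#include <cstddef>
#include <vector>

// One 3-point heat update; the coefficient k is a hypothetical choice.
inline double heat(double l, double m, double r, double k = 0.25) {
    return m + k * (l - 2.0 * m + r);
}

// `ext` holds w ghost cells, then the interior, then w ghost cells
// (assumes ext.size() >= 2 * w). Performs min(steps, w) local iterations
// and returns the updated interior.
std::vector<double> advance(std::vector<double> ext, std::size_t w,
                            std::size_t steps) {
    if (steps > w) steps = w;  // more steps would need data we do not have
    std::vector<double> next(ext.size());
    for (std::size_t s = 1; s <= steps; ++s) {
        // After step s, only cells [s, size - s) still hold valid values.
        for (std::size_t i = s; i + s < ext.size(); ++i)
            next[i] = heat(ext[i - 1], ext[i], ext[i + 1]);
        ext.swap(next);
    }
    const auto off = static_cast<std::ptrdiff_t>(w);
    return std::vector<double>(ext.begin() + off, ext.end() - off);
}
```

Whether the extra cells are copied into the partition or read from the neighbor on demand, as suggested above, only changes where `ext` comes from, not the bound on the number of steps.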
<nikunj>
I see
<nikunj>
let me try to do something
nikunj has quit [Remote host closed the connection]
Yorlik has quit [Read error: Connection reset by peer]
Yorlik has joined #ste||ar
Vir has quit [Ping timeout: 264 seconds]
Yorlik has quit [Read error: Connection reset by peer]
Yorlik has joined #ste||ar
<heller>
hkaiser: out of curiosity, is there a reason why you didn't choose the stencil from the tutorial examples as a template to implement the fault-tolerant stuff?
<hkaiser>
heller: they have come to us with a modified 1d stencil, so we went with that
<heller>
alright
<hkaiser>
heller: minimizing effort on our end...
<heller>
sure
<hkaiser>
we have already done way more than they asked (and paid) for
<heller>
;)
<hkaiser>
heller: the nice thing, however, is that our implementation has about 5-10 times less overhead than the one based on Habanero-C (they did implement that before)