#ste||ar on 2019-07-04 — irc logs at irclog.cct.lsu.edu

2019-06-17 20:46 hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoD: https://developers.google.com/season-of-docs/

00:43 <Yorlik> Is there an issue using multithreaded code on godbolt.org? I did some testing with atomics which worked nice with clang on windows, but this explodes at runtime: https://godbolt.org/z/xW55Hl

00:50 <K-ballo> yes, there is, it's a known issue

00:50 <K-ballo> https://github.com/mattgodbolt/compiler-explorer/issues/1428

01:23 <Yorlik> So it's not intentional a.k.a sandboxing?

01:25 <Yorlik> I have one question concerning the memory orders I used in that code: memory_order_acq_rel for the CAS loop and memory_order_release for the final store. But in testing, relaxed consistently worked too. Was that just random, or is that to be expected on x86?

01:27 <K-ballo> I want to say neither

01:28 <Yorlik> ???

01:28 * Yorlik is puzzled, which is probably normal in atomic land.

01:28 <K-ballo> it wouldn't be random in the sense that if you keep repeating the test it would keep yielding the same results, but that could change as soon as you change the snippet slightly

01:29 <Yorlik> It's a compiler thing?

01:29 <Yorlik> BTW - debug build - I should try in release

01:29 <K-ballo> yes, the memory model exists in the C++ abstract machine

01:29 <Yorlik> and see if relaxed explodes

01:30 <K-ballo> you can't make assumptions based on the translation of one snippet from abstract machine to x86, not for other different snippets, not even for the same snippets in different contexts

01:30 <Yorlik> IC

01:31 <K-ballo> relaxed means non-tearing but unsequenced

01:31 <K-ballo> in the abstract machine that means you never get to see the value change

01:31 <K-ballo> if it's translated to just the right combination of x86 instructions then of course you eventually will, but you may just as well get a translation in which you don't
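
The snippet behind the godbolt link is not reproduced in this log, so the following is only a minimal sketch of the pattern Yorlik describes: a CAS loop with memory_order_acq_rel followed by a final store with memory_order_release. The function name run_guarded_counter and the guarded plain counter are assumptions made for illustration. On x86 a locked read-modify-write typically also acts as a full barrier, which is one reason an all-relaxed variant can appear to work in a given build; in the C++ abstract machine, however, an all-relaxed variant would leave the plain counter with a data race.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: several threads increment a plain (non-atomic)
// counter, guarded by a flag that is taken with a CAS loop and given
// back with a release store.
inline long run_guarded_counter(int num_threads, int increments_per_thread)
{
    std::atomic<bool> busy{false};
    long counter = 0;  // deliberately not atomic

    auto worker = [&] {
        for (int i = 0; i < increments_per_thread; ++i) {
            bool expected = false;
            // acq_rel on success; relaxed on failure, since we only retry.
            while (!busy.compare_exchange_weak(expected, true,
                       std::memory_order_acq_rel,
                       std::memory_order_relaxed)) {
                expected = false;  // a failed CAS stores the observed value here
                std::this_thread::yield();
            }
            ++counter;  // ordered after the successful CAS by its acquire half
            // The release store publishes the increment to the next thread
            // whose CAS succeeds.
            busy.store(false, std::memory_order_release);
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t)
        threads.emplace_back(worker);
    for (auto& t : threads)
        t.join();
    return counter;
}
```

Calling run_guarded_counter(4, 10000) should return 40000 with the orderings shown. Replacing them with memory_order_relaxed may still give the same number on a particular compiler and target, but nothing in the language guarantees it.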

01:35 <Yorlik> I understand it such that the memory ordering just influences the code generation - it has no direct counterpart in the machine instructions - is that correct?

01:35 <Yorlik> Like: What the compiler does to my code.

01:37 <Yorlik> I'm trying to understand it, because I am trying to write a disruptor-like structure in C++. So - it's time for me to learn atomics and memory barriers correctly, and I also need to learn what I can consider correct and what not - I'm trying to spot where exactly the danger zone will be for me.

01:38 <K-ballo> memory ordering influences code generation, including the machine instructions in that code generation... how could it affect codegen but not instructions?

01:39 <Yorlik> I mean it does not influence the core atomic machine instructions

01:39 <Yorlik> ofc the rest

01:39 <K-ballo> it certainly does

01:39 <K-ballo> is this something you could use an existing lockfree library for?

01:40 <Yorlik> Not really

01:40 <Yorlik> I am not aware of any lock-free library implementing a disruptor we like.

01:41 <Yorlik> I might have missed it. We'll surely look more for it. It's also a learning exercise for me.

01:41 <K-ballo> I don't know what a disruptor is, but making a robust atomic anything takes years and years

01:41 <Yorlik> Disruptor is essentially a ringbuffer with sequenced access to data using atomic counters, one for each producer and consumer

01:42 <Yorlik> So writing to the counters is not contended
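
As a rough illustration of that description, here is a minimal single-producer/single-consumer ring buffer sketch. It is not the actual Disruptor design and not an existing library API; the names spsc_ring, try_push, try_pop and sum_through_ring are hypothetical, and the 64-byte alignment is an assumed cache-line size. Each counter is written by exactly one thread, so the writes to the counters are uncontended.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical SPSC ring: head_ is written only by the producer,
// tail_ only by the consumer.
template <typename T, std::size_t Capacity>
class spsc_ring {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed); // own counter
        const std::size_t tail = tail_.load(std::memory_order_acquire); // consumer's counter
        if (head - tail == Capacity) return false;                      // full
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);               // publish the slot
        return true;
    }

    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed); // own counter
        const std::size_t head = head_.load(std::memory_order_acquire); // producer's counter
        if (tail == head) return false;                                 // empty
        out = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);               // free the slot
        return true;
    }

private:
    std::array<T, Capacity> slots_{};
    alignas(64) std::atomic<std::size_t> head_{0};  // 64 = assumed cache-line size
    alignas(64) std::atomic<std::size_t> tail_{0};
};

// Hypothetical driver: push 1..n from one thread, sum them up in another.
inline long long sum_through_ring(int n) {
    spsc_ring<int, 8> ring;
    std::thread producer([&] {
        for (int i = 1; i <= n; ++i)
            while (!ring.try_push(i)) std::this_thread::yield();
    });
    long long sum = 0;
    for (int received = 0; received < n;) {
        int value = 0;
        if (ring.try_pop(value)) { sum += value; ++received; }
        else std::this_thread::yield();
    }
    producer.join();
    return sum;
}
```

With more than one producer or consumer this sketch is no longer correct; a multi-stage pipeline would need one counter per stage, each again written by a single thread.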

01:42 <K-ballo> sounds like the kind of thing lockfree libraries provide

01:42 <Yorlik> Could be - maybe there are disruptor-like implementations which just call themselves queues

01:43 <Yorlik> After all to the outside it is very similar - depending on the use case

01:43 <Yorlik> Actually I was astonished they used a new name for it - to me it looked just like good design.

01:44 <Yorlik> But it came from the Java world, where anything preallocated is also a godsend.

01:44 <Yorlik> Because of the non GC in it

01:45 <Yorlik> What I like about the design is, that using it for a pipeline or other sequenced data processing is easy

01:48 <Yorlik> Actually most of our current design thoughts are about trying to avoid any sharing of data as far as humanly possible

01:57 K-ballo has quit [Quit: K-ballo]

02:35 hkaiser has quit [Quit: bye]

03:59 Yorlik has quit [Ping timeout: 268 seconds]

07:40 daissgr has joined #ste||ar

08:11 daissgr has quit [Ping timeout: 276 seconds]

08:14 rori has joined #ste||ar

09:50 lsl88 has quit [Read error: Connection reset by peer]

09:51 lsl88 has joined #ste||ar

10:44 Yorlik has joined #ste||ar

10:52 Yorlik has quit [Read error: Connection reset by peer]

10:54 <lsl88> hi! I am trying to merge my pull requests :) could you help me? I have never done it before. I see that to merge them they need to be approved, is that right?

11:10 Yorlik has joined #ste||ar

11:20 david_pfander has quit [Quit: david_pfander]

11:51 <heller> lsl88: you don't merge it, we do ;)

11:51 <heller> Can you point me to yours?

11:53 <lsl88> heller: oh, but I wanted to make a big pull request with all my small pull requests

11:54 <lsl88> how can I do that?

11:58 <heller> Put your commits on a branch and create the PR

12:00 <lsl88> great

12:36 K-ballo has joined #ste||ar

12:49 hkaiser has joined #ste||ar

13:38 eschnett has joined #ste||ar

15:00 eschnett_ has joined #ste||ar

15:01 eschnett has quit [Ping timeout: 245 seconds]

15:01 eschnett_ is now known as eschnett

15:04 eschnett has quit [Client Quit]

15:07 eschnett has joined #ste||ar

15:23 Amy_ has quit [Ping timeout: 259 seconds]

15:23 Amy_ has joined #ste||ar

15:57 eschnett has quit [Quit: eschnett]

16:10 rori has quit [Quit: bye]

16:46 nikunj has joined #ste||ar

16:53 <nikunj> hkaiser: yt?

16:54 <nikunj> I went through the slides and the report. There is no resource that I can see that limits the iterations per time step to a certain number. It just shows how 3 tasks can be combined into 1 using Jackson's technique.

16:59 <hkaiser> nikunj: the number of iterations per timestep is limited by the width of the ghost zones

16:59 <nikunj> yes, so if I create a new partition_data that includes the ghost zones as well, I should be able to get the same results right?

17:00 <nikunj> I just want to confirm that

17:01 <hkaiser> our partitions don't include the ghost zones, it's a matter of how much data you grab from the neighbor

17:01 <hkaiser> but yes, if you get more than one element from the neighbor you'll need to store that as well

17:02 <hkaiser> otoh, don't we hold the reference to the neighbor, so we could extract what we need on demand
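
One way to read the remark about ghost zone width is sketched below. This is not the HPX 1d stencil code; the 3-point update, the coefficient k and all function names are assumptions made for illustration. A partition that copies g cells from each neighbor can advance g local steps before the stale ghost data reaches its own cells, because the region that is still valid shrinks by one cell per side per step.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double k = 0.25;  // hypothetical coefficient (alpha * dt / dx^2)

// One 3-point update without boundary data: the result is one cell
// shorter on each side than the input.
inline std::vector<double> step_shrinking(const std::vector<double>& u) {
    assert(u.size() >= 3);
    std::vector<double> next(u.size() - 2);
    for (std::size_t i = 1; i + 1 < u.size(); ++i)
        next[i - 1] = u[i] + k * (u[i - 1] - 2 * u[i] + u[i + 1]);
    return next;
}

// Reference: the same update on the whole periodic domain.
inline std::vector<double> step_periodic(const std::vector<double>& u) {
    const std::size_t n = u.size();
    std::vector<double> next(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double left = u[(i + n - 1) % n];
        const double right = u[(i + 1) % n];
        next[i] = u[i] + k * (left - 2 * u[i] + right);
    }
    return next;
}

// Advance one partition by ghost_width local steps using only the data
// grabbed up front, and return its largest deviation from the reference.
// ghost_width must not exceed 16 with the sizes chosen here.
inline double ghost_demo_max_error(std::size_t ghost_width) {
    const std::size_t n_global = 64, begin = 16, size = 16;
    std::vector<double> global(n_global);
    for (std::size_t i = 0; i < n_global; ++i)
        global[i] = std::sin(0.1 * static_cast<double>(i));

    // Own cells plus ghost_width cells from each neighbor.
    const auto first = global.begin() + static_cast<std::ptrdiff_t>(begin - ghost_width);
    const auto last = global.begin() + static_cast<std::ptrdiff_t>(begin + size + ghost_width);
    std::vector<double> local(first, last);

    for (std::size_t s = 0; s < ghost_width; ++s) {
        global = step_periodic(global);
        local = step_shrinking(local);  // loses one ghost cell per side
    }

    double max_err = 0.0;
    for (std::size_t i = 0; i < size; ++i)
        max_err = std::max(max_err, std::fabs(local[i] - global[begin + i]));
    return max_err;
}
```

After ghost_width steps the local array has shrunk back to exactly the partition's own cells, so a further step would need a fresh exchange with the neighbors.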

17:09 <nikunj> I see

17:09 <nikunj> let me try to do something

17:52 nikunj has quit [Remote host closed the connection]

19:16 Yorlik has quit [Read error: Connection reset by peer]

19:40 Yorlik has joined #ste||ar

19:43 Vir has quit [Ping timeout: 264 seconds]

20:03 Yorlik has quit [Read error: Connection reset by peer]

21:56 Yorlik has joined #ste||ar

22:07 <heller> hkaiser: out of curiosity, is there a reason why you didn't choose the stencil from the tutorials examples as a template to implement the fault tolerant stuff?

22:20 <hkaiser> heller: they have come to us with a modified 1d stencil, so we went with that

22:20 <heller> alright

22:20 <hkaiser> heller: minimizing effort on our end...

22:21 <heller> sure

22:21 <hkaiser> we have already done way more than they asked (and paid) for

22:22 <heller> ;)

22:32 <hkaiser> heller: the nice thing however is, that our implementation has about 5-10 times less overhead than the one done based on habanero-C (they did implement that before)

22:33 <heller> nice!

22:33 <heller> and the overall runtime ;)?

22:33 <hkaiser> much better anyways ;-)

22:34 <hkaiser> 10-20% better

22:34 <hkaiser> local only however

22:34 <heller> very cool