<nikunj97>
also I feel that yesterday's results may be skewed
<heller1>
so first, draw a graph with the roofline
<heller1>
shrug
<heller1>
start with the basics first
<nikunj97>
ok
<heller1>
also, draw the roofline with different peaks
<nikunj97>
ok, let me try it
<heller1>
different max bandwidth (main memory, cache levels), different max compute (no vectorization, vectorization, FMA, single threaded, all threads, etc)
<heller1>
since the stencil's metric is MLUP/S, you should convert GFLOP/S to MLUP/S
<nikunj97>
so for a 5 point stencil, mlups = gflops/5 ?
<heller1>
no
<heller1>
you need to calculate the arithmetic intensity
<heller1>
as a first step
<heller1>
well, keep the gflops at the first step
<heller1>
i'll explain the conversion later
<heller1>
setup the roofline for the ARM64FX2 first
<nikunj97>
ok
<jbjnr>
nikunj97: which of the stencil examples are you improving? It sounds like you are doing something very useful and worthwhile.
<nikunj97>
jbjnr, I'm working with heller1's 2d stencil benchmark from one of his lectures
<jbjnr>
is it part of the tutorials?
<jbjnr>
(in the tutorials repo)
<nikunj97>
yes
<jbjnr>
ok great.
<jbjnr>
we have a plan to redo the tutorial material for the next course and it would be lovely to have a simd version of the stencil code to add to the material.
<nikunj97>
jbjnr, I'm trying my best :)
<heller1>
nikunj97: yeah, would be nice if you could write a few pages about the performance modelling ;)
<heller1>
lessons learnt, optimization etc
<nikunj97>
well this is all for a lab based project at my university. As I told you, it's a collaboration between iitr and jsc, so they won't let me off without a good 10-15 page report ;)
<jbjnr>
we have a gsoc project to add simd stuff, couldn't you do that as well and get paid for it too?
<jbjnr>
jsc = julich?
<nikunj97>
I have an internship this summer. Also, I'm a mentor this gsoc so I won't be able to apply as a student.
<nikunj97>
and yes, jsc is julich supercomputing center
<jbjnr>
k
<nikunj97>
but I can look into the project when I'm free and add stuff there
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
<nikunj97>
how do I get interactive access to a node on rostam? I don't see screen anymore :/
karame78 has quit [Remote host closed the connection]
kale has joined #ste||ar
kale_ has joined #ste||ar
kale has quit [Ping timeout: 250 seconds]
hkaiser has joined #ste||ar
Hashmi has joined #ste||ar
Abhishek09 has joined #ste||ar
Abhishek09 has quit [Remote host closed the connection]
kale_ has quit [Quit: Leaving]
kale_ has joined #ste||ar
<nikunj97>
heller1, how do I do roofline analysis with PAPI?
<heller1>
you don't
<nikunj97>
I went through the link you sent. It looks like I need to first find the theoretical maximum and then add some macros to your code to find the gflops of the application. After this, you compare the two
<heller1>
nikunj97: step #1: determine the peak bandwidth of your system. step #2: determine the peak flops performance of your system
<heller1>
well, for a stencil that's simple enough
<heller1>
anyways, let's get the roofline first
<nikunj97>
yes
<heller1>
once you've done step #1 and step #2, you can plot the roofline as the function `min(AI * peak_bw, peak_flops)` where AI stands for arithmetic intensity
<heller1>
which is the unit on your x-axis
<heller1>
arithmetic intensity is a metric that tells you how many flops per byte your system can achieve
<heller1>
sorry, misformulated it
<heller1>
the arithmetic intensity is the unit that characterizes your workload
<nikunj97>
why is that on the y-axis then?
<heller1>
FLOPS
<nikunj97>
ohh that's a min there
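For reference, the function heller1 gave, written out; the symbol names here are illustrative, not from the chat:

```latex
% Roofline: attainable performance P against arithmetic intensity AI.
% x-axis: AI [FLOP/byte], y-axis: P [FLOP/s].
P(\mathrm{AI}) = \min\bigl(\mathrm{AI} \cdot BW_{\mathrm{peak}},\; P_{\mathrm{peak}}\bigr)
```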
<heller1>
if you have a bandwidth-bound problem, you have a low arithmetic intensity; you need to transfer more memory to your ALUs
<nikunj97>
How do I transfer more memory to ALU?
<nikunj97>
anyway, let me try step #1 and #2 before I ask any more doubts
<heller1>
for example, the following calculation (assuming all floats): `a = b + c;` requires two loads and one store, an equivalent of 12 bytes, for 1 floating point operation, that means that your arithmetic intensity is 1/12
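The same example as a worked equation (assuming 4-byte floats, as heller1 states):

```latex
% a = b + c: two loads + one store = 3 floats = 12 bytes moved per 1 FLOP
\mathrm{AI} = \frac{1\ \mathrm{FLOP}}{3 \times 4\ \mathrm{B}} = \frac{1}{12}\ \mathrm{FLOP/B}
```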
<heller1>
you don't transfer more memory to your ALU
<heller1>
this is a fixed unit (determined by the peak bandwidth)
<nikunj97>
gotcha
<heller1>
what you need to do is to increase the number of floating point operations per memory load/store
<heller1>
this can be done by exploiting your cache organization, intelligent prefetching, or a different algorithm
<heller1>
so, first complete step #1 and #2
<heller1>
for step #2, you can have different peak lines, as mentioned yesterday. One with scalar only operations, one with vectorization, one with vectorized FMA
<heller1>
and one additional one using all cores and vectorization
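A minimal sketch of how these rooflines could be tabulated for plotting (e.g. with gnuplot); all peak numbers are placeholders echoing the E5 figures discussed later in this log, not measurements:

```cpp
#include <algorithm>
#include <cstdio>

int main()
{
    // Placeholder peaks in GFLOP/s; substitute measured values per machine.
    struct Roof { const char* name; double peak_gflops; };
    Roof const roofs[] = {
        {"scalar, single core", 2.6},          // assuming 1 FLOP/cycle at 2.6 GHz
        {"vectorized, single core", 20.8},     // AVX2, no FMA
        {"vectorized FMA, single core", 41.6},
        {"vectorized FMA, all cores", 1664.0},
    };
    double const peak_bw = 40.0;  // GB/s, e.g. from the STREAM TRIAD result

    // Emit a gnuplot-friendly table: one row per arithmetic intensity,
    // one column per roof, each entry min(AI * peak_bw, peak_flops).
    std::printf("AI");
    for (auto const& roof : roofs)
        std::printf("\t%s", roof.name);
    std::printf("\n");

    for (double ai = 1.0 / 64; ai <= 64.0; ai *= 2.0)
    {
        std::printf("%g", ai);
        for (auto const& roof : roofs)
            std::printf("\t%g", std::min(ai * peak_bw, roof.peak_gflops));
        std::printf("\n");
    }
}
```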
<nikunj97>
which module do I need to load to test this?
<heller1>
none?
<nikunj97>
so I run an equivalent of nersc$ srun -n 4 -c 6 sde -knl -d -iform 1 -omix my_mix.out -i -global_region -start_ssc_mark 111:repeat -stop_ssc_mark 222:repeat -- foo.exe?
<heller1>
I have no idea what this command means
<nikunj97>
lol, I think I'm confusing myself and you
<diehlpk_mobile[m>
<diehlpk_mobile[m "> <@diehlpk:matrix.org> In your "> Google sent an email with updated time line for GSoC
kale_ has quit [Quit: Leaving]
kale_ has joined #ste||ar
<diehlpk_mobile[m>
Student proposal review period is now March 31-April 20.
<diehlpk_mobile[m>
And we got one additional week to review the proposals
<heller1>
yay
<heller1>
nikunj97: for the peak flop performance, it is enough to consider the max frequency and the number of operations you can get through per cycle, no need to measure it
<nikunj97>
heller1, I'm currently reading up on the stream benchmarks. You want me to run `gcc -fopenmp -D_OPENMP stream.c -o stream` on the source code, right?
<heller1>
and `-O3`
<nikunj97>
right!
<heller1>
then take the value you got from the stream TRIAD and multiply it by 1.5, then you have a realistic number
<kale_>
diehlpk_mobile[m, I am currently working on my second draft. I have found better ways to implement the pip package and I am actively looking into them. Since you are busy with your work, you can directly read my second draft tomorrow instead of going through both.
<nikunj97>
the hisilicon seems to be about twice as fast as the e5 in triad
akheir has quit [Read error: Connection reset by peer]
<heller1>
nikunj97: they make sense
<heller1>
any more information on this system?
<nikunj97>
the gist is all I got from running stream
<nikunj97>
is there anything I'm missing?
<heller1>
"The Hi1616 supports up to 512 GiB of quad-channel DDR4-2400 memory. This chip supports up to 2-way SMP with two ports supporting 96 Gb/s each."
<jbjnr>
nikunj97: you might need to watch out for CPU frequency throttling. Recall that modern CPUs/etc can slow themselves down when they get hot. This can mess up benchmarks!
<nikunj97>
jbjnr, aah! that makes sense
<jbjnr>
make sure nobody else is using the node you're on, and make sure you're not running your tests on a login node
<nikunj97>
I'm allocating myself a separate node
<nikunj97>
and running a script that runs the stream executable 10 times
<nikunj97>
and stores the triad result to a file
<heller1>
nikunj97: ahh, the arm64fx is the arm node that I gave you?
<nikunj97>
heller1, yes!
<nikunj97>
it's not related to the project I'm working on, but I decided to write my benchmark such that we can reuse it on a64fx as well for our own project
<nikunj97>
our -> ste||ar
<heller1>
nikunj97: it is _not_ an arm64fx, the arm64fx is the one with SVE, the one that is going to be put into post-k (aka Fugaku)
<nikunj97>
what is it then?
<heller1>
entirely different machines
<heller1>
yes, but it is NOT a ARM64FX
<nikunj97>
why's the telegram group ARMFX64 then? shrug
<heller1>
I always forget ... you should have the information in the email thread where I gave you access...
<heller1>
the telegram group is about getting access to riken's machines, which use the armfx64
<nikunj97>
"give me a shout if you need access to a larger aarch64 machine"
<nikunj97>
should've read that right
<nikunj97>
diehlpk_mobile[m told me that our proposal was accepted by fujitsu
<nikunj97>
and that we should get access to it
<heller1>
aarch64 is the generic term for arm 64 bit architectures
<heller1>
the Hi1616 is one as well
<nikunj97>
yes. I was misled by the telegram group name
<nikunj97>
I should've seen /proc/cpuinfo
ct-clmsn has joined #ste||ar
<nikunj97>
let me remove those stream benchmarks claiming to be arm64fx then
<heller1>
you don't have to remove them, just give them their true name ;)
<nikunj97>
so it's a qualcomm falkor
<heller1>
something like that, yes
<nikunj97>
so now, I have the peak triad bandwidth
<nikunj97>
I multiply it by 1.5 to get a realistic number
<nikunj97>
what is the next step?
<heller1>
and now plot the roofline
<heller1>
or was it 1.5?
<heller1>
let me check the code again ;)
diehlpk has joined #ste||ar
<nikunj97>
heller1, they don't multiply by 1.5 anywhere
<nikunj97>
they report: avgtime[j] = avgtime[j]/(double)(NTIMES-1);
<nikunj97>
diehlpk, yt?
<heller1>
yeah, that's fine, leave out the multiplication
<diehlpk>
nikunj97, yes
<nikunj97>
so I report the average triad as bandwidth
<nikunj97>
diehlpk, did we not get our proposal accepted with fujitsu?
<nikunj97>
I thought they accepted our proposal and we were getting access to the a64fx machines
<diehlpk>
I assumed the same but they never came back to us
<nikunj97>
so no a64fx :/
<diehlpk>
I do not know, I just sent him a reminder one more time
<heller1>
so you don't have a SVE capable CPU?
nan has joined #ste||ar
nan is now known as Guest9057
Guest9057 has quit [Remote host closed the connection]
nan1 has joined #ste||ar
bita has joined #ste||ar
kale has joined #ste||ar
kale_ has quit [Read error: Connection reset by peer]
gonidelis has joined #ste||ar
shahrzad has joined #ste||ar
shahrzad has quit [Remote host closed the connection]
shahrzad has joined #ste||ar
ahkeir1 has quit [Quit: Leaving]
ahkeir1 has joined #ste||ar
ahkeir1 has quit [Client Quit]
akheir has joined #ste||ar
gonidelis48 has joined #ste||ar
gonidelis has quit [Remote host closed the connection]
<Yorlik>
(T00000000/----------------.----/----------------) P--------/----------------.---- 16:17.53.391 [0000000000000003] <fatal> [ERR] thread_func: default thread_num:0 : caught boost::system::system_error: The paging file is too small for this operation to complete, aborted thread execution
<Yorlik>
<unknown>
<heller1>
heh
<heller1>
run with --hpx:attach-debugger=exception
<nikunj97>
gonidelis, PRs related to documentation are very much appreciated. hkaiser would concur ;)
<hkaiser>
absolutely!
<hkaiser>
everybody: today is the 12th birthday of HPX, btw
<bita>
Happy Birthday HPX
<heller1>
hkaiser: woohoo! Awesome job! Congrats to you!
<zao>
Yay!
<zao>
Let's start a new one from scratch :P
<hkaiser>
zao: way to go!
<gonidelis>
wow! 12 years of development...
<K-ballo>
zao: let's use javascript so it runs everywhere
<zao>
WASM, you say?
<hkaiser>
this tells me that we should publish the survey results asap
<hkaiser>
"12'th Birthday - What do People think?"
akheir1 has joined #ste||ar
<heller1>
indeed
<heller1>
what's still missing?
stmatengss has joined #ste||ar
akheir has quit [Ping timeout: 265 seconds]
<simbergm>
woop, happy birthday HPX!
<simbergm>
hkaiser: sounds like a good idea
<hkaiser>
simbergm: I still need to fix the images, didn't have time yet :/
<simbergm>
hkaiser: you mean the labels?
<hkaiser>
yah
<simbergm>
doesn't have to be exactly on the birthday ;)
<simbergm>
hkaiser: you need help with the images? I can probably paste something together quite quickly
<simbergm>
it's just that one image, no?
<heller1>
I think so, yes
<hkaiser>
simbergm: I'm just running out of time, so if you could look into the labels I'd appreciate it very much
<simbergm>
hkaiser: yep, I can take care of it
<hkaiser>
simbergm: thanks!
Hashmi has joined #ste||ar
Abhishek09 has joined #ste||ar
<Abhishek09>
rtohid : will we install hpx by dnf or by cmake in the manylinux docker?
<gonidelis>
simbergm If you would like you could check if the changes are proper. I have completely removed :lines: and replaced them with :start-after: :end-before:
<gonidelis>
are these little patches merged with master directly? or are they gathered and merged in a large newer-verison-like pull request?
<hkaiser>
gonidelis: nothing goes directly to master, ever
<hkaiser>
everything goes through PRs
<Yorlik>
hkaiser: With the need to keep a lua state around for a task in flight I now see how many tasks actually can be "in flight": I'm at around 1000 Lua states at the moment, which reflects exactly that. At that level it stabilizes and doesn't grow the pool of Lua States. I more and more have a feeling that we'd need a new kind of scripting language to handle this programming environment. It's neither Lua's nor
<Yorlik>
everyone's fault - to me it looks rather like a new situation which would require that.
<Yorlik>
anyone's - not everyone's.
<hkaiser>
heh, running out of ideas, do you?
<Yorlik>
Not really
<Yorlik>
As long as it stabilizes it's not really a big problem. You just need the memory to keep these Lua States around.
<hkaiser>
how much memory does a lua state consume?
<Yorlik>
After creation, ~500kb - 1MB at the moment
<Yorlik>
It depends on how large your script base is
<hkaiser>
nod, understand
<Yorlik>
I'm fantasizing about a lua version tailored for HPX tbh.
akheir1 has quit [Read error: Connection reset by peer]
<Yorlik>
E.G. with our programming paradigm there is no reason why the static parts couldn't be shared.
akheir1 has joined #ste||ar
<Yorlik>
It's just crazy to have all the scripts around in 1000ish copies
<hkaiser>
Yorlik: yah
<Yorlik>
I don't know if it is possible for Lua States to have more shared data
<hkaiser>
it's the variables that need separation
<Yorlik>
Yes
<Yorlik>
We just need one per OS thread
<hkaiser>
that should be possible somehow, talk to the lua guys
<Yorlik>
Otherwise we'd have data sharing
Abhishek09 has quit [Remote host closed the connection]
<Yorlik>
I'll dig into that
<hkaiser>
one per os thread would assume the hpx threads don't move around
<Yorlik>
Yes
<Yorlik>
It's more to avoid false sharing
<Yorlik>
You don't want two threads trying to read the same ram
<Yorlik>
Even if it's const
<hkaiser>
nah, reading is not an issue
<Yorlik>
Then we 'd need only one copy :)
<Yorlik>
I'm thinking about sharing memory pages virtually
gonidelis has quit [Remote host closed the connection]
<Yorlik>
But that would probably explode because of addresses stored
<hkaiser>
Yorlik: premature optimization again
<Yorlik>
lol
<Yorlik>
NP having 3-4 GB worth of Lua states around.
<hkaiser>
the lua scripting will overshadow everything else anyways
<Yorlik>
Yup
<Yorlik>
At the moment I'm processing ~150,000 messages on 10,000 objects per second in Lua. Still not good enough, imo.
gonidelis has joined #ste||ar
<hkaiser>
smaller memory footprint might help there
<Yorlik>
I might be able to optimize stuff later, when we have our systems prototyped.
<Yorlik>
At the moment every message has a vector of variants as its argument pack.
<Yorlik>
That's not exactly efficient
<Yorlik>
Over time I can make specialized, smaller messages.
<hkaiser>
depends on how large the variant is
<Yorlik>
40 bytes
<Yorlik>
A message variant over message types is 32 bytes
<Yorlik>
So - adding a stupid int adds 40 bytes ..
<Yorlik>
We have 40 byte bools ! lol
<Yorlik>
The id_types I put into the variant blew it up together with the strings
<hkaiser>
id_types are intrusive_ptr's essentially, so not more than a single pointer
<Yorlik>
I could precompile all strings and use a string hash instead.
<Yorlik>
These messages also go over the wire.
<Yorlik>
So I can't really make the id_types smaller
<hkaiser>
that shouldn't affect things
<Yorlik>
What is the minimum part of an id_type I really need to uniquely identify an object?
<Yorlik>
I'm pretty sure there's a ton of optimization possible with having this many lua states around. I just didn't have time for that yet. Too much to do in other areas and it's not a big problem yet. But it might become one.
<Yorlik>
The good news is, that most likely the current amount I use is an upper limit
<heller1>
does it scale linearly with the number of states?
<Yorlik>
The states do not cause issues except memory usage
<Yorlik>
When an object gets updated I grab a state from the pool and call the update function in Lua giving it the object and the mailbox.
<Yorlik>
The state sticks with the object until the updater exits. Then the mailbox is dumped and the state returned to the pool.
<Yorlik>
Since a task is a batch of objects, there is a limit to the number of tasks in flight. We can most likely live with this memory consumption for quite a while.
<heller1>
are those merely living on the server?
<Yorlik>
The states? They are just local, sure.
akheir1 has quit [Read error: Connection reset by peer]
<Yorlik>
Since they are essentially const and the same all around the cluster
akheir1 has joined #ste||ar
<Yorlik>
Object migration is planned for the stage after this milestone, when a local, single node scripted simulation is stable enough.
nan1 has joined #ste||ar
<Yorlik>
Just measured: killing 989 Lua States gave me 2.0 MB of memory back per state. (measured in debugger)
<weilewei>
hkaiser the non-mpi version of HPX module on Summit is installed and well tested. So Summit now has distributed and serial HPX 1.4.1.
<ct-clmsn>
weilewei, nice
<hkaiser>
perfect!
<weilewei>
:) Yea!
nan1 has quit [Ping timeout: 240 seconds]
<rtohid>
Abhishek09 here!
<Abhishek09>
rtohid: will we prefer to install hpx by cmake or by dnf package? the dnf package will require the same gcc version to compile
<ct-clmsn>
what's the best technique for running the clang lint tool provided with phylanx?
<ct-clmsn>
i've ancient code that's blocked on my poor code formatting
<Abhishek09>
cmake & boost by binary tar; pybind, blaze & blaze tensor by cmake; git & libjemalloc by sudo
<Abhishek09>
rtohid
<rtohid>
Abhishek09 CMake
<rtohid>
we just need to follow build instructions on Phylanx's Wiki
diehlpk has quit [Remote host closed the connection]
diehlpk has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
<Abhishek09>
That means dnf is not involved in our project rtohid diehlpk
diehlpk has quit [Remote host closed the connection]
diehlpk has joined #ste||ar
<Abhishek09>
rtohid nikunj97 manylinux is supported on travis/Appveyor ci but not circle ci, but cibuildwheel does support it
stmatengss has left #ste||ar [#ste||ar]
<rtohid>
Abhishek09 not to build Phylanx itself.
<diehlpk>
Abhishek09, The final solution can not involve the dnf package
<nikunj97>
Abhishek09, if you use manylinux, you will have dnf. But it'll be an older version with old libraries. So you can't rely on dnf
<Abhishek09>
rtohid `not to build Phylanx itself` means?
<nikunj97>
if you're going the manylinux route, you will have to build everything from scratch
<nikunj97>
including a gnu gcc compiler
<Abhishek09>
nikunj97 yes, centos 8 supports dnf but 5 does not
<nikunj97>
once you have the gnu compiler toolchain, you will have to build dependencies followed by phylanx
<nikunj97>
so in short, dnf is not allowed. The final solution must not have dnf
<Abhishek09>
dnf is not allowed nikunj97 Why?
<Abhishek09>
old gcc?
<nikunj97>
older version of dnf implies older libraries
akheir1 has quit [Remote host closed the connection]
<diehlpk>
Abhishek09, As I told you before, I would start by using the dnf package of hpx and try to build phylanx and its dependencies into a whl file
akheir1 has joined #ste||ar
avah has quit [Remote host closed the connection]
<diehlpk>
Once you have done this as a step towards the main goal, you can remove the dnf package and take care of compiling hpx
<diehlpk>
First step could be: use hpx's dnf package, build all dependencies, and phylanx itself
<diehlpk>
use the built package on a fresh docker image to test things
<Abhishek09>
nikunj97 manylinux doesn't support circle ci
<diehlpk>
In the second step, once we have a working solution for this. You will add hpx to the build chain of the pip package
<Abhishek09>
cibuildwheel does nikunj97
<diehlpk>
I believe once you figured out to build and ship the dependencies and phylanx, it will be easy to do the same for hpx
<nikunj97>
it's worth investigating cibuildwheel as well then
<diehlpk>
To summarize, the final package can not use dnf or apt-get at all
<diehlpk>
A first proof of concept could use dnf to install hpx
akheir1 has quit [Read error: Connection reset by peer]
<nikunj97>
cibuilds is a later step in the project imo
akheir1 has joined #ste||ar
<Abhishek09>
Yes, I know it uses the same docker image as manylinux
<Abhishek09>
nikunj97
<diehlpk>
yes, I agree. I would not over-engineer the solution
<nikunj97>
Abhishek09, you should focus on the first few steps in order to move to the later ones
<nikunj97>
it's always easier to formulate later steps when you have some initial proof of concept
<diehlpk>
Abhishek09, yes, I agree. Just having a pip package compiled in one docker image, copied to a fresh one, with everything working, would be a huge success
<nikunj97>
that's why everyone is suggesting you to first figure out the part of building a pip package. Once you succeed in that, we can explore the ideas of ci integration
<nikunj97>
but that's when you've successfully made a pip package which compiles everything on manylinux and finally creates a wheel using auditwheel (or the likes)
<Abhishek09>
nikunj97: but rtohid said to follow build instructions on Phylanx's Wiki
<nikunj97>
sure
<nikunj97>
I don't see any confusions
<Yorlik>
hkaiser: It's kinda interesting how the creation of Lua States depends a lot on the properties of the workload and the corresponding number of tasks in flight. I added a small busy loop to object creation to slow it down a bit and thus allow intermediately created Lua States to finish, so they could be reused. That drastically reduced the number of created LuaStates. I think there's a lot to learn here.
<Yorlik>
Also it seems my artificial tests do not really reflect a realistic workload. I'll have to do a lot of experimenting.
<Yorlik>
It seems there are phases of bursts where new states get created and then it stabilizes again before it comes to a stable situation overall.
<Yorlik>
When all objects were created and message creation was in a steady state, after purging all engines it ran with just 4 - one per thread.
<Abhishek09>
diehlpk: Why would we use the dnf package rather than make install?
<Abhishek09>
by cmake
<nikunj97>
he's not forcing you to use dnf
<nikunj97>
he's just telling you, if you want to use dnf package initially, you may do so
<nikunj97>
but the final product should not make use of dnf
<Abhishek09>
that means I can apply either option, dnf or cmake
<diehlpk>
Abhishek09, I was thinking that getting HPX to work will take a long time.
<diehlpk>
So you could use dnf to install hpx and just compile all dependencies of Phylanx and itself
<diehlpk>
So you could deliver a first package where you use HPX from dnf
<diehlpk>
and compile only phylanx and its dependencies. So we have a working package sooner
<diehlpk>
I think if you figured out how to compile and ship for example pybind11, it will be easier to compile HPX using pip
<diehlpk>
After we have this package, you can remove dnf install
<diehlpk>
and build hpx
<diehlpk>
I believe that doing baby steps is the way to go
<diehlpk>
It is just my opinion and you can propose whatever you want
<diehlpk>
I just want to say doing incremental steps is better
<diehlpk>
However, the final solution should not use dnf in your proposal
<hkaiser>
Yorlik: you might request a lua state only once the thread actually starts running, not at task creation
<Yorlik>
How could I do that? I request the states immediately before using them.
<hkaiser>
diehlpk: using dnf for binaries is not cross platform and makes the pip package almost useless
nan1 has joined #ste||ar
<Yorlik>
hkaiser: One problem is callbacks which use a Lua state from a c++ function which is called from Lua. Unfortunately I cannot just pass around the LuaState, since it's in actions which might run remotely.
<diehlpk>
hkaiser, yes, I know and that is exactly what I want as a first step
<hkaiser>
Yorlik: if you associate the lua state with the hpx thread this shouldn't be a problem
<diehlpk>
I think we should start with the easiest pip package, which might be useless, but shows that things work
<hkaiser>
diehlpk: ok
<Yorlik>
hkaiser: I had that and it exploded when the task migrated to another thread. The states are bound to the tasks or I get access errors
<diehlpk>
I think if we can use dnf install, build phylanx and its dependencies in one Fedora docker container and generate a pip package
<hkaiser>
Yorlik: hpx threads have a 64 bit value you can use for your purposes
<diehlpk>
Copy this pip package to a fresh Fedora docker container and install it and run any phylanx example
<hkaiser>
hpx::threads::set_state(std::size_t) or something similar (and size_t get_state())
<diehlpk>
This would be a major step
<Yorlik>
Like using a lua state with two tasks at the same time? That might be possible.
<hkaiser>
so you attach your lua state to the hpx thread which will carry it along
<diehlpk>
next step would be remove dnf install and compile hpx
<Yorlik>
Oh - I see.
<hkaiser>
no, just one task
<diehlpk>
Once we have done this, we could think about how to use tools to make it more useful
<hkaiser>
this way the c++ code called from lua will use the same lua state as the surrounding hpx thread
<Yorlik>
At the moment I am using several states per task, one per object update - that could be fixed.
<diehlpk>
hkaiser, I just said it is not a good way to use that build system for official pip packages from the beginning
<hkaiser>
diehlpk: ok
<diehlpk>
We should do baby steps in this direction
<Yorlik>
So I'd use the same state for every update run in the batch of the parallel loop
<hkaiser>
agreed
<Yorlik>
hkaiser: Is there something like task local storage?
<hkaiser>
sure, set_state/get_state
<Yorlik>
OK - I'll look that up
<zao>
diehlpk: Instructions unclear, I removed dnf :P
<Yorlik>
hkaiser: That function only accepts the id_type and the size_t - What I'd need to do is store a unique_ptr that gets automagically destroyed when the task finishes - I can't see how I would do that here. I'd have no way to figure out how long the task lives.
<hkaiser>
Yorlik: it's thread_id_type, not id_type
<hkaiser>
ahh, but good point
<hkaiser>
I need to create an example for this
<Yorlik>
Yes, but how would that size_t help me?
<Yorlik>
The data needs to be destroyed at the end of the task.
<Yorlik>
So the deleter of the unique_ptr to the lua state can kick in
<Yorlik>
and give it back to the pool
<hkaiser>
that's what I need to create an example for
<Yorlik>
OK. Thanks a ton !
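A sketch of what such an example might look like; the pool functions are hypothetical stand-ins for Yorlik's state pool, and the per-thread 64-bit slot is assumed to be exposed as hpx::threads::set_thread_data/get_thread_data (hkaiser quotes the names from memory as set_state/get_state, so treat them as assumptions):

```cpp
#include <hpx/include/threads.hpp>

#include <cstddef>

struct lua_State;                 // opaque, from the Lua headers
lua_State* acquire_from_pool();   // hypothetical: Yorlik's Lua state pool
void return_to_pool(lua_State*);  // hypothetical

// RAII guard living on the task's stack: attaches a pooled Lua state to the
// current HPX thread and returns it to the pool when the task finishes.
struct scoped_lua_state
{
    scoped_lua_state() : state_(acquire_from_pool())
    {
        // the per-thread slot is a 64-bit integer, so stash the pointer in it
        hpx::threads::set_thread_data(hpx::threads::get_self_id(),
            reinterpret_cast<std::size_t>(state_));
    }
    ~scoped_lua_state()
    {
        hpx::threads::set_thread_data(hpx::threads::get_self_id(), 0);
        return_to_pool(state_);
    }
    lua_State* state_;
};

// A C++ callback invoked from Lua recovers the state without it being
// passed down the call chain:
lua_State* current_lua_state()
{
    return reinterpret_cast<lua_State*>(
        hpx::threads::get_thread_data(hpx::threads::get_self_id()));
}
```

Since the guard sits on the HPX thread's stack, its destructor runs when the task ends, which is the unique_ptr-deleter behaviour Yorlik is after.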
<hkaiser>
bita: after the retiling you'd need to create a new annotation which needs to have all of the new tiles in it
gonidelis has quit [Remote host closed the connection]
<bita>
hkaiser, annotate_d does not need to show all of the meta data. I don't get your point :?
gonidelis has joined #ste||ar
<hkaiser>
annotate_d produces an annotation that has information for all the tiles,
<gonidelis>
hkaiser I was asking about merging these patches in some milestone version or sth, or if they just get merged directly to master...
<hkaiser>
diehlpk: do you plan to join Maxwells defense dryrun now?
<hkaiser>
gonidelis: do you think this is needed?
<hkaiser>
diehlpk_mobile[m: ^^
<gonidelis>
no i dont. just asking how things work in your team ;)
<nikunj97>
heller1, about roofline analysis. Peak performance of a single core will be (2.6)*(16) GFLOPs/s where 2.6 GHz is the cpu frequency and 16 instructions are executed per cycle
<nikunj97>
and from stream benchmarks we know that 4GB/s is the memory bandwidth
<nikunj97>
the triad operation is a[j] = b[j]+scalar*c[j];
<nikunj97>
so it will require 2 loads, i.e. b[j] and c[j], and a load and store for a[j]
<heller1>
More like 40, no?
<nikunj97>
was it 40, let me check
<nikunj97>
ohh yeah 39.xxGB/s
<nikunj97>
my bad
<heller1>
16 is vectorized fma?
<nikunj97>
yes
<nikunj97>
CPUs have 16 instructions per cycle (as E5-2600v3 series CPUs have AVX2.0 and FMA instruction sets that at their theoretical maximum are two times larger than that of E5-2600v1 and E5-2600v2)
<nikunj97>
so peak core performance is about 41.6 GFLOPs/s
<nikunj97>
and the processor's peak performance is 1664 GFLOPs/s
<nikunj97>
so simple AVX2, no FMA, will be 20.8 GFLOPs/s
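nikunj97's numbers written out; the 40-core count is inferred from the quoted total, not stated in the chat:

```latex
\begin{aligned}
P_{\mathrm{FMA,\,core}}  &= 2.6\ \mathrm{GHz} \times 16\ \mathrm{FLOP/cycle} = 41.6\ \mathrm{GFLOP/s}\\
P_{\mathrm{AVX2,\,core}} &= 2.6\ \mathrm{GHz} \times 8\ \mathrm{FLOP/cycle} = 20.8\ \mathrm{GFLOP/s}\\
N_{\mathrm{cores}} &= 1664 / 41.6 = 40
\end{aligned}
```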
<heller1>
Ok, as mentioned earlier, it would be good to have all those separate "roofs"
<nikunj97>
yes, I wanted to confirm this with you
<nikunj97>
so these will be horizontal lines on plotted graph
<heller1>
1. Scalar instructions, single core 2. Vector instructions, no fma, single core 3. Vector FMA instructions, single core 4. The number of 3. multiplied with the number of cores
<nikunj97>
ok, got it. How do I get the intersecting line (memory bandwidth one)?
<nikunj97>
how do I get to know where it'll intersect (y or x axis)?
<heller1>
So draw three graphs with those different rooflines. One graph per machine to test
<nikunj97>
but what about the memory bandwidth line?
Hashmi has quit [Quit: Connection closed for inactivity]
<heller1>
Yes, sounds reasonable
<heller1>
Yes, try to plot a proper roofline
<heller1>
Each of the horizontal lines will hint you at the benefit of each architectural improvement
<heller1>
What do you think?
<heller1>
f(x) = min(x*peak_bw, peak_flops), where x is the arithmetic intensity
<nikunj97>
yes, I think I've also figured out the memory bandwidth line. I take 2 operations from stream benchmarks with their arithmetic intensity
<nikunj97>
and then I have their memory bandwidth as well
<nikunj97>
so I multiply it with the corresponding memory bandwidth and compare with processor's peak
<heller1>
No
<heller1>
sorry, riot seems to be completely borked right now
<nikunj97>
is it not the same as your function?
<nikunj97>
I think it is. If I know 2 corresponding points of a line, I can plot the line itself. I have both the arithmetic intensity and its corresponding peak bw
<heller1>
Use pen and paper and try to figure it out yourself
<nikunj97>
ok, let me try
<heller1>
;)
<heller1>
x is the unknown
<heller1>
You have two functions that intersect, f(x) = a*x and g(x)= b
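Setting the two functions equal gives the ridge point where the roofline bends, which is the pen-and-paper step being hinted at:

```latex
% f(x) = a x (bandwidth roof) meets g(x) = b (compute roof) where
a x = b \;\Longrightarrow\; x^{*} = \frac{b}{a} = \frac{P_{\mathrm{peak}}}{BW_{\mathrm{peak}}}
% below x* a kernel is bandwidth-bound, above it compute-bound
```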
<hkaiser>
gonidelis: we do a release every couple of months that contains all changes that have accumulated since the last release
<nikunj97>
this would mean that the memory bandwidth line passes through the origin
<heller1>
Basic math rules apply
<heller1>
It absolutely does
<hkaiser>
nikunj97: zero cores use zero memory bandwidth ;-)
<heller1>
Why shouldn't it?
<nikunj97>
hold on, x axis has arithmetic intensity, right?
<heller1>
Yes
Abhishek09 has quit [Remote host closed the connection]
<heller1>
Why shouldn't it?
<nikunj97>
nothing, I was confused by your analogy. I got what you're saying now
<heller1>
always remember, we use math as a way to express our models ;)
<heller1>
HPX is not magic, it's C++, roofline is not witchcraft, it's math ;)
<heller1>
anyways, zero could either mean that you have no flops or that your bytes go to infinity; both lead to the number of flops/s being zero
weilewei has quit [Ping timeout: 240 seconds]
<nikunj97>
yes
<heller1>
so there's no reason why it should not start at zero
<nikunj97>
heller1, we use stream triad benchmark
<nikunj97>
we use its 2 FLOP/iter and 24 B/iter
<nikunj97>
that gives us arithmetic intensity of 1/12 FLOP/B
<heller1>
we used that to determine the maximum bandwidth
<heller1>
but sure, you can use those values to validate your graph
<heller1>
bonus point: where will the triad benchmark result sit?
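Working the bonus question through with the numbers already in the log (assuming, as for the machines discussed, that the ridge point lies far to the right of 1/12):

```latex
% TRIAD: 2 FLOP and 24 B per iteration => AI = 2/24 = 1/12 FLOP/B,
% well left of the ridge point, so TRIAD sits on the bandwidth slope:
P\!\left(\tfrac{1}{12}\right) = \frac{BW_{\mathrm{peak}}}{12}
\approx \frac{40\ \mathrm{GB/s}}{12} \approx 3.3\ \mathrm{GFLOP/s}
```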