hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
shahrzad has joined #ste||ar
diehlpk has joined #ste||ar
shahrzad has quit [Ping timeout: 246 seconds]
<Yorlik> KK
<hkaiser> worked without problems this time
<hkaiser> just adding an example
<Yorlik> Does this work with your fix only?
<hkaiser> should do what you need
<Yorlik> After the fix?
<hkaiser> no, doesn't need my fix
<Yorlik> Cool - I'll check it out
<Yorlik> Thanks a ton !
<hkaiser> you can most likely just copy it over to your code base
<Yorlik> I'm on it right now
<Yorlik> Reading and trying to understand first :)
<Yorlik> Do I need to put it into the hpx namespace like you did?
<hkaiser> the traits have to be in the hpx namespace, the executor doesn't matter
<Yorlik> Allright.
<Yorlik> Compiled and run - now trying to integrate it into my code and LuaState handling
<hkaiser> Yorlik: does it make sense to you?
<Yorlik> Totally.
<Yorlik> I don't yet understand everything but I think I grasped the rough idea of it.
<Yorlik> You basically provided exactly what I need for my parloop.
<hkaiser> that was the idea
<Yorlik> And now I can really do a lot of cool things with my tasks.
<Yorlik> It's awesome - really - many many thanks!
<hkaiser> most welcome - it's fun to do useful things ;-)
<diehlpk> First conference was postponed to next year for me
<Yorlik> hkaiser: It's totally useful.
<hkaiser> that's more like what I would expect
<diehlpk> hkaiser, Our course got CxC approved
<hkaiser> \o/
<hkaiser> great, btw Bijay never even responded to my email
<hkaiser> just forget about CS
<diehlpk> Same for the EE guy
<hkaiser> idiots
<diehlpk> At least it will be a math course
<hkaiser> nod, and we will advertise it at CS/EE and rub it in afterwards ;-)
<diehlpk> yes, they advertised GSoC for us
<diehlpk> They sent an email to all gradstudents
<hkaiser> well, at least something
<diehlpk> They will do the same for the course
<hkaiser> diehlpk: btw, Katie asked whether we would be ok if one of her own students applied
<hkaiser> I meant Kate Isaacs
<diehlpk> Sure, saw the email but got lost during the moving chaos
<diehlpk> He should just start preparing the proposal soon.
<diehlpk> Deadline is end of this month
<hkaiser> diehlpk: I got the 2 pager from Geoff, btw - did you get that one too (last week already)
<diehlpk> No
<hkaiser> ok, let me forward it to you
<diehlpk> Thx
<diehlpk> hkaiser, I do not like it
<diehlpk> Still not clear what we want to do
<diehlpk> At least I do not understand what we will do in the next two years
<diehlpk> What is the scenario we will target?
<diehlpk> What kind of physic is needed to do this?
<diehlpk> Can octotiger run the targeted application?
<diehlpk> Do we need to implement new physics
<diehlpk> What about v1309?
bita has joined #ste||ar
<hkaiser> exactly my questions
<Yorlik> hkaiser: Where would I add my custom chunk size in this particular parloop? In the executor?
<hkaiser> par.on(exec).with(chunk_size_policy_object)
<Yorlik> Allright. Thanks !
<Yorlik> hkaiser: hpx::parallel::execution::task is redundant and omitted, I guess?
<Yorlik> Like being the default executor?
shahrzad has joined #ste||ar
<hkaiser> no
<hkaiser> if you need async execution you write par(task).on(...).with(...)
<hkaiser> these things are orthogonal
<Yorlik> IC. I'll add it
<hkaiser> Yorlik: par (and par(task)) are execution policies
<hkaiser> those have an associated executor and associated parameters (like chunk size)
<Yorlik> I still do not really understand the concepts of an execution policy and an executor
<Yorlik> You really need to write the hpx book :)
<hkaiser> .on() changes the associated executor while .with() changes the execution parameters
<Yorlik> I still don't understand the meaning of these concepts - I just have a foggy idea of what they do.
<hkaiser> both .on() and .with() return a new execution policy the algorithms can work with
<Yorlik> I need to do some serious studying / reading on this.
<hkaiser> Yorlik: an executor is an object that -- well -- executes tasks
<hkaiser> nothing else
<hkaiser> it knows how to execute things
<Yorlik> BTW it runs - I can now work on my empty lambdas and fill them with meaning (lua state)
<hkaiser> execution parameters allow to customize the behavior and parameters of scheduling tasks
<hkaiser> (chunk size, etc.)
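(Put together, the pieces compose roughly like this - a sketch, not Yorlik's actual code; headers follow the HPX 1.4-era layout and the loop body is a placeholder:)

    #include <hpx/include/parallel_executors.hpp>
    #include <hpx/include/parallel_for_loop.hpp>
    #include <cstddef>

    void update(std::size_t n)
    {
        using namespace hpx::parallel::execution;

        parallel_executor exec;    // stand-in for a custom executor

        // .on() attaches the executor, .with() the execution parameters;
        // each returns a new policy the algorithm can work with
        hpx::parallel::for_loop(
            par.on(exec).with(static_chunk_size(128)),
            std::size_t(0), n,
            [](std::size_t /*i*/) { /* per-iteration work */ });
    }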
diehlpk has quit [Ping timeout: 260 seconds]
<Yorlik> Is the executor basically a wrapper for the task main function or sth?
<hkaiser> execution policy is what the standard defines for the algorithms (par, seq, un_seq, etc.) and are tag types that tell the algorithms how they behave
<hkaiser> executors are wrappers for the scheduler
<Yorlik> So - it's more a higher level management object?
<hkaiser> yah, very thin object referring to a scheduling system
<Yorlik> And - obviously - a customization point.
<hkaiser> and you can do other things like wrapping the scheduled functions and calling start/stop functions
<hkaiser> yes
<Yorlik> So the execution parameters .with etc are inside that thing and working with the scheduler wrapped by the executor
<hkaiser> yes
<Yorlik> OK
<Yorlik> Less fog now
<Yorlik> Thanks !
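(For reference, the minimal two-way executor shape under discussion, with the enabling trait in the hpx namespace as hkaiser noted earlier - a sketch along the lines of the HPX docs; exact headers and trait names may differ by version:)

    #include <hpx/include/async.hpp>
    #include <hpx/include/parallel_executors.hpp>
    #include <type_traits>
    #include <utility>

    struct my_executor
    {
        // "knows how to execute things": forwards invocations to hpx::async
        template <typename F, typename... Ts>
        auto async_execute(F&& f, Ts&&... ts)
        {
            return hpx::async(std::forward<F>(f), std::forward<Ts>(ts)...);
        }
    };

    // the trait has to live in the hpx namespace so the algorithms find it
    namespace hpx { namespace parallel { namespace execution {
        template <>
        struct is_two_way_executor<my_executor> : std::true_type {};
    }}}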
bita has quit [Quit: Leaving]
<Yorlik> hkaiser: I think there is a problem because I put everything inside my static member function "update_entity_array_advanced". It compiles, but I get a runtime error: "this->exec_.on_start_.vptr was nullptr." from basic_function.hpp.
<Yorlik> How should I fix this?
<hkaiser> no idea
<Yorlik> The lambdas and the executor are all defined inside the function
<hkaiser> the function was not initialized or went out of scope
<hkaiser> you have to keep the executor alive as long as there are threads running
<hkaiser> you can try removing the reference from the base executor member in the executor
<hkaiser> BaseExecutor& exec_; --> BaseExecutor exec_;
nk__ has quit [Ping timeout: 246 seconds]
<Yorlik> Or I make the executor a member of the class
<Yorlik> It still crashes. I'll keep it alive instead
shahrzad has quit [Ping timeout: 246 seconds]
shahrzad has joined #ste||ar
akheir1 has quit [Quit: Leaving]
hkaiser has quit [Quit: bye]
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 246 seconds]
shahrzad has quit [Remote host closed the connection]
shahrzad has joined #ste||ar
shahrzad has quit [Ping timeout: 260 seconds]
Abhishek09 has joined #ste||ar
Abhishek09 has quit [Remote host closed the connection]
Abhishek09 has joined #ste||ar
shahrzad has joined #ste||ar
shahrzad has quit [Client Quit]
Abhishek09 has quit [Remote host closed the connection]
Abhishek09 has joined #ste||ar
mdiers_ has quit [Remote host closed the connection]
mdiers_ has joined #ste||ar
nk__ has joined #ste||ar
<nk__> heller1, here you go: https://imgur.com/a/4fVk8fr ;)
<heller1> Nice!
<heller1> Now, explain the graph, highlight the different regions
<nk__> yesterday btw my main doubt was the use of peak bandwidth. And about passing through origin, most of the plots online didn't pass through origin so that confused me. Didn't realize they were log scaled :D
<nk__> peak bandwidth was supposed to be the stream triad benchmark that we took coz DRAM peak bandwidth is usually not achievable
<nk__> btw I couldn't find IPC for HiSilicon1616 anywhere
<nk__> heller1, explain as in explain you or color the regions for memory and compute bound?
nk__ is now known as nikunj97
<heller1> Well, let's assume I have an algorithm with an arithmetic intensity of 8. What can you tell me about it?
<nikunj97> i can say that it's a memory bound problem and you cannot achieve the peak cpu performance
<nikunj97> you can achieve I*Peak bw worth of performance at best
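(That bound is the roofline model itself; with arithmetic intensity I in FLOP/byte:)

    \[ P_{\text{attainable}} = \min\bigl(P_{\text{peak}},\; I \cdot BW_{\text{peak}}\bigr) \]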
<heller1> I run my program and I get 8 gflops
<heller1> What now?
<nikunj97> sounds like you need better cache management
<heller1> If I introduce better cache management, what will change?
<nikunj97> your program's memory bandwidth will increase
<nikunj97> while the arithmetic intensity will be same
<nikunj97> so an increase in memory bandwidth will lead to a better performance
<heller1> Will I fetch more memory from dram or less?
<nikunj97> you will fetch less memory
<nikunj97> if you have it available in cache the bandwidth will be significantly higher
<nikunj97> since fetching it from DRAM takes way longer than fetching from cache
<heller1> Correct
<nikunj97> so if your code makes use of L1/L2/L3 cache better
<nikunj97> then you will fetch very little from DRAM
<heller1> And why does this not affect the arithmetic intensity?
<nikunj97> because you still do the same number of loads and stores and arithmetic operations
<heller1> You just said I'm fetching less memory per floo
<nikunj97> it's just that the loads and stores now are significantly faster
<heller1> Floo
<heller1> Flop
<heller1> Sure
<heller1> Which line in the graph will affect this?
<nikunj97> arithmetic intensity is number of floating point operations divided by the byte fetched
<nikunj97> number of floating point operations for a given equation is a constant
<nikunj97> and so are bytes fetched
<nikunj97> so arithmetic intensity won't change
<nikunj97> it can only be changed by utilizing a different "type"
<nikunj97> i.e. changing from float to double
<nikunj97> or employing simd
<nikunj97> because that changes bytes fetched
<nikunj97> it will affect the peak bandwidth line
<nikunj97> heller1, ohh wait
<nikunj97> it's flop to DRAM byte ratio in arithmetic intensity
<nikunj97> of course it will change if we employ better cache practices
<heller1> ;)
<nikunj97> it will move to the right since we will now have less DRAM fetches
<heller1> tada
<nikunj97> and peak performance will increase itself
<heller1> so, having an arithmetic intensity of 8, and measured performance of 8 GFLOPS/s, what will you recommend me to improve in my code?
<nikunj97> since you have 8GFLOPS/s, you are above scalar peak, which means you're using a simdized kernel. I would recommend changing the structure of your for loops so that they incur fewer conflict misses
<heller1> (NB those are the questions I used to ask my students in the exams for 'Architecture of Supercomputers')
<heller1> not quite
<heller1> I am slightly above scalar peak, that's correct
<heller1> however, there's more instruction level parallelism that could get me above that performance level. What you can observe is that there's a gap between my performance results and what could have been achieved when utilizing vectorized instructions
<nikunj97> yes
<nikunj97> you can certainly achieve better simd results
<heller1> so I should look into vectorizing my code
<heller1> having better cache utilization won't affect my performance since I am not really close to any bandwidth limitation
<nikunj97> that sounds right too. A better vectorized code will certainly achieve better performance
<nikunj97> until I hit the memory bandwidth i.e.
<nikunj97> post which I should look into caching
<heller1> tada, now you got it ;)
<nikunj97> yes, I realize that now
<nikunj97> so now for our jacobi stencil
<heller1> keep in mind that this is still a very simplistic model, however, it is very effective
<heller1> you should always start with single core performance
<nikunj97> yes it looks very effective
<heller1> and try to get it to the maximum performance for your calculated arithmetic intensity
<heller1> then go multi threaded and see how well it scales
<nikunj97> for stencil, we have: next[x][y] = const * (curr[x][y+1] + curr[x][y-1] + curr[x-1][y] + curr[x+1][y])
<heller1> always use tools to analyze the given performance bottlenecks you identified using the roofline model
<nikunj97> for a normal cache implementation, we will require 4 loads and 1 store
<nikunj97> i.e. loading top and bottom neighbor and index of next stencil
<heller1> you can assume that the index is inside a register
<heller1> so I'd say 3 loads in the limit
<nikunj97> which index?
<heller1> x and y
<heller1> which index were you talking about?
<nikunj97> x and y
<nikunj97> how can we assume it to be inside the register already?
<heller1> since they are loop variables
<nikunj97> so 3 loads and 1 store?
<heller1> yes
<nikunj97> I see. If the cache is large enough to have 3 rows, we can get past with 1 load and 1 store
Abhishek09 has quit [Remote host closed the connection]
<heller1> yes, the problem however, is the distance between y+1, y and y-1
<nikunj97> yes
<nikunj97> for that, 3*Nx*sizeof(element) should not be any larger than the cache size
<nikunj97> in an ideal case scenario, should not be larger than cachesize/2
<nikunj97> before all that, let me go ahead assuming 3 loads and 1 store
<heller1> those are given by the problem you want to solve, you need to massage your algorithm to account for that
<heller1> not change the problem size ;)
<nikunj97> yes, understood :)
<nikunj97> roofline makes the analysis easier
<nikunj97> also 1 Lattice site update involves 4 floating point operations
<nikunj97> so we can use that conversion to account for our performance in FLOPS/s or LUPS/s
<nikunj97> so right now we have 4 FLOP per 32 Byte (4 load/store * 8B per instruction)
<nikunj97> so that will be 1/8 for arithmetic intensity
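(Spelled out, with 4 FLOP per lattice update against 3 loads + 1 store of 8-byte doubles:)

    \[ I = \frac{4\ \text{FLOP}}{(3 + 1) \times 8\ \text{B}} = \frac{1}{8}\ \text{FLOP/B} \]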
<nikunj97> now I need to measure the performance of my application and plot that in graph. Following which I can see if I need better caching or better vectorized code. Right heller1?
<heller1> yes
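(The update being analyzed, as a plain C++ sketch with placeholder names - not the actual tutorial code:)

    #include <cstddef>
    #include <vector>

    using grid = std::vector<std::vector<double>>;

    // one Jacobi sweep: 4 FLOP per lattice site update (3 adds, 1 multiply);
    // in the streaming limit, ~3 loads and 1 store of 8 B each per update
    void jacobi_sweep(grid const& curr, grid& next, double k)
    {
        std::size_t const nx = curr.size();
        std::size_t const ny = curr[0].size();
        for (std::size_t x = 1; x != nx - 1; ++x)
            for (std::size_t y = 1; y != ny - 1; ++y)
                next[x][y] = k * (curr[x][y + 1] + curr[x][y - 1] +
                                  curr[x - 1][y] + curr[x + 1][y]);
    }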
Hashmi has joined #ste||ar
Abhishek09 has joined #ste||ar
nikunj97 has quit [Ping timeout: 246 seconds]
nikunj97 has joined #ste||ar
Abhishek09 has quit [Remote host closed the connection]
<nikunj97> heller1, running your stencil application in a single core setting, I see performance at about 600 MLUPS
<nikunj97> since the arithmetic intensity is 1/8 and the peak bandwidth is 39GB/s, we should get about 5GFLOPS/s worth of performance
<nikunj97> or about 1,250 MLUPS
<heller1> what parameters did you run it with?
<nikunj97> ./stencil_parallel_1 -t1
<nikunj97> I let Nx and Ny be default at 1024
<nikunj97> and steps at 100
Abhishek09 has joined #ste||ar
<nikunj97> everything is measured in doubles
<nikunj97> and not floats
<heller1> I think you need a larger problem size
<nikunj97> should I increase task grain size or the number of tasks?
<heller1> Larger dimensions, iirc
<nikunj97> in the paper you use 10,000 x 100,000
<nikunj97> let me try using them
<heller1> yes
<heller1> also, try the last parallel version
<heller1> stencil_parallel_4
<nikunj97> stencil_parallel_4 is the distributed one
<heller1> yes
<nikunj97> ok, let me run that one. stencil_parallel_1 reported 631 MLUPS
<nikunj97> is that an expected performance?
<nikunj97> I think simd should help with the performance in this case
<nikunj97> we're at half the peak performance for the arithmetic intensity. So in theory we can get about 2x speedup
<nikunj97> stencil_parallel_4 also reports ~630MLUPS (629.123 to be precise)
<nikunj97> using avx2 in haswell we can store 4 doubles. That should be enough to achieve the 2x speedup I guess
<nikunj97> once we get to 2x speedup, I will then need to look into ways to increase the arithmetic intensity by employing caching
<nikunj97> that looks like a good start to me. Do you concur?
<heller1> sounds good
<nikunj97> great \o/
<heller1> what does stencil_serial report?
<heller1> also, using avx2 will potentially give you a 4x speedup ;)
<nikunj97> I have not checked
<heller1> please do
<nikunj97> heller1, yes "potentially" 4x
<nikunj97> but given that we're already at 1/2 the peak performance, without using caching strategies, I don't think I'll be able to get the 4x speedup. So I'm aiming for that 2x speedup first
<heller1> first, try compiling with -ffast-math and -mavx2
<heller1> to see what this gives you
<nikunj97> ok
<nikunj97> stencil_serial reports 955MLUPS
<heller1> ;)
<heller1> so there's quite some overhead with the parallel for loop
<nikunj97> looks like it
<heller1> I hope you set CMAKE_BUILD_TYPE=Release?
<nikunj97> yes
<nikunj97> Debug can't give this high performance, can it ;)
<nikunj97> I made the build type mistake last year while benchmarking resiliency. Was pretty embarrassing when Hartmut told this to Keita
<heller1> :P
<nikunj97> the only way to hide parallel_for_loop's overhead would be to increase the grain size
<nikunj97> perhaps Nx=50,000 should hide the overheads
<heller1> try and see
<nikunj97> already started :)
nikunj97 has quit [Ping timeout: 260 seconds]
nikunj97 has joined #ste||ar
gonidelis has joined #ste||ar
<nikunj97> why does my application not link to -lboost_program_options when boost is definitely in the LD_LIBRARY_PATH :/
<nikunj97> module is loaded and cmake can see it, but g++ can't
<nikunj97> interesting... ldconfig -p does not show boost_program_options in there
<zao> nikunj97: LD_LIBRARY_PATH is for the loader. LIBRARY_PATH is for the linker.
<nikunj97> just realized, hpx doesn't need boost program options any more
<zao> Are you not using FindBoost and the targets/variables it produces for libraries?
<nikunj97> `echo $LIBRARY_PATH` doesn't return anything :/
<nikunj97> I'm essentially using g++ my_file.cpp -lboost_program_options
<nikunj97> LD_LIBRARY_PATH already has the boost path
<zao> CPATH and LIBRARY_PATH are the environment variables corresponding to the compiler's -I and the linker's -L.
<nikunj97> module list shows me that boost 1.72 is loaded
<nikunj97> ahh!
<zao> prepend_path("CPATH","/hp/eb/software/Boost/1.71.0-gompi-2019b/include")
<zao> prepend_path("LIBRARY_PATH","/hp/eb/software/Boost/1.71.0-gompi-2019b/lib")
<zao> prepend_path("LD_LIBRARY_PATH","/hp/eb/software/Boost/1.71.0-gompi-2019b/lib")
<zao> This is what my Boost/1.71.0 module sets, for example.
<nikunj97> why has it not been set in my case then :/
<nikunj97> I should ask Ali when he's around
<zao> Not all module systems set them.
<nikunj97> let me try this
<zao> There might be an environment variable for you to locate your Boost directory.
<zao> In my world, I'd say "-L${EBROOTBOOST}/lib" for example
<nikunj97> I see
<zao> If you're using Lmod, you can 'ml show' on your module name to see what it performs when loaded.
<nikunj97> `env | grep boost`, boost only assigns to LD_LIBRARY_PATH
<nikunj97> yes lmod is used
<nikunj97> ml boost doesn't prepend LIBRARY_PATH
<nikunj97> let me prepend that myself then
<nikunj97> zao, worked like a charm
baocvcv has joined #ste||ar
<nikunj97> heller1, I forgot that I was running them on head node :P
<nikunj97> on a haswell node, serial runs at about 630MLUPS
<nikunj97> and parallel runs at about 510MLUPS
<zao> nikunj97: Running code on head nodes? *shudder*
<nikunj97> zao, that cluster is kind of exclusive to us ;)
<nikunj97> so there aren't many people accessing that one
<nikunj97> I tend to forget not to run things on head node :P
hkaiser has joined #ste||ar
gonidelis has quit [Ping timeout: 240 seconds]
baocvcv has quit [Ping timeout: 264 seconds]
Abhishek09 has quit [Remote host closed the connection]
Abhishek09 has joined #ste||ar
<nikunj97> hkaiser, how much of an overhead does parallel_for_loop have? I presume it's about 1-2us
<heller1> the results are good
<heller1> now on to vectorization ;)
<hkaiser> nikunj97: ~1 us per created thread
<hkaiser> amortized over the number of cores
<nikunj97> alright, so I need about 50-100us to hide them in noise
<hkaiser> yah, better 200us
Sarthakag has joined #ste||ar
<nikunj97> 200us for a stencil will require ample for loops ;)
Sarthakag has quit [Remote host closed the connection]
<nikunj97> heller1, yes I think so too!
<nikunj97> hkaiser, ample Nx for loops*
Sarthakag has joined #ste||ar
<hkaiser> nikunj97: why parallelize in the first place if you don't have 200us of work?
<nikunj97> not that I don't have work
<nikunj97> but the naive 2d stencil approach does 1 row at a time with neighboring top and bottom row
<nikunj97> and iterate over the row
<nikunj97> unless the row length is large enough, I don't think I can get 200us of work in that task :/
<nikunj97> a for loop with 10,000 runs takes about 25-30us
<hkaiser> nikunj97: let me ask again: why parallelize a piece of work that is not large enough?
<hkaiser> if your rows are too short then parallelize over the rows and not each row on its own
<heller1> nikunj97: not every row is one task
<heller1> nikunj97: you have multiple rows in one task
<heller1> so?
<nikunj97> is it working on the whole stencil?
<nikunj97> I believe it's only working on a row
<heller1> yes, this function only works on one row
<heller1> but not every element function of the for loop corresponds to one task
<hkaiser> heller1: btw, with the latest changes to get<T>(variant), gcc fails again (https://cdash.cscs.ch//viewBuildError.php?buildid=100586) :/
<nikunj97> heller1, aah! understood. Give it more than one row to increase the work. Sensible enough
<heller1> hkaiser: why is it trying to call tuple_element on a variant to begin with?
<hkaiser> gcc's std::visit implementation relies on a (unqualified) get<I>(v)
<K-ballo> that sounds wrong
<hkaiser> sure it's wrong
<hkaiser> I think I know what's missing
<K-ballo> all get<I> in libstdc++ 9.1.0 look qualified
<hkaiser> we insert our get<I> into namespace std for the tuple compatibility, I need to do the same with the get<I>(variant) I added
<hkaiser> then our overloads will be found and used
<K-ballo> we add get<I> to std?? that's definitely wrong
<hkaiser> we do that to enable std::get<> for our tuples
<K-ballo> ok, that's wrong
<hkaiser> shrug
<K-ballo> that explains why visit is calling it
<hkaiser> right
<K-ballo> doesn't explain what calls visit in the first place
<hkaiser> our serialization of std::variant
<K-ballo> our gets, those we wrongly inject into std... they are unconstrained :|
<hkaiser> yah
<hkaiser> blame heller1 ;-)
<K-ballo> lol
<heller1> blame the reviewers :P
<K-ballo> no reviewer ever said "we'll regret this"?
<hkaiser> you did ;-)
<K-ballo> I'd expect it
<hkaiser> ok, I'll redo our gets
<K-ballo> so.. those gets need some constraining
<K-ballo> but at the same time, those gets need to be unconstrained.. mmh
<hkaiser> not really, we can create specializations for all types we want to support
<K-ballo> as long as it is for the std:: injected ones, and not for hpx::util ones
<hkaiser> we can add specializations for util::tuple as well
baocvcv has joined #ste||ar
<Sarthakag> Hi, I am Sarthak, a 4th year student studying at BITS, Pilani. I would like to contribute to STE||AR. Although I have been an active programmer in C++ since the last 4 years, I am a beginner in the field of HPC. How can I get some hands on experience?
rtohid has joined #ste||ar
Hashmi has quit [Quit: Connection closed for inactivity]
khuck has joined #ste||ar
khuck_ has joined #ste||ar
khuck has quit [Ping timeout: 260 seconds]
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has joined #ste||ar
khuck_ has quit []
khuck has joined #ste||ar
bita has joined #ste||ar
Hashmi has joined #ste||ar
<Sarthakag> Is this the correct platform to ask such doubts?
<zao> Hi there!
<zao> Is your interest as a general researcher or as part of Google Summer of Code?
<Sarthakag> Google Summer of Code
<nikunj97> heller1, btw what's the use of hpx::compute::host::block_allocator?
<heller1> nikunj97: it distributes the allocated memory across all numa domains
<nikunj97> why did you have to use hpx::compute::vector over a std::vector? You can give the allocator to std::vector as well
weilewei has joined #ste||ar
<heller1> yes, compute::vector uses an extension of the allocator which does parallel construction of the elements
<heller1> exploiting the first touch numa policy
<nikunj97> aah! so in hindsight element construction within the vector occurs in parallel using numa
<nikunj97> yes
<heller1> this is not as easy with a std::vector
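(In use this looks roughly like the HPX stencil examples - a sketch, details vary by version:)

    #include <hpx/include/compute.hpp>
    #include <cstddef>

    void make_numa_aware_data(std::size_t size)
    {
        using allocator_type = hpx::compute::host::block_allocator<double>;

        // one target per NUMA domain on this node
        auto numa_domains = hpx::compute::host::numa_domains();
        allocator_type alloc(numa_domains);

        // elements are constructed in parallel, block-wise per domain, so
        // first-touch places each block of pages on the right NUMA domain
        hpx::compute::vector<double, allocator_type> data(size, 0.0, alloc);
    }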
Sarthakag has left #ste||ar [#ste||ar]
Sarthakag has joined #ste||ar
<hkaiser> K-ballo: so I think we need a constrained version of get for util::tuple which is injected into std:: and hpx::util::, and an unconstrained version of get that is injected only into hpx::util::, the latter for the types we want to support from std::
<K-ballo> possibly, yeah, the two will need different constrainings
<hkaiser> yes
<K-ballo> the util one could be constrained on non-empty util::tuple_element<decay<Tuple>>
<K-ballo> the std:: one will have to be a lot more conservative
<hkaiser> just for util::tuple
nan1 has joined #ste||ar
<Abhishek09> nikunj97 Where does dnf install the hpx package? Location?
<nikunj97> Abhishek09, /usr/
<nikunj97> libraries at /usr/lib64
<nikunj97> headers at /usr/include
<Abhishek09> Thanks i got it
akheir has joined #ste||ar
<Abhishek09> nikunj97 : how do I install the deps of phylanx? I don't have the hpxconfig.cmake file
<hkaiser> Abhishek09: dnf the hpx-devel package
<nikunj97> install hpx-devel :)
<hkaiser> hpx-devel
<zao> Abhishek09: It's customary to split distribution packages into two parts, one with binaries and libraries for running software, and one with files only for development like headers and build system metadata like pkg-config files or CMake exports.
<Abhishek09> What is hpx-devel, nikunj97? How is it different from hpx?
<nikunj97> Abhishek09, zao just explained that
<nikunj97> hpx-devel essentially contains files relating to build system
<nikunj97> example pkg-config and cmakefiles
<nikunj97> one of which is hpxconfig.cmake
<nikunj97> the hpx package ONLY installs the headers and libraries
<zao> nikunj97: Not even headers.
<hkaiser> nikunj: dnf hpx should only install the binaries etc. the headers should not be needed for it, actually
<nikunj97> ohh, my bad. I thought headers were also installed.
<hkaiser> K-ballo: wouldn't the currently unconstrained implementation of get be sfinae'd out if tuple_element is not defined?
<K-ballo> tuple_element is not supposed to be sfinae-friendly
weilewei has quit [Remote host closed the connection]
<hkaiser> I mean the get implementation
<K-ballo> it may work for non cv-qualified non tuples? it's not supposed to
<K-ballo> tuple_element is not supposed to be sfinae-friendly, so get shouldn't sfinae out on it.. it'd need its own sfinae
<hkaiser> ok
Sarthakag has quit [Ping timeout: 240 seconds]
Abhishek09 has quit [Ping timeout: 240 seconds]
Abhishek09 has joined #ste||ar
diehlpk_work has joined #ste||ar
nan has joined #ste||ar
nan is now known as Guest58090
Guest58090 has quit [Remote host closed the connection]
nan2 has joined #ste||ar
khuck has quit [Remote host closed the connection]
<Yorlik> hkaiser: How is the parloop executed in parallel if I'm not using "hpx::parallel::execution::task " ?
Abhishek09 has quit [Ping timeout: 240 seconds]
<Yorlik> Still tasks, but waiting in the local thread for them to finish?
baocvcv has quit [Ping timeout: 250 seconds]
<Yorlik> NVM - I think I got it - it's about the parloop as a whole, not the chunks - otherwise it wouldn't make sense anyways.
Abhishek09 has joined #ste||ar
khuck has joined #ste||ar
<hkaiser> Yorlik: par does parallelization, hence the name
<hkaiser> Yorlik: par(task) does parallelization, but also executes the algorithm asynchronously (makes it return a future)
<Yorlik> I think I confused launching the entire loop call async or just the chunks being async
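(The two variants side by side - a sketch with a placeholder body:)

    #include <hpx/include/lcos.hpp>
    #include <hpx/include/parallel_for_loop.hpp>
    #include <cstddef>

    void run(std::size_t n)
    {
        using namespace hpx::parallel;
        auto body = [](std::size_t /*i*/) { /* chunked work */ };

        // par: the iterations run in parallel, the call returns when done
        for_loop(execution::par, std::size_t(0), n, body);

        // par(task): same parallelism, but the loop as a whole is async
        hpx::future<void> f =
            for_loop(execution::par(execution::task), std::size_t(0), n, body);
        f.get();    // wait for the entire loop here
    }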
<Yorlik> I moved my futures up a level in the call stack and thus could keep the executor and stuff around
<Yorlik> Still fighting with storing the Lua State for the Thread in a static map - it needs mutexes which is ugly as hell.
<Abhishek09> nikunj97: after dnf hpx-devel, where do I find the hpxconfig.cmake file?
<nikunj97> Abhishek09, /usr/lib64/cmake/HPX
<Abhishek09> nikunj97 Yes , it is there
<nikunj97> you can use that while installing phylanx
<Abhishek09> nikunj97 : ste||ar has two dnf package `hpx` & `hpx-devel` Am i right?
<Abhishek09> zao ^?
<zao> Yup.
<zao> You can see the contents of installed packages if you want with for example "rpm -qvl hpx"
<Abhishek09> @zao Does dnf have a blaze package?
<hkaiser> Yorlik: use a thread_local container
<Yorlik> hkaiser: But if the task migrates? I need to give back the LuaStates when the task is done and can't access that task specific slot then.
<Abhishek09> Does blaze-builder have a role in building anything diehlpk
<hkaiser> no problem, you're not freeing things but caching the pointers for the next use - so they can 'go back' to a different pool
<zao> Abhishek09: Ask dnf :)
<diehlpk_work> Abhishek09, What is blaze builder?
<zao> Some names are conflicting between different software packages. It could be a disjoint software.
<Yorlik> hkaiser: That's what I already did before, using 4 pools, one per worker thread. But I just detected another issue which might be responsible for the LuaState explosion. Need to investigate ...
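(A sketch of that per-worker-thread pool, with a hypothetical lua_state standing in for the Sol-wrapped state:)

    #include <memory>
    #include <vector>

    struct lua_state { /* hypothetical Sol-wrapped Lua state */ };

    static std::vector<std::unique_ptr<lua_state>>& local_pool()
    {
        // one cache per worker thread, so acquire/release need no locks
        thread_local std::vector<std::unique_ptr<lua_state>> pool;
        return pool;
    }

    std::unique_ptr<lua_state> acquire_state()
    {
        auto& pool = local_pool();
        if (pool.empty())
            return std::make_unique<lua_state>();
        auto s = std::move(pool.back());
        pool.pop_back();
        return s;
    }

    void release_state(std::unique_ptr<lua_state> s)
    {
        // a state may be released on a different worker thread than it was
        // acquired from; it simply joins that thread's pool for reuse
        local_pool().push_back(std::move(s));
    }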
<zao> You're not going to find all the dependencies in the distro repos.
<Abhishek09> maintained by diehlpk_work
<Yorlik> hkaiser: Should I use HPX specific mutexes if I need to lock? Or is std::mutex okay?
<Abhishek09> zao but not working
<zao> Yorlik: An OS primitive will block the thread. A HPX primitive will yield for other work.
<diehlpk_work> Abhishek09, Do you speak about my fedora package for blaze?
<hkaiser> Yorlik: yes, you're running on a hpx thread
<Abhishek09> Yes diehlpk_work
<Yorlik> Which mutex should I use?
<Abhishek09> it works or not ?
<Abhishek09> diehlpk_work
<zao> If you don't use HPX primitives, you're likely to halt progress if you're relying on something HPX:y to unblock you.
<Yorlik> Just for standard lock_guards
<diehlpk_work> Abhishek09, I do not understand what you will do with the fedora spec file
<Yorlik> The waiting times are extremely low, but I have a weird feeling the state explosion has to do with mutual locking and might even run into a deadlock
<Abhishek09> I will install blaze diehlpk_work
<diehlpk_work> It is very specific to build a package for Fedora
Hashmi has quit [Quit: Connection closed for inactivity]
Abhishek09 has quit [Remote host closed the connection]
Abhishek09 has joined #ste||ar
<Abhishek09> diehlpk : dnf install blaze works or not?
<zao> Abhishek09: Did you _try_?
<zao> If it's in the repositories, it's supposed to work.
<Abhishek09> zao: i tried blaze but it was not found
<Abhishek09> dnf install blaze
<zao> nikunj97: Now I feel silly... I've been ssh:ing to my old laptop running Fedora every time I needed to test something here, and it takes ages to get to a prompt.
<zao> Turns out that IPv6 is down on it and it was busy timing out before trying IPv4 :D
<nikunj97> lmao
<zao> Abhishek09: Which OS release are you on now again?
<nikunj97> zao, is your laptop that old?
<nikunj97> I don't think IPv6 is disabled in recent models when you install any os
<Abhishek09> zao fedora
<zao> It's an old EeePC I repurposed to a small home server, so it's very low on RAM.
<nikunj97> Abhishek09, blaze-devel exists
akheir has quit [Read error: Connection reset by peer]
<zao> Abhishek09: I'd like to know which version.
* nikunj97 zao trying to build hpx on low RAM
<Abhishek09> blaze does not exist
<zao> Abhishek09: And yes, recall again how there's typically two packages for software, one regular for running things and one -devel for building things.
<zao> Blaze is header-only and as such, doesn't have the former.
akheir has joined #ste||ar
<Yorlik> Is mutex from <hpx/synchronization/mutex.hpp> fully compatible with std::mutex and can it be used with std::lock_guard?
<zao> Yorlik: Pretty much anything can be BasicLockable.
<Yorlik> zao: IC - thanks!
<zao> nikunj97: It'd be foolish to build HPX on 1GB of RAM.
<nikunj97> zao, I have 16gb ram. It feels low while building hpx :/
<zao> For amusing reasons the machine has public IPv4 over wired and LAN+IPv6 over wifi. Wifi's down apparently.
<nikunj97> aah
<hkaiser> Yorlik: use hpx::lcos::local::spinlock
<Yorlik> Does that yield ?
<hkaiser> yes, eventually
<Yorlik> Nice.
<Yorlik> So it does a compromise?
<hkaiser> I still think you can get away with a thread_local pool of lua engines
<Yorlik> The pool works without a single lock
diehlpk has quit [Ping timeout: 260 seconds]
<Yorlik> Problem is sending messages and the mailboxes
Abhishek09 has quit [Ping timeout: 240 seconds]
Abhishek09 has joined #ste||ar
<hkaiser> Yorlik: ok, that's your problem, then
<Yorlik> I need the locks for message push_back and for mailbox take ownership
<hkaiser> nod, spinlock is the way to go
akheir has quit [Read error: Connection reset by peer]
<Yorlik> Thats one of the few points where I cannot avoid data sharing
<Yorlik> OK - thanks!
<hkaiser> if there is no contention it will be a single atomic op, otherwise it will spin for a while before yielding
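(A sketch of the mailbox pattern with a spinlock - message is a hypothetical payload type, the header path follows the 1.4-era module layout:)

    #include <hpx/synchronization/spinlock.hpp>
    #include <mutex>
    #include <utility>
    #include <vector>

    struct message { /* hypothetical payload */ };

    class mailbox
    {
        hpx::lcos::local::spinlock mtx_;
        std::vector<message> queue_;

    public:
        void push_back(message m)
        {
            // uncontended: one atomic op; contended: spins, then yields
            std::lock_guard<hpx::lcos::local::spinlock> l(mtx_);
            queue_.push_back(std::move(m));
        }

        std::vector<message> take_ownership()
        {
            std::vector<message> out;
            std::lock_guard<hpx::lcos::local::spinlock> l(mtx_);
            std::swap(out, queue_);
            return out;
        }
    };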
<Abhishek09> zao: i didn't remember the version, I simply ran `docker run -it fedora`
akheir has joined #ste||ar
parsa[m] has quit [Ping timeout: 240 seconds]
freifrau_von_ble has quit [Ping timeout: 240 seconds]
gdaiss[m] has quit [Ping timeout: 256 seconds]
simbergm has quit [Ping timeout: 256 seconds]
heller1 has quit [Ping timeout: 256 seconds]
<Abhishek09> zao: is there any dnf package for blaze_tensor-devel?
gdaiss[m] has joined #ste||ar
<zao> Abhishek09: As I said, not all dependencies have distro packages.
akheir1 has joined #ste||ar
<zao> blaze_tensor doesn't even have _releases_.
heller1 has joined #ste||ar
<zao> In this phase, you're going to be able to leverage distro packages for some of the bigger base dependencies, but definitely not all.
akheir has quit [Ping timeout: 256 seconds]
<Abhishek09> zao that means I have to build blaze_tensor with cmake. Now I will try `dnf pybind11-devel`
simbergm has joined #ste||ar
weilewei has joined #ste||ar
parsa[m] has joined #ste||ar
parsa[m] is now known as Guest295
<weilewei> a naive question about building and installing software on Linux: if I don't issue `make`, and just directly issue `make install`, will this instruction build the software for me?
<nikunj97> yes
<nikunj97> make install is usually dependent on make
rtohid has quit [Remote host closed the connection]
Abhishek09 has quit [Ping timeout: 240 seconds]
<weilewei> ok, so it doesn't matter if I skip `make` or not if my intention is to install software
<weilewei> nikunj97 thanks!
<nikunj97> that's why when you run make install in hpx, it still builds it for you. If you have already done make, it still checks
freifrau_von_ble has joined #ste||ar
<nikunj97> yup
<nikunj97> make install will work
<weilewei> nice! nikunj97
<Yorlik> hkaiser: Is max_busy_loop_count concerning the spinlocks? After replacing all mutexes I had a ton of lua engines created, I think it's SendMessage waiting for access to a mailbox and backing off, bringing in the next possibly locked SendMessage. I need to slow down the backoff from SendMessage specifically - I'd not like to do this globally - However - Ideas?
<hkaiser> no, that's unrelated
<hkaiser> spinlock will backoff very quickly, don't think that's your issue
<Yorlik> I guess with my excessive message test I am creating a gazillion tasks in flight
<Yorlik> But I might have a logical error elsewhere ofc.
Abhishek09 has joined #ste||ar
<Abhishek09> nikunj97: Where do the pybind11 cmake files reside when installed by dnf?
<nikunj97> never installed pybind with dnf
<nikunj97> should be somewhere in /usr/lib/cmake or /usr/share/cmake
nan2 has quit [Ping timeout: 240 seconds]
nan1 has quit [Ping timeout: 240 seconds]
Abhishek09 has quit [Remote host closed the connection]
gonidelis has joined #ste||ar
<gonidelis> Seeking advice: As I am interested both in your 'Implement missing Parallel Algorithms' and 'Range based Parallel Algorithms' projects, do you believe there is a way to combine them somehow, or should I just form two separate proposals?
<hkaiser> gonidelis: not sure
<hkaiser> gonidelis: both have sufficient work for a year ;-)
<hkaiser> I'd write a solid but more conservative proposal
<hkaiser> if you're done early you can always do more afterwards
<zao> hkaiser: That Sarthakag lad that came around a few hours ago, they were interested in one of your GSoC projects btw. I guess they'll either be mailing you or poke you on IRC some day.
<gonidelis> Great! Thank you sir
<hkaiser> zao: thanks
<hkaiser> gonidelis: in general, I think it's more important to have a proposal that shows you have thought it through than to have one that lists a lot of work that is not realistic to finish
<gonidelis> hkaiser Sure, I totally agree. HPX is all I have done the past three weeks so I believe I will be able to provide you with a solid solution.
<hkaiser> good, looking forward to reading your proposal, feel free to share it before submission
<gonidelis> Yeah, your review will be vital actually
<nikunj97> heller1, what's your github handle?
<diehlpk_work> hkaiser, SC workshop was rejected
<diehlpk_work> As usual for HPX
<hkaiser> darn
<hkaiser> don't worry about it, SC will not happen anyways this year
<diehlpk_work> Yeah, but we have inclusion again ;)
<hkaiser> nikunj97: it's sithhell
<nikunj97> heller1, I completed porting your 2d stencil into my version. It's on node as of now, but I will have a distributed version by this weekend. Initial results look promising as well. SIMDized kernel version works much faster this time.
<nikunj97> hkaiser, thanks!
<hkaiser> diehlpk_work: what 'inclusion'?
<diehlpk_work> Diversity and inclusion are really important to the SC community. Diversity is not just gender and organizational diversity, but should also include a plan to recruit ethnic minorities for attendees and participants/organizers in the workshop.
<nikunj97> diehlpk_work, what do you mean rejected coz of HPX?
<hkaiser> what a BS
<diehlpk_work> nikunj97, All things containing HPX were rejected from SC workshops, BOF, and panels
<nikunj97> ehh, why would they do this?
Abhishek09 has joined #ste||ar
<hkaiser> nikunj97: just because
<hkaiser> we're perceived as a threat nowadays
<nikunj97> that has a nice ring to it ;)
<hkaiser> heh
<hkaiser> it's just annoying
<Yorlik> You're kicking everyones back with HPX - lol.
<Yorlik> There is nothing comparable on the market afaik.
<Abhishek09> nikunj97 Do we want to install hpx and the other deps of phylanx, or just use the important files needed to install phylanx?
<hkaiser> Yorlik: thanks for the encouragement
<gonidelis> Who are these guys that sabotage HPX? Why do they do that?
<nikunj97> hkaiser, well it feels nice to be a part of a group who're feared in the community is what I meant ;)
<nikunj97> not the rejection part though :/
<hkaiser> Abhishek09: in the end you want to have a functioning Phylanx, that's it
<hkaiser> nikunj97: sure
<zao> gonidelis: Academia is a brutal place :D
<gonidelis> But why....
<zao> Some people have strong investment or feelings about how parallel software and communication should be done.
<Abhishek09> hkaiser nikunj97 that means it doesn't matter whether the deps work or not. Am I right?
<hkaiser> gonidelis: the usual... money, reputation, perceived status, you name it
<hkaiser> Abhishek09: if you can make Phylanx work with making its dependencies work - sure
<hkaiser> I however doubt that this will be viable
<Yorlik> hkaiser: I've done a lot of reading and testing before I decided to use HPX. We considered writing our own task system. Fortunately it wasn't necessary - HPX gave us all we need. My only problem is learning it and C++. But I can't complain, seriously not.
<hkaiser> :D
<gonidelis> Do you have any antagonists?
<hkaiser> Abhishek09: I meant *without* making dependcies work
<hkaiser> gonidelis: the whole community ;-)
<Abhishek09> hkaiser Please elaborate!
<nikunj97> I feel the same. HPX was the first thing I learnt and now when I try writing code with other libraries, I only feel claustrophobic
<hkaiser> Abhishek09: I doubt you can make Phylanx work if its dependencies don't
nan22 has joined #ste||ar
<zao> There's people doing HPC traditionally, and there's this gang.
<nikunj97> HPX gives a lot of independence with what you can do
<zao> Futurization is the future :P
<hkaiser> there you go!
<Yorlik> MPI sucks HPXs hairy balls ....
<hkaiser> now now
<hkaiser> Yorlik: we have ids on this channel after all
<hkaiser> *kids*
<Yorlik> Woops. Couldn't help it. Sorry.
<nikunj97> :D
<zao> I'll have to revise my motto to "HPX - the best library that at least one person showed can be used".
<hkaiser> zao: it's John's motto, isn't it
<Yorlik> Hkaiser: You need to write the book. For real :)
<hkaiser> no way
<nikunj97> heller1, I've added you to my repository of simdized kernels so that I can show you code whenever I have performance related doubts. Please accept the request :)
<diehlpk_work> HPX books :)
<hkaiser> Yorlik: have you seen heller1's thesis?
<Abhishek09> hkaiser : i only carry those files of the dependencies which are necessary for the installation of phylanx, e.g. hpx-config.cmake
<hkaiser> it's as close to a HPX book as you can get
<Yorlik> Yes. Especially the benchmarks :D
<nikunj97> I would appreciate an HPX book
<hkaiser> nikunj97: read heller1's thesis
<nikunj97> hkaiser, is heller1's thesis public?
<nikunj97> I'll surely give it a read
<hkaiser> sure
<Abhishek09> hkaiser?
<hkaiser> nikunj97: can't find it however right now - ask him
<nikunj97> sure, will do
<hkaiser> we need to link it from our publications page
<Yorlik> +1^^
<nikunj97> weilewei, thanks!
<Abhishek09> nikunj97: i only carry those files of the dependencies (shared libs) which are necessary for the installation of phylanx, e.g. hpx-config.cmake. Am I right?
<Abhishek09> nikunj97^
<nikunj97> you don't need to copy hpx-config to make phylanx work
<nikunj97> simply linking phylanx with hpx should be fine
karame78 has joined #ste||ar
<Abhishek09> No, I'm saying there's no need to build the hpx binaries; the hpx build files alone will be fine, nikunj97
<Abhishek09> such as config.cmake files, then we link with shared libs with pip tools
<Abhishek09> setuptools
<nikunj97> config.cmake tells you where to find libraries
<nikunj97> they don't do linking etc.
<zao> Abhishek09: You've got two different sets of things you need. One wider set to build Phylanx, and a narrower set to use Phylanx.
Hashmi has joined #ste||ar
rtohid has joined #ste||ar
gonidelis has quit [Remote host closed the connection]
weilewei has quit [Remote host closed the connection]
gonidelis has joined #ste||ar
rtohid has quit [Remote host closed the connection]
Abhishek09 has quit [Remote host closed the connection]
<Yorlik> It's been on my harddrive for months already - and been read :)
<hkaiser> linking it here for posterity
<Yorlik> IC
<Yorlik> How can I solve these dreaded HPX/Boost issues with winsock.h?
<hkaiser> include hpx first
<Yorlik> Since I replaced mutex with spinlock it came up in one component
<Yorlik> If I include HPX first boost asio complains
<hkaiser> about what?
<Yorlik> Is asio incompatible?
<hkaiser> no
<Yorlik> C:\__A\Arc_Sb\_INSTALL\boost\boost-1.72.0\RelWithDebInfo\include\boost-1_72\boost\asio\detail\socket_types.hpp(24): fatal error C1189: #error: WinSock.h has already been included
<hkaiser> include hpx before windows.h
<Yorlik> hpx.hpp? or can it be smaller?
<hkaiser> hpx/config.hpp should do the trick (I think)
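(I.e. something like this at the top of the offending translation unit - a sketch of the suggested ordering:)

    // HPX (or at least its config header) has to come before anything
    // that may drag in WinSock.h / windows.h
    #include <hpx/config.hpp>
    #include <hpx/hpx.hpp>

    #include <boost/asio.hpp>   // now sees a consistent winsock setup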
rod has joined #ste||ar
<Yorlik> OK - that was an include mess from months ago I had to fix here. :)
<diehlpk_work> Anyone with good linux skills around?
<diehlpk_work> akheir1, yet?
<nikunj97> zao ^^
<nikunj97> diehlpk_work, anything I can help with?
<diehlpk_work> find -maxdepth 1 -name "lecture1.tex" -exec latexmk -pdflatex="lualatex --shell-escape %O %S" --jobname=$(basename "{}" .tex)-slides -pdf "{}" ";"
<diehlpk_work> The part with latexmk does work if I use the file name directly
<diehlpk_work> find -maxdepth 1 -name "lecture1.tex" -exec latexmk -pdflatex="lualatex --shell-escape %O %S" --jobname=lecture1-slides -pdf "{}" ";"
<diehlpk_work> However, if I use the basename and the result of find twice, it does not work
<nikunj97> diehlpk_work, no clue :/
nan22 has quit [Ping timeout: 240 seconds]
nan11 has joined #ste||ar
<nikunj97> Yorlik, you're making a game engine right?
<Yorlik> A Distributed Gameserver, yes
<nikunj97> nice! could you tell more about it?
<Yorlik> Sure. What do you want to know?
<Yorlik> We could do it in voice if you like - less typing.
<nikunj97> so you'll be writing a multiplayer game post this?
<Yorlik> That's the plan.
<nikunj97> woah! that's really cool
<Yorlik> Server authoritative with Lua Scripting.
<Yorlik> We use Sol for the Lua bindings.
<nikunj97> are you from a AAA production house?
<Yorlik> Concurrency management is done with the actor model.
<Yorlik> Nope - simple hobbyist.
<Yorlik> We were modding a game that went south and decided to leave it and start over with a full game.
<nikunj97> i see
<nikunj97> when can I expect the game? xD
<Yorlik> lol
<Yorlik> Yesterday?
<nikunj97> lol
<Yorlik> I'm close to reach internal milestone 1: A scriptable local simulation.
<Yorlik> Next steps are distributed, object migration and client connection.
<nikunj97> nice! how does a multiplayer game work essentially
<nikunj97> ?
<Yorlik> Uh ...
<Yorlik> People connect - the server handles it - they play?
<nikunj97> lol
<nikunj97> I meant what exactly happens at the server side
<Yorlik> After all a player is just an object controlled from the outside.
<Yorlik> When the connection drops we could just leave it in game
<Yorlik> Or log it out
<nikunj97> is the whole thing hosted on the server?
<Yorlik> The player is just a replacement for AI
<nikunj97> true
<Yorlik> The client is just like a dumb graphical terminal with a bit of intelligence to hide away latency.
<Yorlik> And maybe some physics for visuals, but we will have server side authoritative physics
<nikunj97> so the whole simulations occurs on a server?
<Yorlik> Starting with collision ofc.
<Yorlik> Yes
<Yorlik> The server is the world.
<nikunj97> I used to think that you simulate the world in the machine
<Yorlik> And inside this world are objects (HPC Components)
<nikunj97> and send over the daya
<nikunj97> *data
<Yorlik> Yes, that's the gist of it.
<nikunj97> nice
<nikunj97> I'd like to know someday how all of it is put to place
<nikunj97> this is pretty interesting stuff
<Yorlik> The world will be tiled and tiles will be able to migrate together with the objects and connections they contain.
<Yorlik> You start small.
<Yorlik> My first goal was to get a basic simulation going and I'm close to that
<nikunj97> nice!
<Yorlik> We started about Nov. 18
<Yorlik> 2018
<Yorlik> So its a good year now
<Yorlik> We did a LOT of research at start, because we understood bad decisions will pay back harshly.
<nikunj97> yes, it has been long
<Yorlik> We have been modding 4 years before that
<nikunj97> modding games?
<Yorlik> So - the entire project is like 5 years now, with a major setback a good year ago
<Yorlik> We modded a game that died
<Yorlik> The company screwed it up horribly.
<nikunj97> damn, that's sad :/
<Yorlik> Yes it is.
<nikunj97> which gamehouse do you work for?
<Yorlik> We will be able to reuse all the design ideas and a bunch of Lua scripts though.
<nikunj97> is there any game I can play?
<Yorlik> I'm a hobbyist as I said
<Yorlik> Nothing playable yet.
<nikunj97> so there was a silver lining in all of this
<Yorlik> In the moment it's all about getting a usable simulation and scripting environment
<Yorlik> The silver lining is in my beard ;)
<nikunj97> :P
<Yorlik> It's fun. Can't wait to have this first milestone done.
<Yorlik> I want to make a simple grass-rabbit-fox population dynamics sim as a test game.
<nikunj97> does sound like fun
<Yorlik> The learning curve was and still is painful. But it's rewarding too.
<Yorlik> Once you have a graphical client it's a blast, since you have this instant feedback visually.
<nikunj97> true
<Yorlik> In the moment it's very abstract and not very tangible.
<Yorlik> Still a lot of fun.
<hkaiser> Yorlik: you'll have to make your game moddable, like Skyrim etc. ;-)
<nikunj97> hkaiser, you've played Skyrim?
<Yorlik> That is a major problem actually.
<hkaiser> success guaranteed!
<hkaiser> nikunj97: who has not?
<Yorlik> Skyrim is single player.
<Yorlik> If we want true moddability we would have to hand out the server.
<hkaiser> yah, distributed modding is a problem, I understand
<nikunj97> hkaiser, woah didn't know you played games
<Yorlik> I'd like to do at least things like UI modding and such
<hkaiser> nikunj97: I do nothing else ;-)
<nikunj97> I've only recently started playing Skyrim
<Yorlik> Programming = A form of gaming.
<zao> Progaming? :P
<Yorlik> Procrastigaming.
<Yorlik> Just call it Progging :D
<Yorlik> You can read that any way.
nikunj97 has quit [Quit: Leaving]
nikunj97 has joined #ste||ar
<Yorlik> hkaiser: Is there a way to limit how many tasks are allowed to be in flight at once?
<Yorlik> I have the impression I am coming into a spiral of death, where more and more tasks get created, which need a lua state, but because of the creation of states they get swapped out and are not being worked on, and instead more states for new tasks are created.
<Yorlik> Sometimes it is in an equilibrium and then, suddenly it starts creating Lua States like crazy.
<Yorlik> I'm pretty sure it is some sort of spiral of death, I just don't understand yet exactly how.
<Yorlik> I want to try pre-creating more states, so I can hand them out more quickly - still, I should never enter this spiral.
<jbjnr> Yorlik: limiting_executor might help you, but it needs some improvments. not got around to doing them
<Yorlik> jbjnr: What does a limiting executor exactly do?
rod has quit [Remote host closed the connection]
<Yorlik> Hmm ... #3734 - IC
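(Until the limiting_executor improvements land, one pattern for capping tasks in flight is HPX's sliding semaphore - a sketch; the header path and the workload are assumptions:)

    #include <hpx/include/async.hpp>
    #include <hpx/synchronization/sliding_semaphore.hpp>
    #include <cstdint>

    void spawn_limited(std::int64_t num_tasks)
    {
        auto work = [] { /* one task's worth of work */ };

        std::int64_t const window = 512;    // max tasks in flight
        hpx::lcos::local::sliding_semaphore sem(window);

        for (std::int64_t t = 0; t != num_tasks; ++t)
        {
            hpx::async(work).then(
                [&sem, t](hpx::future<void>&&) { sem.signal(t); });
            sem.wait(t);    // blocks once 'window' tasks are outstanding
        }
        // real code must also wait for the tail of tasks still running
        // before sem goes out of scope
    }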
<Yorlik> hkaiser - I'm currently looking at https://isocpp.org/files/papers/p0443r1.html - Is this the current version of the proposal?
<Yorlik> It's the executor proposal
nan2 has joined #ste||ar
maxwellr96 has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has joined #ste||ar
<heller1> The latest version is 11
<hkaiser> Yorlik: use wg21.link/p0443
<Yorlik> AH OK - Just r11
<hkaiser> that gives you the latest version
<Yorlik> I was thinking a lot about this idea of an executor as far as I understand it.
<Yorlik> (Very limited)
<heller1> 13 even...
<Yorlik> I am missing one thing, though it seems to be implicitely contained somehow.
<heller1> What you'll read in the latest version of the proposal isn't implemented in hpx yet
<Yorlik> And that is a means to describe the workload.
<heller1> What's missing?
<heller1> As in?
<Yorlik> The executor, if I understand correctly, somehow describes the ways of execution and manages the execution resources.
<Yorlik> But workloads have different properties.
<Yorlik> Some of them are reflected in the executors I think.
<Yorlik> Like the trivial: Parallelizable (somehow)
<Yorlik> Or Must execute sequentially
<Yorlik> But there are gazillions of properties and constructs you could choose to describe abstract properties of the problem.
<heller1> Like memory access, need for compute resources and the like?
<Yorlik> E.g. in my case it is a frame based execution in timesteps.
<heller1> Ahhh
<Yorlik> I think there's a ton of stuff worth describing
<Yorlik> If the executor is the worker, the "workload" would be the work.
<Yorlik> Understanding the work is one of the most important tasks of a programmer as a problem solver.
<heller1> Yeah
<Yorlik> So - I ask myself - Do we have sufficiently good structures to describe a problem/workload on an abstract level?
<heller1> But those are different properties
<Yorlik> Do we have appropriate language?
<heller1> Your real time requirements need to be honored by the scheduler
<heller1> No, we don't have that yet
<Yorlik> That's basically my thought - a counterpart to the executor concept.
<Yorlik> The "worlkload" concept
<heller1> We're not even close to having consensus on the basic vocabulary
<Yorlik> And a workload must be met by an appropriate executor to be worked on in a satisfying way.
<Yorlik> And maybe it's not even possible, but I think it's worth trying.
<hkaiser> Yorlik: the problem with standards papers is that the later the revision you read, the more standardese they get and the fewer concepts are described
<Yorlik> E.g. when I was designing my datastructures with these composable entities - I was stumbling over a ton of abstract properties which I could use to describe my specific workload.
<heller1> There are different dimensions to that problem
<hkaiser> it's a good idea always to read all revisions as that gives you the whole background
<Yorlik> I'll not go too deep, since it's a rabbithole for sure.
<Yorlik> But the problem in itself is intriguing: How can we speak about and then program problems?
<Yorlik> resp. solutions to problems.
<Yorlik> However - I'm procrastinating. My LuaState explosion awaits me and wants a solution - lol.
Hashmi has quit [Quit: Connection closed for inactivity]
nan2 has quit [Remote host closed the connection]
<heller1> If anyone is interested in how hpx is used elsewhere
<heller1> At least under the covers
<zao> Heh, Erwin.
<heller1> ;)
<heller1> Isn't he your boss?
<zao> Different site.
<hkaiser> heller1: can you add that to the publication page?
<heller1> Ah, thought he was some kind of director of your collaboration
<heller1> Yes, please remind me again tomorrow
<hkaiser> I just did
<hkaiser> it's tomorrow for you already ;-)
<heller1> I guess we should add all allscale papers
<heller1> Nope
<hkaiser> almost
<heller1> Still two hours to go
<zao> There's six sites under SNIC, he's head honcho at one of them, PDC (KTH)
<heller1> An interesting person, learned quite a bit from him...
<hkaiser> heller1: if you send Katie an email she will be able to do that as well
<heller1> Ok
nan11 has quit [Remote host closed the connection]
<nikunj97> It feels so nice when you figure out the right template specializations \o/
nan22 has joined #ste||ar
diehlpk has quit [Ping timeout: 240 seconds]
shahrzad has joined #ste||ar
shahrzad has quit [Ping timeout: 256 seconds]
<nikunj97> heller1, yt?
<nikunj97> my code is running faster than yours (in terms of time), but the reported MLUPS is less
<nikunj97> for the same Nx, Ny and steps
<nikunj97> could you check if my formula is correct?
<nikunj97> my bad, I forgot to multiply the damn thing by number of steps
<nikunj97> :P
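(The figure of merit being computed - million lattice-site updates per second over the whole run:)

    \[ \text{MLUPS} = \frac{N_x \cdot N_y \cdot \text{steps}}{t_{\text{elapsed}} \cdot 10^{6}} \]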
<nikunj97> hkaiser, yt?
nan22 has quit [Ping timeout: 240 seconds]
<hkaiser> nikunj97: here
<nikunj97> multiple rows per task has improved heller1's code by quite a bit ;)
<nikunj97> I'm getting about 1.5x speed up
<hkaiser> nice
<hkaiser> with or without vectorization?
<nikunj97> with simd employed at about 4x speed up
<hkaiser> 8 doubles wide?
<nikunj97> well it's 3.7x really
<nikunj97> no 4 double wide
<nikunj97> wait no
<nikunj97> 8 double wide
<nikunj97> I'm using floats
<hkaiser> ok, 4 doubles wide simd gives you a speedup of ~2.5, not bad
<nikunj97> no, 4 double wide by 8 floats. my bad
<nikunj97> using AVX2
<nikunj97> I haven't tried -ffast-math that heller1 asked
<hkaiser> nice work
<nikunj97> I will ask Rohit to benchmark the application for me and then I'll report you the numbers
<nikunj97> I feel that I can try to get a paper out of this, if the numbers compare to the optimized openmp codes
<hkaiser> absolutely
<nikunj97> or if it's somewhere near the expected peak
<hkaiser> nikunj97: try to get a dependency of speedup over grainsize
<nikunj97> *theoretical
<nikunj97> yes, that's the first aim. Find the right grain size for the fixed matrix size of 10,000 x 100,000 (the same grid size taken by heller1 )
<hkaiser> more of finding the right grainsize for parallelization, then the matrix size should not matter too much (within reason)
<nikunj97> ohh yea, right
<hkaiser> akheir1: tell shahrzad that nikunj97 is doing benchmarking for 2d stencils, that might interest her
<nikunj97> hkaiser, you want to look at the code?
<nikunj97> it's still on node, but I will be able to write the distributed one in a couple of days
<hkaiser> yes, I would like to, just have no time, currently - sorry
<nikunj97> that's fine.
<nikunj97> hkaiser, btw did you get time to go through the doc I sent in that email?
<hkaiser> uhh, what email?
<nikunj97> I sent you one a couple of days back
<nikunj97> asking your help in a guide to hpc that I'm preparing
<nikunj97> for my univ
<hkaiser> ahh
<hkaiser> the google docs
<nikunj97> yes
<hkaiser> not yet :/
<hkaiser> will do right away
bita has quit [Ping timeout: 260 seconds]
<nikunj97> thanks a lot!
<hkaiser> nikunj97: but I sent you that link to the hpc course, right?
<nikunj97> yes
<nikunj97> I've added that to the list as well
<nikunj97> I'll prepare a blog from the bullets I've collected. We'll publish it for the university students who are interested in the field but can't get into it.
<hkaiser> nod
<hkaiser> nikunj97: no additional comments at this point
<nikunj97> does it look complete to you?
<nikunj97> it's been compiled by 2 of us. My friend worked with chapel last year in gsoc. he's added some points from his experiences as well
<hkaiser> this kind of thing is never complete ;-)
<nikunj97> haha right, I meant am I missing something that may catch a beginner's eye?
<hkaiser> it's a nice starting point
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has joined #ste||ar
shahrzad has joined #ste||ar
diehlpk has quit [Ping timeout: 240 seconds]
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 256 seconds]
shahrzad has quit [Ping timeout: 240 seconds]