hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
shahrzad has joined #ste||ar
diehlpk has joined #ste||ar
shahrzad has quit [Ping timeout: 246 seconds]
<Yorlik> KK
<hkaiser> worked without problems this time
<hkaiser> just adding an example
<Yorlik> Does this work with your fix only?
<hkaiser> should do what you need
<Yorlik> After the fix?
<hkaiser> no, doesn't need my fix
<Yorlik> Cool - I'll check it out
<Yorlik> Thanks a ton !
<hkaiser> you can most likely just copy it over to your code base
<Yorlik> I'm on it right now
<Yorlik> Reading and trying to understand first :)
<Yorlik> Do I need to put it into the hpx namespace like you did?
<hkaiser> the traits have to be in the hpx namespace, the executor doesn't matter
<Yorlik> Allright.
<Yorlik> Compiled and run - now trying to integrate it into my code and LuaState handling
<hkaiser> Yorlik: does it make sense to you?
<Yorlik> Totally.
<Yorlik> I don't yet understand everything but I think I grasped the rough idea of it.
<Yorlik> You basically provided exactly what I need for my parloop.
<hkaiser> that was the idea
<Yorlik> And now I can really do a lot of cool things with my tasks.
<Yorlik> It's awesome - really - many many thanks!
<hkaiser> most welcome - it's fun to do useful things ;-)
<diehlpk> First conference was postponed to next year for me
<Yorlik> hkaiser: It's totally useful.
<hkaiser> that's more like what I would expect
<diehlpk> hkaiser, Our course got CxC approved
<hkaiser> \o/
<hkaiser> great, btw Bijay never even responded to my email
<hkaiser> just forget about CS
<diehlpk> Same for the EE guy
<hkaiser> idiots
<diehlpk> At least it will be a math course
<hkaiser> nod, and we will advertise it at CS/EE and rub it in afterwards ;-)
<diehlpk> yes, they advertised GSoC for us
<diehlpk> They sent an email to all gradstudents
<hkaiser> well, at least something
<diehlpk> They will do the same for the course
<hkaiser> diehlpk: btw, Katie asked whether we would be ok if one of her own students applied
<hkaiser> I meant Kate Isaacs
<diehlpk> Sure, saw the email but got lost during the moving chaos
<diehlpk> He should just start preparing the proposal soon.
<diehlpk> Deadline is end of this month
<hkaiser> diehlpk: I got the 2 pager from Geoff, btw - did you get that one too (last week already)
<diehlpk> No
<hkaiser> ok, let me forward it to you
<diehlpk> Thx
<diehlpk> hkaiser, I do not like it
<diehlpk> Still not clear what we want to do
<diehlpk> At least I do not understand what we will do in the next two years
<diehlpk> What is the scenario we will target?
<diehlpk> What kind of physic is needed to do this?
<diehlpk> Can octotiger run the targeted application?
<diehlpk> Do we need to implement new physics
<diehlpk> What about v1309?
bita has joined #ste||ar
<hkaiser> exactly my questions
<Yorlik> hkaiser: Where would I add my custom chunk size in this particular parloop? In the executor?
<hkaiser> par.on(exec).with(chunk_size_policy_object)
<Yorlik> Allright. Thanks !
<Yorlik> hkaiser: hpx::parallel::execution::task is redundant and omitted, I guess?
<Yorlik> Like being the default executor?
shahrzad has joined #ste||ar
<hkaiser> no
<hkaiser> if you need async execution you write par(task).on(...).with(...)
<hkaiser> these things are orthogonal
<Yorlik> IC. I'll add it
<hkaiser> Yorlik: par (and par(task)) are execution policies
<hkaiser> those have an associated executor and associated parameters (like chunk size)
<Yorlik> I still do not really understand the concepts of an execution policy and an executor
<Yorlik> You really need to write the hpx book :)
<hkaiser> .on() changes the associated executor while .with() changes the execution parameters
<Yorlik> I still don't understand the meaning of these concepts - I just have a foggy idea of what they do.
<hkaiser> both .on() and .with() return a new execution policy the algorithms can work with
<Yorlik> I need to do some serious studying / reading on this.
<hkaiser> Yorlik: an executor is an object that -- well -- executes tasks
<hkaiser> nothing else
<hkaiser> it knows how to execute things
<Yorlik> BTW it runs - I can now work on my empty lambdas and fill them with meaning (lua state)
<hkaiser> execution parameters allow to customize the behavior and parameters of scheduling tasks
<hkaiser> (chunk size, etc.)
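(Put together, the pieces compose roughly like this - a sketch, not Yorlik's actual code; headers follow the HPX 1.4-era layout and the loop body is a placeholder:)

    #include <hpx/include/parallel_executors.hpp>
    #include <hpx/include/parallel_for_loop.hpp>
    #include <cstddef>

    void update(std::size_t n)
    {
        using namespace hpx::parallel::execution;

        parallel_executor exec;    // stand-in for a custom executor

        // .on() attaches the executor, .with() the execution parameters;
        // each returns a new policy the algorithm can work with
        hpx::parallel::for_loop(
            par.on(exec).with(static_chunk_size(128)),
            std::size_t(0), n,
            [](std::size_t /*i*/) { /* per-iteration work */ });
    }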
diehlpk has quit [Ping timeout: 260 seconds]
<Yorlik> Is the executor basically a wrapper for the task main function or sth?
<hkaiser> execution policy is what the standard defines for the algorithms (par, seq, un_seq, etc.) and are tag types that tell the algorithms how they behave
<hkaiser> executors are wrappers for the scheduler
<Yorlik> So - it's more a higher level management object?
<hkaiser> yah, very thin object referring to a scheduling system
<Yorlik> And - obviously - a customization point.
<hkaiser> and you can do other things like wrapping the scheduled functions and calling start/stop functions
<hkaiser> yes
<Yorlik> So the execution parameters .with etc are inside that thing and working with the scheduler wrapped by the executor
<hkaiser> yes
<Yorlik> OK
<Yorlik> Less fog now
<Yorlik> Thanks !
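(For reference, the minimal two-way executor shape under discussion, with the enabling trait in the hpx namespace as hkaiser noted earlier - a sketch along the lines of the HPX docs; exact headers and trait names may differ by version:)

    #include <hpx/include/async.hpp>
    #include <hpx/include/parallel_executors.hpp>
    #include <type_traits>
    #include <utility>

    struct my_executor
    {
        // "knows how to execute things": forwards invocations to hpx::async
        template <typename F, typename... Ts>
        auto async_execute(F&& f, Ts&&... ts)
        {
            return hpx::async(std::forward<F>(f), std::forward<Ts>(ts)...);
        }
    };

    // the trait has to live in the hpx namespace so the algorithms find it
    namespace hpx { namespace parallel { namespace execution {
        template <>
        struct is_two_way_executor<my_executor> : std::true_type {};
    }}}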
bita has quit [Quit: Leaving]
<Yorlik> hkaiser: I think there is a problem because I put everything inside my static member function "update_entity_array_advanced". It compiles, but I get a runtime error: "this->exec_.on_start_.vptr was nullptr." from basic_function.hpp.
<Yorlik> How should I fix this?
<hkaiser> no idea
<Yorlik> The lambdas and the executor are all defined inside the function
<hkaiser> the function was not initialized or went out of scope
<hkaiser> you have to keep the executor alive as long as there are threads running
<hkaiser> you can try removing the reference from the base executor member in the executor
<hkaiser> BaseExecutor& exec_; --> BaseExecutor exec_;
nk__ has quit [Ping timeout: 246 seconds]
<Yorlik> Or I make the executor a member of the class
<Yorlik> It still crashes. I'll keep it alive instead
shahrzad has quit [Ping timeout: 246 seconds]
shahrzad has joined #ste||ar
akheir1 has quit [Quit: Leaving]
hkaiser has quit [Quit: bye]
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 246 seconds]
shahrzad has quit [Remote host closed the connection]
shahrzad has joined #ste||ar
shahrzad has quit [Ping timeout: 260 seconds]
Abhishek09 has joined #ste||ar
Abhishek09 has quit [Remote host closed the connection]
Abhishek09 has joined #ste||ar
shahrzad has joined #ste||ar
shahrzad has quit [Client Quit]
Abhishek09 has quit [Remote host closed the connection]
Abhishek09 has joined #ste||ar
mdiers_ has quit [Remote host closed the connection]
mdiers_ has joined #ste||ar
nk__ has joined #ste||ar
<nk__> heller1, here you go: https://imgur.com/a/4fVk8fr ;)
<heller1> Nice!
<heller1> Now, explain the graph, highlight the different regions
<nk__> yesterday btw my main doubt was the use of peak bandwidth. And about passing through origin, most of the plots online didn't pass through origin so that confused me. Didn't realize they were log scaled :D
<nk__> peak bandwidth was supposed to be the stream triad benchmark that we took coz DRAM peak bandwidth is usually not achievable
<nk__> btw I couldn't find IPC for HiSilicon1616 anywhere
<nk__> heller1, explain as in explain you or color the regions for memory and compute bound?
nk__ is now known as nikunj97
<heller1> Well, let's assume I have an algorithm with an arithmetic intensity of 8. What can you tell me about it?
<nikunj97> i can say that it's a memory bound problem and you cannot achieve the peak cpu performance
<nikunj97> you can achieve I*Peak bw worth of performance at best
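(That bound is the roofline model itself; with arithmetic intensity I in FLOP/byte:)

    \[ P_{\text{attainable}} = \min\bigl(P_{\text{peak}},\; I \cdot BW_{\text{peak}}\bigr) \]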
<heller1> I run my program and I get 8 gflops
<heller1> What now?
<nikunj97> sounds like you need better cache management
<heller1> If I introduce better cache management, what will change?
<nikunj97> your program's memory bandwidth will increase
<nikunj97> while the arithmetic intensity will be same
<nikunj97> so an increase in memory bandwidth will lead to a better performance
<heller1> Will I fetch more memory from dram or less?
<nikunj97> you will fetch less memory
<nikunj97> if you have it available in cache the bandwidth will be significantly higher
<nikunj97> since fetching it from DRAM takes way longer than fetching from cache
<heller1> Correct
<nikunj97> so if your code makes use of L1/L2/L3 cache better
<nikunj97> then you will fetch very little from DRAM
<heller1> And why does this not affect the arithmetic intensity?
<nikunj97> because you still do the same number of loads and stores and arithmetic operations
<heller1> You just said I'm fetching less memory per floo
<nikunj97> it's just that the loads and stores now are significantly faster
<heller1> Floo
<heller1> Flop
<heller1> Sure
<heller1> Which line in the graph will affect this?
<nikunj97> arithmetic intensity is number of floating point operations divided by the byte fetched
<nikunj97> number of floating point operations for a given equation is a constant
<nikunj97> and so are bytes fetched
<nikunj97> so arithmetic intensity won't change
<nikunj97> it can only be changed by utilizing a different "type"
<nikunj97> i.e. changing from float to double
<nikunj97> or employing simd
<nikunj97> because that changes bytes fetched
<nikunj97> it will affect the peak bandwidth line
<nikunj97> heller1, ohh wait
<nikunj97> it's flop to DRAM byte ratio in arithmetic intensity
<nikunj97> of course it will change if we employ better cache practices
<heller1> ;)
<nikunj97> it will move to the right since we will now have less DRAM fetches
<heller1> tada
<nikunj97> and peak performance will increase itself
<heller1> so, having an arithmetic intensity of 8, and measured performance of 8 GFLOPS/s, what will you recommend me to improve in my code?
<nikunj97> since you have 8GFLOPS/s, you are above scalar peak, which means you're using a simdized kernel. I would recommend changing the structure of your for loops so that they incur fewer conflict misses
<heller1> (NB those are the questions I used to ask my students in the exams for 'Architecture of Supercomputers')
<heller1> not quite
<heller1> I am slightly above scalar peak, that's correct
<heller1> however, there's more instruction level parallelism that could get me above that performance level. What you can observe is that there's a gap between my performance results and what could have been achieved when utilizing vectorized instructions
<nikunj97> yes
<nikunj97> you can certainly achieve better simd results
<heller1> so I should look into vectorizing my code
<heller1> having better cache utilization won't affect my performance since I am not really close to any bandwidth limitation
<nikunj97> that sounds right too. A better vectorized code will certainly achieve better performance
<nikunj97> until I hit the memory bandwidth i.e.
<nikunj97> post which I should look into caching
<heller1> tada, now you got it ;)
<nikunj97> yes, I realize that now
<nikunj97> so now for our jacobi stencil
<heller1> keep in mind that this is still a very simplistic model, however, it is very effective
<heller1> you should always start with single core performance
<nikunj97> yes it looks very effective
<heller1> and try to get it to the maximum performance for your calculated arithmetic intensity
<heller1> then go multi threaded and see how well it scales
<nikunj97> for stencil, we have: next[x][y] = const * (curr[x][y+1] + curr[x][y-1] + curr[x-1][y] + curr[x+1][y])
<heller1> always use tools to analyze the given performance bottlenecks you identified using the roofline model
<nikunj97> for a normal cache implementation, we will require 4 loads and 1 store
<nikunj97> i.e. loading top and bottom neighbor and index of next stencil
<heller1> you can assume that the index is inside a register
<heller1> so I'd say 3 loads in the limit
<nikunj97> which index?
<heller1> x and y
<heller1> which index were you talking about?
<nikunj97> x and y
<nikunj97> how can we assume it to be inside the register already?
<heller1> since they are loop variables
<nikunj97> so 3 loads and 1 store?
<heller1> yes
<nikunj97> I see. If the cache is large enough to have 3 rows, we can get past with 1 load and 1 store
Abhishek09 has quit [Remote host closed the connection]
<heller1> yes, the problem however, is the distance between y+1, y and y-1
<nikunj97> yes
<nikunj97> for that, 3*Nx*sizeof(element) should not be any larger than the cache size
<nikunj97> in an ideal case scenario, should not be larger than cachesize/2
<nikunj97> before all that, let me go ahead assuming 3 loads and 1 store
<heller1> those are given by the problem you want to solve, you need to massage your algorithm to account for that
<heller1> not change the problem size ;)
<nikunj97> yes, understood :)
<nikunj97> roofline makes the analysis easier
<nikunj97> also 1 Lattice site update involves 4 floating point operations
<nikunj97> so we can use that conversion to account for our performance in FLOPS/s or LUPS/s
<nikunj97> so right now we have 4 FLOP per 32 Byte (4 load/store * 8B per instruction)
<nikunj97> so that will be 1/8 for arithmetic intensity
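(Spelled out, with 4 FLOP per lattice update against 3 loads + 1 store of 8-byte doubles:)

    \[ I = \frac{4\ \text{FLOP}}{(3 + 1) \times 8\ \text{B}} = \frac{1}{8}\ \text{FLOP/B} \]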
<nikunj97> now I need to measure the performance of my application and plot that in graph. Following which I can see if I need better caching or better vectorized code. Right heller1?
<heller1> yes
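(The update being analyzed, as a plain C++ sketch with placeholder names - not the actual tutorial code:)

    #include <cstddef>
    #include <vector>

    using grid = std::vector<std::vector<double>>;

    // one Jacobi sweep: 4 FLOP per lattice site update (3 adds, 1 multiply);
    // in the streaming limit, ~3 loads and 1 store of 8 B each per update
    void jacobi_sweep(grid const& curr, grid& next, double k)
    {
        std::size_t const nx = curr.size();
        std::size_t const ny = curr[0].size();
        for (std::size_t x = 1; x != nx - 1; ++x)
            for (std::size_t y = 1; y != ny - 1; ++y)
                next[x][y] = k * (curr[x][y + 1] + curr[x][y - 1] +
                                  curr[x - 1][y] + curr[x + 1][y]);
    }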
Hashmi has joined #ste||ar
Abhishek09 has joined #ste||ar
nikunj97 has quit [Ping timeout: 246 seconds]
nikunj97 has joined #ste||ar
Abhishek09 has quit [Remote host closed the connection]
<nikunj97> heller1, running your stencil application in a single core setting, I see performance at about 600 MLUPS
<nikunj97> since the arithmetic intensity is 1/8 and the peak bandwidth is 39GB/s, we should get about 5GFLOPS/s worth of performance
<nikunj97> or about 1,250 MLUPS
<heller1> what parameters did you run it with?
<nikunj97> ./stencil_parallel_1 -t1
<nikunj97> I let Nx and Ny be default at 1024
<nikunj97> and steps at 100
Abhishek09 has joined #ste||ar
<nikunj97> everything is measured in doubles
<nikunj97> and not floats
<heller1> I think you need a larger problem size
<nikunj97> should I increase task grain size or the number of tasks?
<heller1> Larger dimensions, iirc
<nikunj97> in the paper you use 10,000 x 100,000
<nikunj97> let me try using them
<heller1> yes
<heller1> also, try the last parallel version
<heller1> stencil_parallel_4
<nikunj97> stencil_parallel_4 is the distributed one
<heller1> yes
<nikunj97> ok, let me run that one. stencil_parallel_1 reported 631 MLUPS
<nikunj97> is that an expected performance?
<nikunj97> I think simd should help with the performance in this case
<nikunj97> we're at half the peak performance for the arithmetic intensity. So in theory we can get about 2x speedup
<nikunj97> stencil_parallel_4 also reports ~630MLUPS (629.123 to be precise)
<nikunj97> using avx2 in haswell we can store 4 doubles. That should be enough to achieve the 2x speedup I guess
<nikunj97> once we get to 2x speedup, I will then need to look into ways to increase the arithmetic intensity by employing caching
<nikunj97> that looks like a good start to me. Do you concur?
<heller1> sounds good
<nikunj97> great \o/
<heller1> what does stencil_serial report?
<heller1> also, using avx2 will potentially give you a 4x speedup ;)
<nikunj97> I have not checked
<heller1> please do
<nikunj97> heller1, yes "potentially" 4x
<nikunj97> but given that we're already at 1/2 the peak performance, without using caching strategies, I don't think I'll be able to get the 4x speedup. So I'm aiming for that 2x speedup first
<heller1> first, try compiling with -ffast-math and -mavx2
<heller1> to see what this gives you
<nikunj97> ok
<nikunj97> stencil_serial reports 955MLUPS
<heller1> ;)
<heller1> so there's quite some overhead with the parallel for loop
<nikunj97> looks like it
<heller1> I hope you set CMAKE_BUILD_TYPE=Release?
<nikunj97> yes
<nikunj97> Debug can't give this high performance, can it ;)
<nikunj97> I made the build type mistake last year while benchmarking resiliency. Was pretty embarrassing when Hartmut told this to Keita
<heller1> :P
<nikunj97> the only way to hide parallel_for_loop's overhead would be to increase the grain size
<nikunj97> perhaps Nx=50,000 should hide the overheads
<heller1> try and see
<nikunj97> already started :)
nikunj97 has quit [Ping timeout: 260 seconds]
nikunj97 has joined #ste||ar
gonidelis has joined #ste||ar
<nikunj97> why does my application not link to -lboost_program_options when boost is definitely in the LD_LIBRARY_PATH :/
<nikunj97> module is loaded and cmake can see it, but g++ can't
<nikunj97> interesting... ldconfig -p does not show boost_program_options in there
<zao> nikunj97: LD_LIBRARY_PATH is for the loader. LIBRARY_PATH is for the linker.
<nikunj97> just realized, hpx doesn't need boost program options any more
<zao> Are you not using FindBoost and the targets/variables it produces for libraries?
<nikunj97> `echo $LIBRARY_PATH` doesn't return anything :/
<nikunj97> I'm essentially using g++ my_file.cpp -lboost_program_options
<nikunj97> LD_LIBRARY_PATH already has the boost path
<zao> CPATH and LIBRARY_PATH are the environment variables corresponding to the compiler's -I and the linker's -L.
<nikunj97> module list shows me that boost 1.72 is loaded
<nikunj97> ahh!
<zao> prepend_path("CPATH","/hp/eb/software/Boost/1.71.0-gompi-2019b/include")
<zao> prepend_path("LIBRARY_PATH","/hp/eb/software/Boost/1.71.0-gompi-2019b/lib")
<zao> prepend_path("LD_LIBRARY_PATH","/hp/eb/software/Boost/1.71.0-gompi-2019b/lib")
<zao> This is what my Boost/1.71.0 module sets, for example.
<nikunj97> why has it not been set in my case then :/
<nikunj97> I should ask Ali when he's around
<zao> Not all module systems set them.
<nikunj97> let me try this
<zao> There might be an environment variable for you to locate your Boost directory.
<zao> In my world, I'd say "-L${EBROOTBOOST}/lib" for example
<nikunj97> I see
<zao> If you're using Lmod, you can 'ml show' on your module name to see what it performs when loaded.
<nikunj97> `env | grep boost`, boost only assigns to LD_LIBRARY_PATH
<nikunj97> yes lmod is used
<nikunj97> ml boost doesn't prepend LIBRARY_PATH
<nikunj97> let me prepend that myself then
<nikunj97> zao, worked like a charm
baocvcv has joined #ste||ar
<nikunj97> heller1, I forgot that I was running them on head node :P
<nikunj97> on a haswell node, serial runs at about 630MLUPS
<nikunj97> and parallel runs at about 510MLUPS
<zao> nikunj97: Running code on head nodes? *shudder*
<nikunj97> zao, that cluster is kind of exclusive to us ;)
<nikunj97> so there aren't many people accessing that one
<nikunj97> I tend to forget not to run things on head node :P
hkaiser has joined #ste||ar
gonidelis has quit [Ping timeout: 240 seconds]
baocvcv has quit [Ping timeout: 264 seconds]
Abhishek09 has quit [Remote host closed the connection]
Abhishek09 has joined #ste||ar
<nikunj97> hkaiser, how much of an overhead does parallel_for_loop have? I presume it's about 1-2us
<heller1> the results are good
<heller1> now on to vectorization ;)
<hkaiser> nikunj97: ~1 us per created thread
<hkaiser> amortized over the number of cores
<nikunj97> alright, so I need about 50-100us to hide them in noise
<hkaiser> yah, better 200us
Sarthakag has joined #ste||ar
<nikunj97> 200us for a stencil will require ample for loops ;)
Sarthakag has quit [Remote host closed the connection]
<nikunj97> heller1, yes I think so too!
<nikunj97> hkaiser, ample Nx for loops*
Sarthakag has joined #ste||ar
<hkaiser> nikunj97: why parallelize in the first place if you don't have 200us of work?
<nikunj97> not that I don't have work
<nikunj97> but the naive 2d stencil approach does 1 row at a time with neighboring top and bottom row
<nikunj97> and iterate over the row
<nikunj97> unless the row length is large enough, I don't think I can get 200us of work in that task :/
<nikunj97> a for loop with 10,000 runs takes about 25-30us
<hkaiser> nikunj97: let me ask again: why parallelize a piece of work that is not large enough?
<hkaiser> if your rows are too short then parallelize over the rows and not each row on its own
<heller1> nikunj97: not every row is one task
<heller1> nikunj97: you have multiple rows in one task
<heller1> so?
<nikunj97> is it working on the whole stencil?
<nikunj97> I believe it's only working on a row
<heller1> yes, this function only works on one row
<heller1> but not every element function of the for loop corresponds to one task
<hkaiser> heller1: btw, with the latest changes to get<T>(variant), gcc fails again (https://cdash.cscs.ch//viewBuildError.php?buildid=100586) :/
<nikunj97> heller1, aah! understood. Give it more than one row to increase the work. Sensible enough
<heller1> hkaiser: why is it trying to call tuple_element on a variant to begin with?
<hkaiser> gcc's std::visit implementation relies on a (unqualified) get<I>(v)
<K-ballo> that sounds wrong
<hkaiser> sure it's wrong
<hkaiser> I think I know what's missing
<K-ballo> all get<I> in libstdc++ 9.1.0 look qualified
<hkaiser> we insert our get<I> into namespace std for the tuple compatibility, I need to do the same with the get<I>(variant) I added
<hkaiser> then our overloads will be found and used
<K-ballo> we add get<I> to std?? that's definitely wrong
<hkaiser> we do that to enable std::get<> for our tuples
<K-ballo> ok, that's wrong
<hkaiser> shrug
<K-ballo> that explains why visit is calling it
<hkaiser> right
<K-ballo> doesn't explain what calls visit in the first place
<hkaiser> our serialization of std::variant
<K-ballo> our gets, those we wrongly inject into std... they are unconstrained :|
<hkaiser> yah
<hkaiser> blame heller1 ;-)
<K-ballo> lol
<heller1> blame the reviewers :P
<K-ballo> no reviewer ever said "we'll regret this"?
<hkaiser> you did ;-)
<K-ballo> I'd expect it
<hkaiser> ok, I'll redo our gets
<K-ballo> so.. those gets need some constraining
<K-ballo> but at the same time, those gets need to be unconstrained.. mmh
<hkaiser> not really, we can create specializations for all types we want to support
<K-ballo> as long as it is for the std:: injected ones, and not for hpx::util ones
<hkaiser> we can add specializations for util::tuple as well
baocvcv has joined #ste||ar
<Sarthakag> Hi, I am Sarthak, a 4th year student studying at BITS, Pilani. I would like to contribute to STE||AR. Although I have been an active programmer in C++ since the last 4 years, I am a beginner in the field of HPC. How can I get some hands on experience?
rtohid has joined #ste||ar
Hashmi has quit [Quit: Connection closed for inactivity]
khuck has joined #ste||ar
khuck_ has joined #ste||ar
khuck has quit [Ping timeout: 260 seconds]
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has joined #ste||ar
khuck_ has quit []
khuck has joined #ste||ar
bita has joined #ste||ar
Hashmi has joined #ste||ar
<Sarthakag> Is this the correct platform to ask such doubts?
<zao> Hi there!
<zao> Is your interest as a general researcher or as part of Google Summer of Code?
<Sarthakag> Google Summer of Code
<nikunj97> heller1, btw what's the use of hpx::compute::host::block_allocator?
<heller1> nikunj97: it distributes the allocated memory across all numa domains
<nikunj97> why did you have to use hpx::compute::vector over a std::vector? You can give the allocator to std::vector as well
weilewei has joined #ste||ar
<heller1> yes, compute::vector uses an extension of the allocator which does parallel construction of the elements
<heller1> exploiting the first touch numa policy
<nikunj97> aah! so in hindsight element construction within the vector occurs in parallel using numa
<nikunj97> yes
<heller1> this is not as easy with a std::vector
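(In use this looks roughly like the HPX stencil examples - a sketch, details vary by version:)

    #include <hpx/include/compute.hpp>
    #include <cstddef>

    void make_numa_aware_data(std::size_t size)
    {
        using allocator_type = hpx::compute::host::block_allocator<double>;

        // one target per NUMA domain on this node
        auto numa_domains = hpx::compute::host::numa_domains();
        allocator_type alloc(numa_domains);

        // elements are constructed in parallel, block-wise per domain, so
        // first-touch places each block of pages on the right NUMA domain
        hpx::compute::vector<double, allocator_type> data(size, 0.0, alloc);
    }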
Sarthakag has left #ste||ar [#ste||ar]
Sarthakag has joined #ste||ar
<hkaiser> K-ballo: so I think we need a constrained version of get for util::tuple which is injected into std:: and hpx::util::, and an unconstrained version of get that is injected only into hpx::util::, the latter for the types we want to support from std::
<K-ballo> possibly, yeah, the two will need different constrainings
<hkaiser> yes
<K-ballo> the util one could be constrained on non-empty util::tuple_element<decay<Tuple>>
<K-ballo> the std:: one will have to be a lot more conservative
<hkaiser> just for util::tuple
nan1 has joined #ste||ar
<Abhishek09> nikunj97 Where does dnf install the hpx package? Location?
<nikunj97> Abhishek09, /usr/
<nikunj97> libraries at /usr/lib64
<nikunj97> headers at /usr/include
<Abhishek09> Thanks i got it
akheir has joined #ste||ar
<Abhishek09> nikunj97 : how do I install the deps of phylanx? I don't have the hpxconfig.cmake file
<hkaiser> Abhishek09: dnf the hpx-devel package
<nikunj97> install hpx-devel :)
<hkaiser> hpx-devel
<zao> Abhishek09: It's customary to split distribution packages into two parts, one with binaries and libraries for running software, and one with files only for development like headers and build system metadata like pkg-config files or CMake exports.
<Abhishek09> What is hpx-devel, nikunj97? How is it different from hpx?
<nikunj97> Abhishek09, zao just explained that
<nikunj97> hpx-devel essentially contains files relating to build system
<nikunj97> example pkg-config and cmakefiles
<nikunj97> one of which is hpxconfig.cmake
<nikunj97> the hpx package ONLY installs the headers and libraries
<zao> nikunj97: Not even headers.
<hkaiser> nikunj: dnf hpx should only install the binaries etc. the headers should not be needed for it, actually
<nikunj97> ohh, my bad. I thought headers were also installed.
<hkaiser> K-ballo: wouldn't the currently unconstrained implementation of get be sfinae'd out if tuple_element is not defined?
<K-ballo> tuple_element is not supposed to be sfinae-friendly
weilewei has quit [Remote host closed the connection]
<hkaiser> I mean the get implementation
<K-ballo> it may work for non cv-qualified non tuples? it's not supposed to
<K-ballo> tuple_element is not supposed to be sfinae-friendly, so get shouldn't sfinae out on it.. it'd need its own sfinae
<hkaiser> ok
Sarthakag has quit [Ping timeout: 240 seconds]
Abhishek09 has quit [Ping timeout: 240 seconds]
Abhishek09 has joined #ste||ar
diehlpk_work has joined #ste||ar
nan has joined #ste||ar
nan is now known as Guest58090
Guest58090 has quit [Remote host closed the connection]
nan2 has joined #ste||ar
khuck has quit [Remote host closed the connection]
<Yorlik> hkaiser: How is the parloop executed in parallel if I'm not using "hpx::parallel::execution::task " ?
Abhishek09 has quit [Ping timeout: 240 seconds]
<Yorlik> Still tasks, but waiting in the local thread for them to finish?
baocvcv has quit [Ping timeout: 250 seconds]
<Yorlik> NVM - I think I got it - it's about the parloop as a whole, not the chunks - otherwise it wouldn't make sense anyways.
Abhishek09 has joined #ste||ar
khuck has joined #ste||ar
<hkaiser> Yorlik: par does parallelization, hence the name
<hkaiser> Yorlik: par(task) does parallelization, but also executes the algorithm asynchronously (makes it return a future)
<Yorlik> I think I confused launching the entire loop call async or just the chunks being async
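(The two variants side by side - a sketch with a placeholder body:)

    #include <hpx/include/lcos.hpp>
    #include <hpx/include/parallel_for_loop.hpp>
    #include <cstddef>

    void run(std::size_t n)
    {
        using namespace hpx::parallel;
        auto body = [](std::size_t /*i*/) { /* chunked work */ };

        // par: the iterations run in parallel, the call returns when done
        for_loop(execution::par, std::size_t(0), n, body);

        // par(task): same parallelism, but the loop as a whole is async
        hpx::future<void> f =
            for_loop(execution::par(execution::task), std::size_t(0), n, body);
        f.get();    // wait for the entire loop here
    }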
<Yorlik> I moved my futures up a level in the call stack and thus could keep the executor and stuff around
<Yorlik> Still fighting with storing the Lua State for the Thread in a static map - it needs mutexes which is ugly as hell.
<Abhishek09> nikunj97: after dnf hpx-devel, where do I find the hpxconfig.cmake file?
<nikunj97> Abhishek09, /usr/lib64/cmake/HPX
<Abhishek09> nikunj97 Yes , it is there
<nikunj97> you can use that while installing phylanx
<Abhishek09> nikunj97 : ste||ar has two dnf package `hpx` & `hpx-devel` Am i right?
<Abhishek09> zao ^?
<zao> Yup.
<zao> You can see the contents of installed packages if you want with for example "rpm -qvl hpx"
<Abhishek09> @zao Does dnf have a blaze package?
<hkaiser> Yorlik: use a thread_local container
<Yorlik> hkaiser: But if the task migrates? I need to give back the LuaStates when the task is done and can't access that task specific slot then.
<Abhishek09> Does blaze-builder have a role in building anything diehlpk
<hkaiser> no problem, you're not freeing things but caching the pointers for the next use - so they can 'go back' to a different pool
<zao> Abhishek09: Ask dnf :)
<diehlpk_work> Abhishek09, What is blaze builder?
<zao> Some names are conflicting between different software packages. It could be a disjoint software.
<Yorlik> hkaiser: That's what I already did before, using 4 pools, one per worker thread. But I just detected another issue which might be responsible for the LuaState explosion. Need to investigate ...
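(A sketch of that per-worker-thread pool, with a hypothetical lua_state standing in for the Sol-wrapped state:)

    #include <memory>
    #include <vector>

    struct lua_state { /* hypothetical Sol-wrapped Lua state */ };

    static std::vector<std::unique_ptr<lua_state>>& local_pool()
    {
        // one cache per worker thread, so acquire/release need no locks
        thread_local std::vector<std::unique_ptr<lua_state>> pool;
        return pool;
    }

    std::unique_ptr<lua_state> acquire_state()
    {
        auto& pool = local_pool();
        if (pool.empty())
            return std::make_unique<lua_state>();
        auto s = std::move(pool.back());
        pool.pop_back();
        return s;
    }

    void release_state(std::unique_ptr<lua_state> s)
    {
        // a state may be released on a different worker thread than it was
        // acquired from; it simply joins that thread's pool for reuse
        local_pool().push_back(std::move(s));
    }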
<zao> You're not going to find all the dependencies in the distro repos.
<Abhishek09> maintained by diehlpk_work
<Yorlik> hkaiser: Should I use HPX specific mutexes if I need to lock? Or is std::mutex okay?
<Abhishek09> zao but not working
<zao> Yorlik: An OS primitive will block the thread. A HPX primitive will yield for other work.
<diehlpk_work> Abhishek09, Do you speak about my fedora package for blaze?
<hkaiser> Yorlik: yes, you're running on a hpx thread
<Abhishek09> Yes diehlpk_work
<Yorlik> Which mutex should I use?
<Abhishek09> it works or not ?
<Abhishek09> diehlpk_work
<zao> If you don't use HPX primitives, you're likely to halt progress if you're relying on something HPX:y to unblock you.
<Yorlik> Just for standard lock_guards
<diehlpk_work> Abhishek09, I do not understand what you will do with the fedora spec file
<Yorlik> The waiting times are extremely low, but I have a weird feeling the state explosion has to do with mutual locking and might even run into a deadlock
<Abhishek09> I will install blaze diehlpk_work
<diehlpk_work> It is very specific to build a package for Fedora
Hashmi has quit [Quit: Connection closed for inactivity]
Abhishek09 has quit [Remote host closed the connection]
Abhishek09 has joined #ste||ar
<Abhishek09> diehlpk : dnf install blaze works or not?
<zao> Abhishek09: Did you _try_?
<zao> If it's in the repositories, it's supposed to work.
<Abhishek09> zao: i tried blaze but it was not found
<Abhishek09> dnf install blaze
<zao> nikunj97: Now I feel silly... I've been ssh:ing to my old laptop running Fedora every time I needed to test something here, and it takes ages to get to a prompt.
<zao> Turns out that IPv6 is down on it and it was busy timing out before trying IPv4 :D
<nikunj97> lmao
<zao> Abhishek09: Which OS release are you on now again?
<nikunj97> zao, is your laptop that old?
<nikunj97> I don't think IPv6 is disabled in recent models when you install any os
<Abhishek09> zao fedora
<zao> It's an old EeePC I repurposed to a small home server, so it's very low on RAM.
<nikunj97> Abhishek09, blaze-devel exists
akheir has quit [Read error: Connection reset by peer]
<zao> Abhishek09: I'd like to know which version.
* nikunj97 zao trying to build hpx on low RAM
<Abhishek09> blaze does not exist
<zao> Abhishek09: And yes, recall again how there's typically two packages for software, one regular for running things and one -devel for building things.
<zao> Blaze is header-only and as such, doesn't have the former.
akheir has joined #ste||ar
<Yorlik> Is mutex from <hpx/synchronization/mutex.hpp> fully compatible with std::mutex and can it be used with std::lock_guard?
<zao> Yorlik: Pretty much anything can be BasicLockable.
<Yorlik> zao: IC - thanks!
<zao> nikunj97: It'd be foolish to build HPX on 1GB of RAM.
<nikunj97> zao, I have 16gb ram. It feels low while building hpx :/
<zao> For amusing reasons the machine has public IPv4 over wired and LAN+IPv6 over wifi. Wifi's down apparently.
<nikunj97> aah
<hkaiser> Yorlik: use hpx::lcos::local::spinlock
<Yorlik> Does that yield ?
<hkaiser> yes, eventually
<Yorlik> Nice.
<Yorlik> So it does a compromise?
<hkaiser> I still think you can get away with a thread_local pool of lua engines
<Yorlik> The pool works without a single lock
diehlpk has quit [Ping timeout: 260 seconds]
<Yorlik> Problem is sending messages and the mailboxes
Abhishek09 has quit [Ping timeout: 240 seconds]
Abhishek09 has joined #ste||ar
<hkaiser> Yorlik: ok, that's your problem, then
<Yorlik> I need the locks for message push_back and for mailbox take ownership
<hkaiser> nod, spinlock is the way to go
akheir has quit [Read error: Connection reset by peer]
<Yorlik> Thats one of the few points where I cannot avoid data sharing
<Yorlik> OK - thanks!
<hkaiser> if there is no contention it will be a single atomic op, otherwise it will spin for a while before yielding
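(A sketch of the mailbox pattern with a spinlock - message is a hypothetical payload type, the header path follows the 1.4-era module layout:)

    #include <hpx/synchronization/spinlock.hpp>
    #include <mutex>
    #include <utility>
    #include <vector>

    struct message { /* hypothetical payload */ };

    class mailbox
    {
        hpx::lcos::local::spinlock mtx_;
        std::vector<message> queue_;

    public:
        void push_back(message m)
        {
            // uncontended: one atomic op; contended: spins, then yields
            std::lock_guard<hpx::lcos::local::spinlock> l(mtx_);
            queue_.push_back(std::move(m));
        }

        std::vector<message> take_ownership()
        {
            std::vector<message> out;
            std::lock_guard<hpx::lcos::local::spinlock> l(mtx_);
            std::swap(out, queue_);
            return out;
        }
    };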
<Abhishek09> zao: i didn't remember the version, I simply ran `docker run -it fedora`
akheir has joined #ste||ar
parsa[m] has quit [Ping timeout: 240 seconds]
freifrau_von_ble has quit [Ping timeout: 240 seconds]
gdaiss[m] has quit [Ping timeout: 256 seconds]
simbergm has quit [Ping timeout: 256 seconds]
heller1 has quit [Ping timeout: 256 seconds]
<Abhishek09> zao: is there any dnf package for blaze_tensor-devel?
gdaiss[m] has joined #ste||ar
<zao> Abhishek09: As I said, not all dependencies have distro packages.
akheir1 has joined #ste||ar
<zao> blaze_tensor doesn't even have _releases_.
heller1 has joined #ste||ar
<zao> In this phase, you're going to be able to leverage distro packages for some of the bigger base dependencies, but definitely not all.
akheir has quit [Ping timeout: 256 seconds]
<Abhishek09> zao that means I have to build blaze_tensor with cmake. Now I will try `dnf pybind11-devel`
simbergm has joined #ste||ar
weilewei has joined #ste||ar
parsa[m] has joined #ste||ar
parsa[m] is now known as Guest295
<weilewei> a naive question about building and installing software on Linux: if I don't issue `make`, and just directly issue `make install`, will this instruction build the software for me?
<nikunj97> yes
<nikunj97> make install is usually dependent on make
rtohid has quit [Remote host closed the connection]
Abhishek09 has quit [Ping timeout: 240 seconds]
<weilewei> ok, so it doesn't matter if I skip `make` or not if my intention is to install software
<weilewei> nikunj97 thanks!
<nikunj97> that's why when you run make install in hpx, it still builds it for you. If you have already done make, it still checks
freifrau_von_ble has joined #ste||ar
<nikunj97> yup
<nikunj97> make install will work
<weilewei> nice! nikunj97
<Yorlik> hkaiser: Is max_busy_loop_count concerning the spinlocks? After replacing all mutexes I had a ton of lua engines created, I think it's SendMessage waiting for access to a mailbox and backing off, bringing in the next possibly locked SendMessage. I need to slow down the backoff from SendMessage specifically - I'd not like to do this globally - However - Ideas?
<hkaiser> no, that's unrelated
<hkaiser> spinlock will backoff very quickly, don't think that's your issue
<Yorlik> I guess with my excessive message test I am creating a gazillion tasks in flight
<Yorlik> But I might have a logical error elsewhere ofc.
Abhishek09 has joined #ste||ar
<Abhishek09> nikunj97: Where do the pybind11 cmake files reside when installed by dnf?
<nikunj97> never installed pybind with dnf
<nikunj97> should be somewhere in /usr/lib/cmake or /usr/share/cmake
nan2 has quit [Ping timeout: 240 seconds]
nan1 has quit [Ping timeout: 240 seconds]
Abhishek09 has quit [Remote host closed the connection]
gonidelis has joined #ste||ar
<gonidelis> Seeking advice: As I am interested both in your 'Implement missing Parallel Algorithms' and 'Range based Parallel Algorithms' projects, do you believe there is a way to combine them somehow, or should I just form two separate proposals?
<hkaiser> gonidelis: not sure
<hkaiser> gonidelis: both have sufficient work for a year ;-)
<hkaiser> I'd write a solid but more conservative proposal
<hkaiser> if you're done early you can always do more afterwards
<zao> hkaiser: That Sarthakag lad that came around a few hours ago, they were interested in one of your GSoC projects btw. I guess they'll either be mailing you or poke you on IRC some day.
<gonidelis> Great! Thank you sir
<hkaiser> zao: thanks
<hkaiser> gonidelis: in general, I think it's more important to have a proposal that shows you have thought it through than to have one that lists a lot of work that is not realistic to finish
<gonidelis> hkaiser Sure, I totally agree. HPX is all I have done the past three weeks so I believe I will be able to provide you with a solid solution.
<hkaiser> good, looking forward to reading your proposal, feel free to share it before submission
<gonidelis> Yeah, your review will be vital actually
<nikunj97> heller1, what's your github handle?
<diehlpk_work> hkaiser, SC workshop was rejected
<diehlpk_work> As usual for HPX
<hkaiser> darn
<hkaiser> don't worry about it, SC will not happen anyways this year
<diehlpk_work> Yeah, but we have inclusion again ;)
<hkaiser> nikunj97: it's sithhell
<nikunj97> heller1, I completed porting your 2d stencil into my version. It's on node as of now, but I will have a distributed version by this weekend. Initial results look promising as well. SIMDized kernel version works much faster this time.
<nikunj97> hkaiser, thanks!
<hkaiser> diehlpk_work: what 'inclusion'?
<diehlpk_work> Diversity and inclusion are really important to the SC community. Diversity is not just gender and organizational diversity, but should also include a plan to recruit ethnic minorities for attendees and participants/organizers in the workshop.
<nikunj97> diehlpk_work, what do you mean rejected coz of HPX?
<hkaiser> what a BS
<diehlpk_work> nikunj97, All things containing HPX were rejected from SC workshops, BOF, and panels
<nikunj97> ehh, why would they do this?
Abhishek09 has joined #ste||ar
<hkaiser> nikunj97: just because
<hkaiser> we're perceived as a threat nowadays
<nikunj97> that has a nice ring to it ;)
<hkaiser> heh
<hkaiser> it's just annoying
<Yorlik> You're kicking everyones back with HPX - lol.
<Yorlik> There is nothing comparable on the market afaik.
<Abhishek09> nikunj97 Do we want to install hpx and the other deps of phylanx, or just use the important files needed to install phylanx?
<hkaiser> Yorlik: thanks for the encouragement
<gonidelis> Who are these guys that sabotage HPX? Why do they do that?
<nikunj97> hkaiser, well it feels nice to be a part of a group who're feared in the community is what I meant ;)
<nikunj97> not the rejection part though :/
<hkaiser> Abhishek09: in the end you want to have a functioning Phylanx, that's it
<hkaiser> nikunj97: sure
<zao> gonidelis: Academia is a brutal place :D
<gonidelis> But why....
<zao> Some people have strong investment or feelings about how parallel software and communication should be done.
<Abhishek09> hkaiser nikunj97 that means it doesn't matter whether the deps work or not. Am I right?
<hkaiser> gonidelis: the usual... money, reputation, perceived status, you name it
<hkaiser> Abhishek09: if you can make Phylanx work with making its dependencies work - sure
<hkaiser> I however doubt that this will be viable
<Yorlik> hkaiser: I've done a lot of reading and testing before I decided to use HPX. We considered writing our own task system. Fortunately it wasn't necessary - HPX gave us all we need. My only problem is learning it and C++. But I can't complain, seriously not.
<hkaiser> :D
<gonidelis> Do you have any antagonists?
<hkaiser> Abhishek09: I meant *without* making dependcies work
<hkaiser> gonidelis: the whole community ;-)
<Abhishek09> hkaiser Please elaborate!
<nikunj97> I feel the same. HPX was the first thing I learnt and now when I try writing code with other libraries, I only feel claustrophobic
<hkaiser> Abhishek09: I doubt you can make Phylanx work if its dependencies don't
nan22 has joined #ste||ar
<zao> There's people doing HPC traditionally, and there's this gang.
<nikunj97> HPX gives a lot of independence with what you can do
<zao> Futurization is the future :P
<hkaiser> there you go!
<Yorlik> MPI sucks HPXs hairy balls ....
<hkaiser> now now
<hkaiser> Yorlik: we have ids on this channel after all
<hkaiser> *kids*
<Yorlik> Woops. Couldn't help it. Sorry.
<nikunj97> :D
<zao> I'll have to revise my motto to "HPX - the best library that at least one person showed can be used".
<hkaiser> zao: it's John's motto, isn't it
<Yorlik> Hkaiser: You need to write the book. For real :)
<hkaiser> no way
<nikunj97> heller1, I've added you to my repository of simdized kernels so that I can show you code whenever I have performance related doubts. Please accept the request :)
<diehlpk_work> HPX books :)
<hkaiser> Yorlik: have you seen heller1's thesis?
<Abhishek09> hkaiser : i only carry those files of the dependencies which are necessary for the installation of phylanx, e.g. hpx-config.cmake
<hkaiser> it's as close to a HPX book as you can get
<Yorlik> Yes. Especially the benchmarks :D
<nikunj97> I would appreciate an HPX book
<hkaiser> nikunj97: read heller1's thesis
<nikunj97> hkaiser, is heller1's thesis public?
<nikunj97> I'll surely give it a read
<hkaiser> sure
<Abhishek09> hkaiser?
<hkaiser> nikunj97: can't find it however right now - ask him
<nikunj97> sure, will do
<hkaiser> we need to link it from our publications page
<Yorlik> +1^^
<nikunj97> weilewei, thanks!
<Abhishek09> nikunj97: i only carry those files of the dependencies (shared libs) which are necessary for the installation of phylanx, e.g. hpx-config.cmake. Am I right?
<Abhishek09> nikunj97^
<nikunj97> you don't need to copy hpx-config to make phylanx work
<nikunj97> simply linking phylanx with hpx should be fine
karame78 has joined #ste||ar
<Abhishek09> No, I'm saying there's no need to build the hpx binaries; the hpx build files alone will be fine, nikunj97
<Abhishek09> such as config.cmake files, then we link with shared libs with pip tools
<Abhishek09> setuptools
<nikunj97> config.cmake tells you where to find libraries
<nikunj97> they don't do linking etc.
<zao> Abhishek09: You've got two different sets of things you need. One wider set to build Phylanx, and a narrower set to use Phylanx.
Hashmi has joined #ste||ar
rtohid has joined #ste||ar
gonidelis has quit [Remote host closed the connection]
weilewei has quit [Remote host closed the connection]
gonidelis has joined #ste||ar
rtohid has quit [Remote host closed the connection]
Abhishek09 has quit [Remote host closed the connection]
<Yorlik> It's been on my harddrive for months already - and been read :)
<hkaiser> linking it here for posterity
<Yorlik> IC
<Yorlik> How can I solve these dreaded HPX/Boost issues with winsock.h?
<hkaiser> include hpx first
<Yorlik> Since I replaced mutex with spinlock it came up in one component
<Yorlik> If I include HPX first boost asio complains
<hkaiser> about what?
<Yorlik> Is asio incompatible?
<hkaiser> no
<Yorlik> C:\__A\Arc_Sb\_INSTALL\boost\boost-1.72.0\RelWithDebInfo\include\boost-1_72\boost\asio\detail\socket_types.hpp(24): fatal error C1189: #error: WinSock.h has already been included
<hkaiser> include hpx before windows.h
<Yorlik> hpx.hpp? or can it be smaller?
<hkaiser> hpx/config.hpp should do the trick (I think)
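(I.e. something like this at the top of the offending translation unit - a sketch of the suggested ordering:)

    // HPX (or at least its config header) has to come before anything
    // that may drag in WinSock.h / windows.h
    #include <hpx/config.hpp>
    #include <hpx/hpx.hpp>

    #include <boost/asio.hpp>   // now sees a consistent winsock setup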
rod has joined #ste||ar
<Yorlik> OK - that was an include mess from months ago I had to fix here. :)
<diehlpk_work> Anyone with good linux skills around?
<diehlpk_work> akheir1, yet?
<nikunj97> zao ^^
<nikunj97> diehlpk_work, anything I can help with?
<diehlpk_work> find -maxdepth 1 -name "lecture1.tex" -exec latexmk -pdflatex="lualatex --shell-escape %O %S" --jobname=$(basename "{}" .tex)-slides -pdf "{}" ";"
<diehlpk_work> The part with latexmk does work if I use the file name directly
<diehlpk_work> find -maxdepth 1 -name "lecture1.tex" -exec latexmk -pdflatex="lualatex --shell-escape %O %S" --jobname=lecture1-slides -pdf "{}" ";"
<diehlpk_work> However, if I use the basename and the result of find twice, it does not work
<nikunj97> diehlpk_work, no clue :/
nan22 has quit [Ping timeout: 240 seconds]
nan11 has joined #ste||ar
<nikunj97> Yorlik, you're making a game engine right?
<Yorlik> A Distributed Gameserver, yes
<nikunj97> nice! could you tell more about it?
<Yorlik> Sure. What do you want to know?
<Yorlik> We could do it in voice if you like - less typing.
<nikunj97> so you'll be writing a multiplayer game post this?
<Yorlik> That's the plan.
<nikunj97> woah! that's really cool
<Yorlik> Server authoritative with Lua Scripting.
<Yorlik> We use Sol for the Lua bindings.
<nikunj97> are you from a AAA production house?
<Yorlik> Concurrency management is done with the actor model.
<Yorlik> Nope - simple hobbyist.
<Yorlik> We were modding a game that went south and decided to leave it and start over with a full game.
<nikunj97> i see
<nikunj97> when can I expect the game? xD
<Yorlik> lol
<Yorlik> Yesterday?
<nikunj97> lol
<Yorlik> I'm close to reach internal milestone 1: A scriptable local simulation.
<Yorlik> Next steps are distributed, object migration and client connection.
<nikunj97> nice! how does a multiplayer game work essentially
<nikunj97> ?
<Yorlik> Uh ...
<Yorlik> People connect - the server handles it - they play?
<nikunj97> lol
<nikunj97> I meant what exactly happens at the server side
<Yorlik> After all a player is just an object controlled from the outside.
<Yorlik> When the connection drops we could just leave it in game
<Yorlik> Or log it out
<nikunj97> is the whole thing hosted on the server?
<Yorlik> The player is just a replacement for AI
<nikunj97> true
<Yorlik> The client is just like a dumb graphical terminal with a bit of intelligence to hide away latency.
<Yorlik> And maybe some physics for visuals, but we will have server side authoritative physics
<nikunj97> so the whole simulations occurs on a server?
<Yorlik> Starting with collision ofc.
<Yorlik> Yes
<Yorlik> The server is the world.
<nikunj97> I used to think that you simulate the world in the machine
<Yorlik> And inside this world are objects (HPC Components)
<nikunj97> and send over the daya
<nikunj97> *data
<Yorlik> Yes, that's the gist of it.
<nikunj97> nice
<nikunj97> I'd like to know someday how all of it is put to place
<nikunj97> this is pretty interesting stuff
<Yorlik> The world will be tiled and tiles will be able to migrate together with the objects and connections they contain.
<Yorlik> You start small.
<Yorlik> My first goal was to get a basic simulation going and I'm close to that
<nikunj97> nice!
<Yorlik> We started about Nov. 18
<Yorlik> 2018
<Yorlik> So its a good year now
<Yorlik> We did a LOT of research at start, because we understood bad decisions will pay back harshly.
<nikunj97> yes, it has been long
<Yorlik> We have been modding 4 years before that
<nikunj97> modding games?
<Yorlik> So - the entire project is like 5 years now, with a major setback a good year ago
<Yorlik> We modded a game that died
<Yorlik> The company screwed it up horribly.
<nikunj97> damn, that's sad :/
<Yorlik> Yes it is.
<nikunj97> which gamehouse do you work for?
<Yorlik> We will be able to reuse all the design ideas and a bunch of Lua scripts though.
<nikunj97> is there any game I can play?
<Yorlik> I'm a hobbyist as I said
<Yorlik> Nothing playable yet.
<nikunj97> so there was a silver lining in all of this
<Yorlik> In the moment it's all about getting a usable simulation and scripting environment
<Yorlik> The silver lining is in my beard ;)
<nikunj97> :P
<Yorlik> It's fun. Can't wait to have this first milestone done.
<Yorlik> I want to make a simple grass-rabbit-fox population dynamics sim as a test game.
<nikunj97> does sound like fun
<Yorlik> The learning curve was and still is painful. But it's rewarding too.
<Yorlik> Once you have a graphical client it's a blast, since you have this instant feedback visually.
<nikunj97> true
<Yorlik> In the moment it's very abstract and not very tangible.
<Yorlik> Still a lot of fun.
<hkaiser> Yorlik: you'll have to make your game moddable, like Skyrim etc. ;-)
<nikunj97> hkaiser, you've played Skyrim?
<Yorlik> That is a major problem actually.
<hkaiser> success guaranteed!
<hkaiser> nikunj97: who has not?
<Yorlik> Skyrim is single player.
<Yorlik> If we want true moddability we would have to hand out the server.
<hkaiser> yah, distributed modding is a problem, I understand
<nikunj97> hkaiser, woah didn't know you played games
<Yorlik> I'd like to do at least things like UI modding and such
<hkaiser> nikunj97: I do nothing else ;-)
<nikunj97> I've only recently started playing Skyrim
<Yorlik> Programming = A form of gaming.
<zao> Progaming? :P
<Yorlik> Procrastigaming.
<Yorlik> Just call it Progging :D
<Yorlik> You can read that any way.
nikunj97 has quit [Quit: Leaving]
nikunj97 has joined #ste||ar
<Yorlik> hkaiser: Is there a way to limit how many tasks are allowed to be in flight at once?
<Yorlik> I have the impression I am coming into a spiral of death, where more and more tasks get created, which need a lua state, but because of the creation of states they get swapped out and are not being worked on, and instead more states for new tasks are created.
<Yorlik> Sometimes it is in an equilibrium and then, suddenly it starts creating Lua States like crazy.
<Yorlik> I'm pretty sure it is some sort of spiral of death, I just don't understand yet exactly how.
<Yorlik> I want to try pre-creating more states, so I can hand them out more quickly - still, I should never enter this spiral.
<jbjnr> Yorlik: limiting_executor might help you, but it needs some improvments. not got around to doing them
<Yorlik> jbjnr: What does a limiting executor exactly do?
rod has quit [Remote host closed the connection]
<Yorlik> Hmm ... #3734 - IC
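(Until the limiting_executor improvements land, one pattern for capping tasks in flight is HPX's sliding semaphore - a sketch; the header path and the workload are assumptions:)

    #include <hpx/include/async.hpp>
    #include <hpx/synchronization/sliding_semaphore.hpp>
    #include <cstdint>

    void spawn_limited(std::int64_t num_tasks)
    {
        auto work = [] { /* one task's worth of work */ };

        std::int64_t const window = 512;    // max tasks in flight
        hpx::lcos::local::sliding_semaphore sem(window);

        for (std::int64_t t = 0; t != num_tasks; ++t)
        {
            hpx::async(work).then(
                [&sem, t](hpx::future<void>&&) { sem.signal(t); });
            sem.wait(t);    // blocks once 'window' tasks are outstanding
        }
        // real code must also wait for the tail of tasks still running
        // before sem goes out of scope
    }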
<Yorlik> hkaiser - I'm currently looking at https://isocpp.org/files/papers/p0443r1.html - Is this the current version of the proposal?
<Yorlik> It's the executor proposal
nan2 has joined #ste||ar
maxwellr96 has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has joined #ste||ar
<heller1> The latest version is 11
<hkaiser> Yorlik: use wg21.link/p0443
<Yorlik> AH OK - Just r11
<hkaiser> that gives you the latest version
<Yorlik> I was thinking a lot about this idea of an executor as far as I understand it.
<Yorlik> (Very limited)
<heller1> 13 even...
<Yorlik> I am missing one thing, though it seems to be implicitely contained somehow.
<heller1> What you'll read in the latest version of the proposal isn't implemented in hpx yet
<Yorlik> And that is a means to describe the workload.
<heller1> What's missing?
<heller1> As in?
<Yorlik> The executor, if I understand correctly, somehow describes the ways of execution and manages the execution resources.
<Yorlik> But workloads have different properties.
<Yorlik> Some of them are reflected in the executors I think.
<Yorlik> Like the trivial: Parallelizable (somehow)
<Yorlik> Or Must execute sequentially
<Yorlik> But there are gazillions of properties and constructs you could choose to describe abstract properties of the problem.
<heller1> Like memory access, need for compute resources and the like?
<Yorlik> E.g. in my case it is a frame based execution in timesteps.
<heller1> Ahhh
<Yorlik> I think there's a ton of stuff worth describing
<Yorlik> If the executor is the worker, the "workload" would be the work.
<Yorlik> Understanding the work is one of the most important tasks of a programmer as a problem solver.
<heller1> Yeah
<Yorlik> So - I ask myself - Do we have sufficiently good structures to describe a problem/workload on an abstract level?
<heller1> But those are different properties
<Yorlik> Do we have appropriate language?
<heller1> Your real time requirements need to be honored by the scheduler
<heller1> No, we don't have that yet
<Yorlik> That's basically my thought - a counterpart to the executor concept.
<Yorlik> The "worlkload" concept
<heller1> We're not even close to having consensus on the basic vocabulary
<Yorlik> And a workload must be met by an appropriate executor to be worked on in a satisfying way.
<Yorlik> And maybe it's not even possible, but I think it's worth trying.
<hkaiser> Yorlik: the problem with standards papers is that the later the revision you read, the more standardese they get and the fewer concepts are described
<Yorlik> E.g. when I was designing my datastructures with these composable entities - I was stumbling over a ton of abstract properties which I could use to describe my specific workload.
<heller1> There are different dimensions to that problem
<hkaiser> it's a good idea always to read all revisions as that gives you the whole background
<Yorlik> I'll not go too deep, since it's a rabbithole for sure.
<Yorlik> But the problem in itself is intriguing: How can we speak about and then program problems?
<Yorlik> resp. solutions to problems.
<Yorlik> However - I'm procrastinating. My LuaState explosion awaits me and wants a solution - lol.
Hashmi has quit [Quit: Connection closed for inactivity]
nan2 has quit [Remote host closed the connection]
<heller1> If anyone is interested in how hpx is used elsewhere
<heller1> At least under the covers
<zao> Heh, Erwin.
<heller1> ;)
<heller1> Isn't he your boss?
<zao> Different site.
<hkaiser> heller1: can you add that to the publication page?
<heller1> Ah, thought he was some kind of director of your collaboration
<heller1> Yes, please remind me again tomorrow
<hkaiser> I just did
<hkaiser> it's tomorrow for you already ;-)
<heller1> I guess we should add all allscale papers
<heller1> Nope
<hkaiser> almost
<heller1> Still two hours to go
<zao> There's six sites under SNIC, he's head honcho at one of them, PDC (KTH)
<heller1> An interesting person, learned quite a bit from him...
<hkaiser> heller1: if you send Katie an email she will be able to do that as well
<heller1> Ok
nan11 has quit [Remote host closed the connection]
<nikunj97> It feels so nice when you figure out the right template specializations \o/
nan22 has joined #ste||ar
diehlpk has quit [Ping timeout: 240 seconds]
shahrzad has joined #ste||ar
shahrzad has quit [Ping timeout: 256 seconds]
<nikunj97> heller1, yt?
<nikunj97> my code is running faster than yours (in terms of time), but the reported MLUPS is less
<nikunj97> for the same Nx, Ny and steps
<nikunj97> could you check if my formula is correct?
<nikunj97> my bad, I forgot to multiply the damn thing by number of steps
<nikunj97> :P
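(The figure of merit being computed - million lattice-site updates per second over the whole run:)

    \[ \text{MLUPS} = \frac{N_x \cdot N_y \cdot \text{steps}}{t_{\text{elapsed}} \cdot 10^{6}} \]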
<nikunj97> hkaiser, yt?
nan22 has quit [Ping timeout: 240 seconds]
<hkaiser> nikunj97: here
<nikunj97> multiple rows per task has improved heller1's code by quite a bit ;)
<nikunj97> I'm getting about 1.5x speed up
<hkaiser> nice
<hkaiser> with or without vectorization?
<nikunj97> with simd employed at about 4x speed up
<hkaiser> 8 doubles wide?
<nikunj97> well it's 3.7x really
<nikunj97> no 4 double wide
<nikunj97> wait no
<nikunj97> 8 double wide
<nikunj97> I'm using floats
<hkaiser> ok, 4 doubles wide simd gives you a speedup of ~2.5, not bad
<nikunj97> no, 4 double wide by 8 floats. my bad
<nikunj97> using AVX2
<nikunj97> I haven't tried -ffast-math that heller1 asked
<hkaiser> nice work
<nikunj97> I will ask Rohit to benchmark the application for me and then I'll report you the numbers
<nikunj97> I feel that I can try to get a paper out of this, if the numbers compare to the optimized openmp codes
<hkaiser> absolutely
<nikunj97> or if it's somewhere near the expected peak
<hkaiser> nikunj97: try to get a dependency of speedup over grainsize
<nikunj97> *theoretical
<nikunj97> yes, that's the first aim. Find the right grain size for the fixed matrix size of 10,000 x 100,000 (the same grid size taken by heller1 )
<hkaiser> more of finding the right grainsize for parallelization, then the matrix size should not matter too much (within reason)
<nikunj97> ohh yea, right
<hkaiser> akheir1: tell shahrzad that nikunj97 is doing benchmarking for 2d stencils, that might interest her
<nikunj97> hkaiser, you want to look at the code?
<nikunj97> it's still on node, but I will be able to write the distributed one in a couple of days
<hkaiser> yes, I would like to, just have no time, currently - sorry
<nikunj97> that's fine.
<nikunj97> hkaiser, btw did you get time to go through the doc I sent in that email?
<hkaiser> uhh, what email?
<nikunj97> I sent you one a couple of days back
<nikunj97> asking your help in a guide to hpc that I'm preparing
<nikunj97> for my univ
<hkaiser> ahh
<hkaiser> the google docs
<nikunj97> yes
<hkaiser> not yet :/
<hkaiser> will do right away
bita has quit [Ping timeout: 260 seconds]
<nikunj97> thanks a lot!
<hkaiser> nikunj97: but I sent you that link to the hpc course, right?
<nikunj97> yes
<nikunj97> I've added that to the list as well
<nikunj97> I'll prepare a blog from the bullets I've collected. We'll publish it for the university students who are interested in the field but can't get into it.
<hkaiser> nod
<hkaiser> nikunj97: no additional comments at this point
<nikunj97> does it look complete to you?
<nikunj97> it's been compiled by 2 of us. My friend worked with chapel last year in gsoc. he's added some points from his experiences as well
<hkaiser> this kind of thing is never complete ;-)
<nikunj97> haha right, I meant am I missing something that may catch a beginner's eye?
<hkaiser> it's a nice starting point
diehlpk has joined #ste||ar
diehlpk has quit [Changing host]
diehlpk has joined #ste||ar
shahrzad has joined #ste||ar
diehlpk has quit [Ping timeout: 240 seconds]
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 256 seconds]
shahrzad has quit [Ping timeout: 240 seconds]