hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
<hkaiser> Yorlik: I'm using jemalloc, that's a big difference
<Yorlik> So - worth it?
<hkaiser> absolutely
<Yorlik> Lua does a lot of small allocations
<Yorlik> That might actually be a gamechanger then
<Yorlik> I'll give it a try. Is there anything special i should have in mind when integrating it?
<hkaiser> jemalloc is not replacing the system allocator, though, but must be used explicitly (that's what we do in hpx)
<hkaiser> at least not on windows
<Yorlik> So Lua won't use it automagically?
<Yorlik> We might patch Lua if it's really worth it.
<hkaiser> right, except if you can tell lua to use it
<Yorlik> does it replace malloc or is it a special function?
<hkaiser> we have a special c++ allocator we use everywhere
<hkaiser> on windows it does not replace malloc
<Yorlik> I'll read the jemalloc docs
<Yorlik> tcmalloc doesn't work on windows, does it?
<hkaiser> Yorlik: it does, but I have never used it
<Yorlik> OK
<Yorlik> Might try both
<hkaiser> hpx might even support it on windows, it used to a while back, not sure whether it still works now
<Yorlik> OK.
<Yorlik> Thanks for the info!
<Yorlik> So you're wrapping jemalloc into a C++ allocator?
<hkaiser> yes
<hkaiser> on windows
<hkaiser> on linux, both jemalloc and tcmalloc just replace the system allocator
<Yorlik> IC.
<hkaiser> Yorlik: there is also mimalloc which is supposed to be even faster, we support it but I have not actually tried it
<Yorlik> So three malloc replacements to try.
<hkaiser> I think on windows mimalloc replaces the system allocator, so it might be the easiest to use for you
<Yorlik> Makes sense, though in the long run we'll use Linux
<Yorlik> I just like working with visual studio
<hkaiser> sure, same here
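A minimal sketch of what wrapping jemalloc into a C++ allocator can look like; this is a hypothetical type for illustration, not HPX's actual allocator, and it assumes a jemalloc build that exports the je_-prefixed API:

    #include <cstddef>
    #include <new>
    #include <jemalloc/jemalloc.h>   // assumes jemalloc's installed header

    // Minimal std-compatible allocator that routes all requests to jemalloc.
    template <typename T>
    struct je_allocator
    {
        using value_type = T;

        je_allocator() = default;
        template <typename U>
        je_allocator(je_allocator<U> const&) noexcept {}

        T* allocate(std::size_t n)
        {
            if (void* p = je_malloc(n * sizeof(T)))
                return static_cast<T*>(p);
            throw std::bad_alloc();
        }

        void deallocate(T* p, std::size_t) noexcept
        {
            je_free(p);
        }
    };

    template <typename T, typename U>
    bool operator==(je_allocator<T> const&, je_allocator<U> const&) noexcept { return true; }
    template <typename T, typename U>
    bool operator!=(je_allocator<T> const&, je_allocator<U> const&) noexcept { return false; }

    // Usage: std::vector<int, je_allocator<int>> v;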
<Yorlik> Since HPX is a dll - will it use it if I link my main program against mimalloc?
<hkaiser> Yorlik: you might want to enable it on hpx, then it should be automatically used by your executable as well
<Yorlik> OK - I'll look up the switches.
<hkaiser> -DHPX_WITH_ALLOCATOR=mimalloc
<Yorlik> Thanks a ton !
<hkaiser> Yorlik: sorry, it's HPX_WITH_MALLOC=mimalloc
<Yorlik> OK
<hkaiser> pls let me know how it works, never used it
<Yorlik> How do I point to the library / header
<Yorlik> Or do I just place the dll in the path?
<hkaiser> Yorlik: I think they have cmake support
<Yorlik> Still working on it - I'll figure it out
<Yorlik> Something interfered.
<Yorlik> Need to continue a bit later
<hkaiser> Yorlik: we just do a find_package(mimalloc), so you can probably use the standard variables, like MIMALLOC_DIR to point to the cmake config files
<Yorlik> Yes.
<Yorlik> It's already built - but I can't go further right now.
wate123 has joined #ste||ar
wate123 is now known as wate123_Jun
<Yorlik> hkaiser: Finished the first compile with mimalloc, doing the others now
<Yorlik> I had to install it with cmake to get everything right
wate123_Jun has quit [Remote host closed the connection]
wate123_Jun has joined #ste||ar
wate123_Jun has quit [Remote host closed the connection]
wate123__ has joined #ste||ar
bita has quit [Quit: Leaving]
<zao> Yorlik: Symbols in executables or dynamic libraries only satisfy lookups explicitly made against those modules or indirectly by redirection to those modules.
<zao> There's none of the symbol soup that you get on Linux and other libdl-like systems where symbols can be overridden from heaven knows where.
<Yorlik> IC. So it'll hopefully work - recompiling the server - all three builds worked.
<zao> There might be hooks in the CRT or suchlike that can be leveraged to impact DLLs using the same CRT, but it's likely that you need to consider it on a per-module basis.
akheir1_ has quit [Read error: Connection reset by peer]
akheir1_ has joined #ste||ar
<hkaiser> zao: they runtime patch the standard allocator
<hkaiser> so having it linked to one module affects all
<hkaiser> kinda like weak symbols on linux
<zao> Allocator as in the C++ one from the CRT?
<hkaiser> yes
<zao> And not HeapAlloc and friends?
<hkaiser> and malloc/free
<Yorlik> The CMake test compile it always does breaks
<Yorlik> @SET "PATH=%PATH%;
<Yorlik> Woops
<Yorlik> 1> [CMake] LINK : error LNK2001: unresolved external symbol mi_version
<hkaiser> it misses the library then
<Yorlik> mimalloc gets found and everything and is in the path
<hkaiser> as I said, I never tried it...
<Yorlik> I'll figure it out
<hkaiser> is the library on the command line?
<hkaiser> Yorlik: I can try tomorrow, too tired now
<Yorlik> no
<Yorlik> NP
<Yorlik> I might have overlooked something. But the mimalloc_DIR is set and the path too.
<Yorlik> I think I need to add it as library to my app and forgot that
<Yorlik> It's only in HPX
<hkaiser> that's fine
<hkaiser> does hpx build?
<Yorlik> Yes it did
<Yorlik> I had to install mimalloc with cmake directly
<hkaiser> then your app does not need to do anything
<Yorlik> It didn't work from MSVC
<Yorlik> Somehow it wants mimalloc
<hkaiser> does it add mi_version to the linker explicitly?
<hkaiser> then you'll have to link with the library
<hkaiser> probably doesn't hurt to link it to the app as well
<Yorlik> It breaks when generating the cache
<hkaiser> hmm
<hkaiser> hpx re-exports mimalloc to its apps if you use HPX::hpx as a dependency
<Yorlik> I am using the latest mimalloc - maybe they changed the symbol
<hkaiser> nah
<hkaiser> mi_version is imported by hpx and it finds it
<Yorlik> Do I need to add mimalloc in the Component_dependencies?
<hkaiser> you may want to add it to your app as a dependency
<Yorlik> in the link libraries or in the hpx setup?
<hkaiser> your app
<Yorlik> You mean the header?
<hkaiser> no your cmake
<Yorlik> Or just link only?
<hkaiser> where you build your application add_executable or similar
shahrzad has joined #ste||ar
<Yorlik> I have it in my TARGET_LINK_LIBRARIES
<hkaiser> no idea
<hkaiser> anyways, I'm off - will try tomorrow
<Yorlik> OK
<hkaiser> find out what module fails linking and add it as a dependency to that
<Yorlik> G'Night!
<Yorlik> OK
<Yorlik> It's the CMake test compile - lol
<hkaiser> whatever
<hkaiser> if you use a symbol you need to have it as a target_link_library dependency
<Yorlik> It's not my program. It's the cmake test compile done every time that fails
<Yorlik> Tomorrow ...
<hkaiser> Yorlik: well cmake uses your setting to test compile things
<hkaiser> doesn't it?
<Yorlik> It should.
<hkaiser> ok
<Yorlik> Find package doesn't fail
<hkaiser> cheers for now
<Yorlik> Good Night !
hkaiser has quit [Quit: bye]
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
wate123__ has quit [Remote host closed the connection]
wate123_Jun has joined #ste||ar
wate123__ has joined #ste||ar
weilewei has quit [Remote host closed the connection]
wate123_Jun has quit [Ping timeout: 256 seconds]
wate123__ has quit [Ping timeout: 240 seconds]
akheir1_ has quit [Quit: Leaving]
wate123_Jun has joined #ste||ar
diehlpk_work has quit [Remote host closed the connection]
shahrzad has quit [Quit: Leaving]
wate123_Jun has quit [Ping timeout: 256 seconds]
wate123_Jun has joined #ste||ar
wate123_Jun has quit [Ping timeout: 240 seconds]
<simbergm> Yorlik: can you comment on 4462 again with the details of the tests that fail for you? I'll have a look at it later
<simbergm> all the error messages etc, does it fail to link, build, run, and so on
ibalampanis has joined #ste||ar
ibalampanis has quit [Remote host closed the connection]
<simbergm> Yorlik, actually, I think I know what's wrong
<simbergm> I meant to update one of the tests to test that, will check later if it's that
<simbergm> hpx_main.hpp is likely not installed anymore
wate123_Jun has joined #ste||ar
wate123_Jun has quit [Ping timeout: 240 seconds]
<simbergm> Yorlik: should be fixed now
vip3r has joined #ste||ar
vip3r is now known as kale
kale has quit [Client Quit]
nikunj97 has joined #ste||ar
wate123_Jun has joined #ste||ar
wate123_Jun has quit [Ping timeout: 240 seconds]
wate123_Jun has joined #ste||ar
wate123_Jun has quit [Ping timeout: 240 seconds]
ibalampanis has joined #ste||ar
<ibalampanis> Have a good day, everyone!
<rori> ibalampanis: thanks, you too
<rori> zao: your pull request fetching command is amazing :D
<rori> Yorlik: I added a cmake script for mimalloc support:
<rori> let me know if it is working for you
wate123_Jun has joined #ste||ar
wate123_Jun has quit [Ping timeout: 240 seconds]
ibalampanis has quit [Remote host closed the connection]
wate123_Jun has joined #ste||ar
wate123_Jun has quit [Ping timeout: 240 seconds]
ibalampanis has joined #ste||ar
nikunj97 has quit [Read error: Connection reset by peer]
wate123_Jun has joined #ste||ar
wate123_Jun has quit [Ping timeout: 240 seconds]
<Yorlik> rori: Very cool! I'll check it out, though I'll probably use jemalloc instead, since we're going for Linux anyway in production.
<Yorlik> Yesterday I found out when creating a Lua State you can give Lua a custom alloc function, which basically is a thin wrapper around realloc.
<Yorlik> I might just be able to give Lua jemalloc without modifying it. :)
hkaiser has joined #ste||ar
<Yorlik> Heyo jkaiser!
<Yorlik> Woops hkaiser ^^
<hkaiser> hey Yorlik
<hkaiser> g'morning
<Yorlik> So yesterday:
<Yorlik> Yeah - morning ! lol :)
<Yorlik> I compiled jemalloc and used it - it worked.
<Yorlik> And:
<Yorlik> I found out I can use Lua with a customizable alloc function by nature
<hkaiser> thought so
<Yorlik> The function Lua expects is just a thin wrapper around realloc
<Yorlik> So I can basically create my Lua States on a per state basis with whatever allocator I want.
<hkaiser> perfect
<hkaiser> jemalloc it is, then
<Yorlik> jemalloc worked like a charm, but I've no results yet - need to fix and clean up some other things.
<Yorlik> But that's my task for today: Custom allocation.
<hkaiser> shouldn't be too hard, should it?
<Yorlik> So many reasons to love Lua - now there's just another one :)
<Yorlik> It'll work, I'm pretty sure
<Yorlik> hkaiser: this is the function one wants to mimic: https://github.com/lua/lua/blob/master/lauxlib.c#L986
<Yorlik> It's probably as trivial as it looks.
<hkaiser> sure, jemalloc has both, je_free and je_realloc
<Yorlik> Yup.
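A minimal sketch of that idea, assuming the je_-prefixed jemalloc API just mentioned and Lua's standard lua_Alloc hook; the function mirrors l_alloc from lauxlib.c:

    #include <cstddef>
    #include <jemalloc/jemalloc.h>

    extern "C" {
    #include <lua.h>
    }

    // Same contract as Lua's default allocator: free when nsize == 0, realloc otherwise.
    static void* lua_jemalloc(void* /*ud*/, void* ptr, size_t /*osize*/, size_t nsize)
    {
        if (nsize == 0)
        {
            je_free(ptr);
            return nullptr;
        }
        return je_realloc(ptr, nsize);
    }

    // Create each Lua state with the custom allocator instead of luaL_newstate().
    lua_State* make_lua_state()
    {
        return lua_newstate(lua_jemalloc, nullptr);
    }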
Abhishek09 has joined #ste||ar
<K-ballo> getting build failures due to missing hwloc includes again
<Yorlik> K-ballo: Do you know of that fix yesterday?
<K-ballo> Yorlik: which yesterday fix?
<Yorlik> The hwloc issue
<Yorlik> It's a missing line in the CMakeLists for hpx_init
<Yorlik> Lemme dig it up
<Yorlik> It's in 4462, but you need only 1 line
<ibalampanis> hkaiser I have submitted my proposal as a draft on the official GSoC website
<hkaiser> ok, cool
<ibalampanis> Also, I have informed Bita
<Yorlik> K-ballo: Try adding a `target_link_libraries(hpx_init PUBLIC hpx)` around here? https://github.com/STEllAR-GROUP/hpx/blob/master/src/CMakeLists.txt#L386
<hkaiser> just use #4462
<hkaiser> I'll merge it later today anyways
<Yorlik> K-ballo ^^
<K-ballo> i'll wait
<Abhishek09> hkaiser: How many slots generally ste||ar allocated for GSoC?
<ibalampanis> hkaiser by the end of the weekend I will update the proposal to include the link to the GitHub repo of the MM with HPX
<hkaiser> 5-7
<ibalampanis> Now I'm working on this, hkaiser
<hkaiser> sure, good luck!
<ibalampanis> Thank you for all your support!
<Abhishek09> hkaiser: Does selection depend only on the quality of the proposal in Ste||ar?
<hkaiser> yes
<ibalampanis> hkaiser Could I make a question to you?
<hkaiser> quality of proposal, general activity of the student, what is the quality of the example code we ask to write, etc.
<hkaiser> ibalampanis: you already did ;-)
<hkaiser> sure go ahead
<ibalampanis> Why don't you use Slack for chat? Many organizations use it as their medium.
<Abhishek09> example code means? hkaiser
<hkaiser> ibalampanis: historic reasons - is there a big difference?
<ibalampanis> No, no, just a question!
<ibalampanis> Thanks!
<hkaiser> Abhishek09: we ask our students to implement a small example matrix multiplication using HPX to get an idea of what you know
<hkaiser> ibalampanis: ;-)
<Abhishek09> hkaiser: Why don't you use Gitter rather than IRC? Many orgs have started using it
<hkaiser> ibalampanis: one of the main reasons is that slack does not provide us with a full history of the conversations
<Yorlik> hkaiser: Can HPX executors be combined, like proposed in the isocpp proposal? I'd like to try out jbjnr's limiting_executor. Alternatively I'd add its functionality to your hooking thing.
<Abhishek09> gitter is far better than slack
<hkaiser> Abhishek09: there are many options, we have not been able to agree on something else than irc so far - this channel here exists for more than 10 years, people got used to it
<heller1> Also, slack is acting under us regulations and bans certain foreign nationals
<hkaiser> Yorlik: our executors are not conforming to p0443 at this point, they represent an older version of it from 3-4 years ago
<heller1> Simply put: it's a for profit organization where you are the product, not the customer
<Yorlik> We just need to rewrite Discord and use HPX under the hood ;)
<hkaiser> Yorlik: however, I showed you in the example how you can create wrappers that can rely on other executors and add/remove things
wate123_Jun has joined #ste||ar
<Yorlik> Yes - I'll look into it.
<Yorlik> Gotta study the code better now.
<ibalampanis> Yorlik Are you interested in gsoc?
<Yorlik> I'm not a student
<ibalampanis> Ok! :D
<ibalampanis> :) *
<Yorlik> We're a group of hobbyists writing a gameserver with HPX.
<ibalampanis> Wow! After gsoc I would like to contribute on it
<Yorlik> Sure - contact me. But we're not open source.
<Abhishek09> zao nikunj Hi
<Yorlik> But we use Lua !!
<ibalampanis> @y
<ibalampanis> Yorlik How come you are not?
<hkaiser> Yorlik: I still hope to get a free license for your game, though ;-)
<Yorlik> There are several reasons. It's a long winded discussion I'd avoid for today.
<ibalampanis> Ok! Understood!
wate123_Jun has quit [Ping timeout: 256 seconds]
<Yorlik> We do not really have commercial ambitions, we might monetize just to cover the costs. E.g. an option could be to make a dual license later, but giving out server code gives a great advantage to hackers, so we'd do that rather late in the process if ever.
<Yorlik> At the moment it's a small puny exercise made mostly by me anyways.
<Yorlik> With some grains of awesomeness ;)
<ibalampanis> Sure, I hadn't thought about the hacking aspect
<ibalampanis> Hah, you're great!
<Yorlik> MMO are always under massive attack from hackers.
<Yorlik> LOL - No - just a little crazy :)
<ibalampanis> Crazy is more suitable word than great. I admit it
<ibalampanis> ;)
<Yorlik> You need to be a little crazy to do what we do.
<Yorlik> When a hobbyist starts saying "We want to write an MMO", the reaction is almost always negative.
<ibalampanis> Could you give me your email or a social media account in order not to spam here?
<Yorlik> For reasons: It's quite a task. But with libraries like HPX and Lua it's much more feasible than it used to be.
<Yorlik> Sure: mckillroy lives at gmail with the dot of com.
<ibalampanis> hah thanks
<Yorlik> :)
<ibalampanis> I just sent you an ack message Yorlik
<Yorlik> It arrived
<ibalampanis> It's ok!
Hashmi has joined #ste||ar
<ibalampanis> hkaiser Are you in the USA? I'm asking to keep the local time difference in mind
<hkaiser> yes, I'm in the US central time zone
<ibalampanis> What about Bita? If you know.
<hkaiser> same
<ibalampanis> So, your local time is 9:04 (24hr)
<Yorlik> hkaiser: Is the default executor essentially the interface description when writing an executor? Or is there a more overview writeup on the concept somewhere?
<Yorlik> I see there is a bit in examples though.
wate123_Jun has joined #ste||ar
<ibalampanis> hkaiser Is it a problem if my repo is a JetBrains CLion project? This IDE has CMake as an integrated tool and the code can be executed via a button.
nikunj97 has joined #ste||ar
<Yorlik> ibalampanis: Last time I looked at CLion, their CMake only supported make generation and not ninja or MSBuild. Not sure if it's still like that.
<ibalampanis> I don't know what you say. What do you mean generation?
<Yorlik> CMake is not a build system, but a generator for various build systems, like make, MSBuild or Ninja.
<Yorlik> IIRC CLion does not support all of them, just makefile generation.
<Yorlik> That might be outdated info. But if it still is true and you want something else from CMake than makefiles, you might run into issues with CLion.
<ibalampanis> Yeah, I understand now. I don't know if now supports something else than make
<Yorlik> I just thought I'd tell you for consideration.
Pranavug has joined #ste||ar
<ibalampanis> Do you believe that for this project (https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020#test-framework-for-phylanx-algorithms) I will face issues on CLion?
<Yorlik> I'm not qualified to answer this.
<ibalampanis> Hah, it's ok
* Yorlik is just a hobbyist with some half-baked semi-knowledge which ~just works enough :)
Pranavug has quit [Client Quit]
<ibalampanis> Sorry if I asked a bad question
<Yorlik> I think it's a good question. You wanna use the tools that work for your project.
<ibalampanis> It's true
<ibalampanis> Yorlik: Which tools/IDEs do you suggest?
<Yorlik> I can only say what I use - but what's best for you might be totally different. I work mostly on Windows and am using MSVC Community edition. On Linux I use VSCode
<ibalampanis> Ok, VSCode is a safe pick. Works with everything!
<Yorlik> It's a good editor. And it works nicely with CMake and testing frameworks.
<Yorlik> You can also use intellisense with it
<ibalampanis> Yeah, have it in my mind
<ibalampanis> Thanks!
<Yorlik> VSCode also allows you to work remotely via SSH
<ibalampanis> Yeah I knew it! Thanks for your time! Time to go out for duties! Cheers!
<Yorlik> It has a remote server - so that's a pretty slick solution, though I heard you shouldn't use it over the internet because of security issues. But locally it's a nice thing.
ibalampanis has quit [Remote host closed the connection]
shahrzad has joined #ste||ar
<zao> Yorlik: vscode’s ssh remote now only listens on localhost on the remote, so it’s safe on singleuser machines or where you trust the other users
<Yorlik> Ah OK - so they fixed it. Thanks for the heads-up.
<zao> I have yet to get any response on whether it has any token auth or not, idiots seem to be reluctant to just say.
<Yorlik> lol
<zao> Btw, saw this talk on server instrumentation yesterday - https://youtu.be/r6Ex29gzqgc
<zao> Might be of interest to you
<Yorlik> Instrumentation will soon be a thing for us, when we have Milestone 1 and go into a polishing phase.
<Yorlik> jbjnr: I just stole three lines of code from your limiting executor and put them into the start hook of the executor I'm using - I already had task counting implemented. And it seems to work nicely.
<Yorlik> Essentially I needed just this:
<Yorlik> if ( ( ++task_counter<I> ) > upper_threshold_ ) {
<Yorlik> hpx::util::yield_while( [&]( ) { return ( task_counter<I> > lower_threshold_ ); } );
<Yorlik> }
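A self-contained sketch of the counting idea in those three lines; the hook names, counter, and thresholds are assumptions for illustration, and only hpx::util::yield_while is taken from HPX (reached here through the umbrella header):

    #include <atomic>
    #include <cstdint>
    #include <hpx/hpx.hpp>   // umbrella header, assumed to provide hpx::util::yield_while

    std::atomic<std::int64_t> task_counter{0};

    constexpr std::int64_t upper_threshold = 512;   // assumed tuning values
    constexpr std::int64_t lower_threshold = 256;

    // Start hook: if too many tasks are in flight, yield cooperatively until the
    // count has drained below the lower threshold.
    void on_task_start()
    {
        if (++task_counter > upper_threshold)
        {
            hpx::util::yield_while([] { return task_counter > lower_threshold; });
        }
    }

    // Stop hook: a task finished, release one slot.
    void on_task_stop()
    {
        --task_counter;
    }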
<nikunj97> hkaiser, the only way to calculate grain size that I know of is putting up a high resolution clock and measuring the task execution time. Is there any other way? I hear APEX is used for performance measurement in HPX. Would that help?
<hkaiser> nikunj97: yes, that's what the auto-chunker does
<Yorlik> hkaiser: I just put the above three-liner in the start() lambda of your hooked executor and it seems to work nicely
<hkaiser> right
<hkaiser> as expected
<Yorlik> Working on jemalloc now :) Can't wait to see the effects.
<hkaiser> shahrzad: you need to use Boost.Context on the Pi
<hkaiser> shahrzad: -DHPX_WITH_GENERIC_CONTEXT_COROUTINES=On
<hkaiser> Yorlik: you will see an effect for sure
<nikunj97> hkaiser, is there any documentation on auto-chunker?
<shahrzad> @hkaiser OK,thanks!
<hkaiser> nikunj97: what is 'documentation'? ;-)
<nikunj97> lol. We do really need a blog after all
<nikunj97> I'll have a look at it, thanks@
<hkaiser> nikunj97: I can give you access to the blogs
<nikunj97> which ones?
<nikunj97> ohh wait, we have blogs?
<nikunj97> do you mean the one hosted at cct?
<hkaiser> nikunj97: I did a whole blog post series back in 2015 I believe, highlighting all the features
<Yorlik> A quick and dirty excerpt from my use of the auto chunker
<hkaiser> nikunj97: there is also a new website (just created a while back): hpx.stellar-group.org
<hkaiser> we will use it for all things HPX in the future
<nikunj97> Yorlik, thanks!
<nikunj97> hkaiser, aah the one you were talking about the other day
<nikunj97> I'll go through the blogs. Thanks
<Yorlik> nikunj: Made a cleanup of the gist - it's shorter now and clearer.
<hkaiser> nikunj97: I'd be more than happy to give you access if you'd like to write blog posts
<nikunj97> hkaiser, I'm making myself familiar with the functionalities that I've rarely/never used
<nikunj97> I am sure to write one post on it. HPX needs more publicity and a bigger user base
<hkaiser> perfect opportunity to document things as you go
<hkaiser> ;-)
<nikunj97> hkaiser, yes, I'm already writing smallest code snippet for the things I'm learning
<hkaiser> blog posts can be short, no reason to write a novel
<nikunj97> I believe it's always better to start with the easiest code you can write. And then you can show its usage in a real application
<hkaiser> simple snippets are enough
<nikunj97> I'll be writing these blogs over the summer
<nikunj97> coz I don't think my internship is happening lol
<hkaiser> nikunj97: yah, there is that
wate123_Jun has quit [Remote host closed the connection]
wate123_Jun has joined #ste||ar
wate123_Jun has quit [Ping timeout: 256 seconds]
Abhishek09 has quit [Remote host closed the connection]
shahrzad has quit [Ping timeout: 240 seconds]
wate123_Jun has joined #ste||ar
<Yorlik> hkaiser: since HPX is using jemalloc - would I have to link my app again against jemalloc, or can I just use the symbols and only include the header where needed?
wate123_Jun has quit [Ping timeout: 240 seconds]
Hashmi has quit [Quit: Connection closed for inactivity]
nikunj97 has quit [Read error: Connection reset by peer]
nikunj97 has joined #ste||ar
wate123_Jun has joined #ste||ar
shahrzad has joined #ste||ar
ibalampanis has joined #ste||ar
<hkaiser> Yorlik: if you reference the symbols from your code you need to link against it, HPX might however re-export the library so that this might not require any actions on your side
<hkaiser> I think the allocator is target_link_library'd to hpx publicly
<hkaiser> so you should be fine
<Yorlik> It's just working - just started a test :)
<hkaiser> ok
<Yorlik> Because of the task limiter I now simply let the lua states explode as needed, but I delete above a threshold on return to the pool
<Yorlik> It looks like my test runs ~40-60% faster - I'll let it run a bit
<Yorlik> :D
<Yorlik> Big Win
<Yorlik> From ~40000 messages /sec to ~60000
<Yorlik> With 10000 calls into Lua
<Yorlik> 100 messages per call
<hkaiser> good
<Yorlik> That's much more than we ever had with the old crappy project.
<hkaiser> :D
<hkaiser> HPX for the win!
shahrzad has quit [Ping timeout: 240 seconds]
<Yorlik> I think I can now focus on a demo game and finalizing the default events and stuff.
<hkaiser> that's on a 4 core machine?
<Yorlik> Yes
<Yorlik> 4790k
<Yorlik> So - on a decent modern server this would surely be much better
<Yorlik> OFC, the Lua load will be more.
<Yorlik> Still I'm happy with this result for now.
<hkaiser> Yorlik: we should give you access to our test cluster, there you could do better benchmarks
<Yorlik> That would be cool
<Yorlik> But I need to go finish the milestone first
<Yorlik> I want a basic fox-rabbit-grass population dynamics testcase
<hkaiser> Yorlik: talk to akheir here (once he's back), he manages the cluster
<Yorlik> And we are not yet distributed
<Yorlik> I need to get the location and load balancing system done for that
<Yorlik> The spatial partitioning
<hkaiser> no idea what a fox-rabbit-grass population is ;-)
<zao> A single node of like 14-28 cores still tends to give some insights, particularly around NUMA junk.
<Yorlik> Simple
<Yorlik> Grass Grows
<Yorlik> Rabbits eat grass
<Yorlik> Foxes eat rabbits
<Yorlik> Chaotic numbers in the subpopulations
<hkaiser> ok
<Yorlik> It's an easy way to create many objects.
<Yorlik> I'll make a very simplistic AI for that
<ibalampanis> @hk
<ibalampanis> hkaiser: Do you know if Bita is taking the day off today?
<hkaiser> Yorlik: interesting
<hkaiser> ibalampanis: it's weekend
<ibalampanis> Such a good note! Thanks!
<ibalampanis> hkaiser: Why aren't you taking the day off, since it's the weekend?
<hkaiser> ibalampanis: I'm just lurking here ;-)
<ibalampanis> Hahah it's ok!
<ibalampanis> In order to double check, is your local time 12:35 ? (24hr)
<hkaiser> Yorlik: I merged #4462 just now
<hkaiser> ibalampanis: yes
<Yorlik> OK. stable it is then :)
<ibalampanis> Thanks!
<zao> ibalampanis: Local time in New Orleans should be accurate.
<Yorlik> I'll wait until the tag is there.
<hkaiser> Yorlik: you need to wait for the CI to cycle for stable to be updated to this
<Yorlik> Yup
<Yorlik> I have a working build after all.
<ibalampanis> zao: Thank you
<Yorlik> I think I'll use commit hashes in the future, so I have pinned stables
<hkaiser> Yorlik: I'd prefer you using the stable tag - nice way for us to discover problems ;-)
* Yorlik just joined the CI union
<nikunj97> zao, is a swap storage really important?
<zao> nikunj97: How elaborate of an explanation do you want? :)
<nikunj97> I've got 16gb ram
<nikunj97> and I'm always on 12-13gb
<nikunj97> it's when I'm not running any compilation/linking stuff
<nikunj97> it's with chrome, vs code, hexchat, spotify and a few other applications open
<nikunj97> like discord or slack
<zao> So your OS has a virtual memory system, yeah? Memory is split up into pages (4 KiB typically) and is backed either by physical RAM, physical storage, or just requested but not committed yet, having no backing storage.
<zao> Memory for things like loaded libraries or memory mapped files are backed by files on your disk, and can as long as it's not modified be dropped from memory and re-read from the files when needed.
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
<zao> Memory for data allocated from a process or modified pages cannot be spilled to disk if there's no backing storage, it's stuck in RAM.
<zao> That is, unless you have a swap file/partition. It serves as an off-load area for less used pages from RAM, which the OS attempts to have somewhat pre-populated in case there's a burst in memory usage and it needs to evict something.
<zao> It helps free up and compact the contents of physical RAM that may have been spuriously loaded or not used at all.
<nikunj97> I see
<nikunj97> I should add one. Should help relieve the stress on memory
shahrzad has joined #ste||ar
<hkaiser> Yorlik: I approve
<Yorlik> hkaiser: Erm .. what?
<hkaiser> Yorlik: you joining the CI union ;-)
<Yorlik> :) lol
<Yorlik> They told me to take a nap to keep my overwhelming beauty - which I will do now - BBL :)
<nikunj97> heller1, why are my stream results looking way too different than the other day executing the same binary?
<nikunj97> I'm getting about 70-80GB/s bandwidth on x86 node all of a sudden
<hkaiser> nikunj97: different phase of moon ;-)
Abhishek09 has joined #ste||ar
<nikunj97> lol, it can make such a difference?
<hkaiser> sure!
<hkaiser> didn't you know?
<nikunj97> it doubled!
<nikunj97> not like 10% or 20%
<hkaiser> so you changed something
<nikunj97> all my calculations are no good atm with these new results
<nikunj97> I didn't change anything. I executed an already compiled file that I used previously to record exactly the same thing
<hkaiser> yaya
<hkaiser> something else was going on on your machine when you tried before?
<nikunj97> it was executed on a node allocated by slurm
<hkaiser> rostam?
<nikunj97> no the cluster is at jsc
<hkaiser> exactly the same node?
<nikunj97> not sure about that
<hkaiser> see
<nikunj97> but they're all same x86
<hkaiser> you had that effect before when you were with us, remember?
<hkaiser> *sure*
wate123__ has joined #ste||ar
<nikunj97> yes, I remember the effect
<hkaiser> even on rostam equivalent the nodes are different
<nikunj97> so I choose a single node to record and benchmark everything
<nikunj97> btw which one will have higher memory bandwidth, float or double?
<nikunj97> peak memory bandwidth i.e.
wate123_Jun has quit [Ping timeout: 240 seconds]
<heller1> The data type is irrelevant
<nikunj97> heller1, 60GB/s with float and 80GB/s
<nikunj97> with double
<nikunj97> that's why I was wondering what gives
<nikunj97> on arm
<heller1> Well, you might need to increase the array size
<heller1> Float is half the size, might mess up the measurements
<nikunj97> aah that makes sense
<heller1> Also, x86 != x86
<nikunj97> I meant they're all Xeon E5 2660 v3
<nikunj97> same frequency as well
<nikunj97> and now arm is only giving 20GB/s for some reason
<nikunj97> it's exactly the same node but giving way less memory bandwidth. What am I doing wrong?
ibalampanis has quit [Remote host closed the connection]
<nikunj97> heller1, just read the array size rule on stream. It's definitely array size issue. Arm has 64MB cache while the array size is 10M
<nikunj97> L3 cache ^^
<heller1> See
<nikunj97> I should read things more carefully :/
<nikunj97> I changed the array size to 128M so I should see consistent rates both on hisilicon and e5
<nikunj97> heller1, yup, very consistent now. I don't see difference in float and double as well
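Rough arithmetic behind that fix, using the numbers from this exchange (a 64 MB L3, 10M vs. 128M elements) together with STREAM's usual guidance that each array be several times larger than the last-level cache; a standalone sketch, not part of the benchmark itself:

    #include <cstddef>
    #include <cstdio>

    int main()
    {
        constexpr double mib = 1024.0 * 1024.0;
        constexpr std::size_t llc_bytes = 64ull * 1024 * 1024;   // ~64 MB L3 on the ARM node
        constexpr std::size_t old_elems = 10000000;              // original STREAM array size
        constexpr std::size_t new_elems = 128000000;             // increased array size

        std::printf("LLC:        %6.1f MiB\n", llc_bytes / mib);
        std::printf("old arrays: %6.1f MiB per double array (floats fit mostly in cache)\n",
            old_elems * sizeof(double) / mib);
        std::printf("new arrays: %6.1f MiB per double array (well past 4x the LLC)\n",
            new_elems * sizeof(double) / mib);
    }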
nk__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 240 seconds]
weilewei has joined #ste||ar
shahrzad has quit [Ping timeout: 256 seconds]
<weilewei> May I ask, what is the status of concurrent data structure support in GSoC project? I am interested in participating as a mentor (and learning as well)
<weilewei> Will we aim at implementing concurrent_unordered_set?
wate123__ has quit [Remote host closed the connection]
wate123_Jun has joined #ste||ar
wate123_Jun has quit [Ping timeout: 256 seconds]
nk__ has quit [Read error: Connection reset by peer]
nikunj97 has joined #ste||ar
<nikunj97> Abhishek09, hey
<nikunj97> just saw your text from afternoon
<nikunj97> what is it that you want to talk about?
<Abhishek09> nikunj97: Does the installation of phylanx require the library files to work properly, or is it fine with just the devel and header files?
<zao> Abhishek09: Hi there, did you want anything particular this morning? I saw you highlighted me but didn't say what it was about.
<zao> (the above, I guess)
<Abhishek09> zao: i have lost that thing
<nikunj97> Abhishek09, I didn't get you
<zao> If you're talking about which one of `hpx` and `hpx-devel` you need from the OS, an answer is that `hpx-devel` depends on `hpx`.
<zao> So you're going to either have `hpx` or `hpx` + `hpx-devel` installed if you're working with OS packages.
<zao> This is customary.
<zao> A development package contains an additional set of files on top of the base set of files in the regular package.
<zao> Both packages are needed to form a development environment.
<zao> Is this what you wondered about? :D
<Abhishek09> zao: Yes, that means hpx and hpx-devel are both mandatory for phylanx
<Abhishek09> installation
<Abhishek09> ?
<Abhishek09> hpx devel depends on hpx
<zao> You could reason it out from what you know that the Phylanx build needs too.
<zao> It builds a Python package with a native extension.
<zao> The native extension when built needs the headers to compile and the library to link.
<Abhishek09> As I have built phylanx by installing hpx from source, not dnf
<zao> The only case you can get away with not having libraries installed is when you have a dependency that doesn't _have_ libraries, for example header-only dependencies like Eigen and Blaze
<zao> Great.
<Abhishek09> That means I have to ensure that all deps work fine (library + built files + headers) before building phylanx zao
<zao> I've found in the past that as long as the install step has worked, the dependencies are usable.
<nikunj97> hkaiser, so auto_chunk_size essentially makes sure that parallel_for_loop creates tasks such that their grain size is the time specified to auto_chunk_size?
<nikunj97> "This executor parameters type makes sure that as many loop iterations are combined as necessary to run for the amount of time specified." - Just making sure if I got this right
<Abhishek09> zao: Soon I will draft a proposal. I will build everything (lib + headers + built files) from source. Any tips you want to give me?
<Abhishek09> nikunj97 ^
<nikunj97> nope, looks good
<nikunj97> you'll have to build most of things from source
<zao> Abhishek09: I think I've said this in the past, but make a habit of installing things into a non-system directory, so that it's easy to remove and control.
<zao> Also take good notes on what commands you run so it's reproducible.
wate123_Jun has joined #ste||ar
<nikunj97> Yorlik, ^^
<Yorlik> Ya?
<nikunj97> about the auto_chunk_size thing
<Abhishek09> zao: Are you also participating in GSoC as a student this year?
<nikunj97> I said before, is it correct?
<zao> Abhishek09: Nope, I've never done GSoC. I'm a professional sysadmin and application expert at a HPC site.
<zao> My primary job is to build and install software for researchers :D
<Yorlik> nikunj: The auto chunker does measurements. I think you pay like 10% for these IIRC what hkaiser said. He knows better.
<nikunj97> aah so I invoke get_chunk_size to know what the grain size was?
<Yorlik> I never used that function, but probably yes
<nikunj97> wait if auto chunker does measurements, there should be a way to report it back right?
<nikunj97> I'm asking that
<nikunj97> so you use executor.with( auto_chunk_size( 2000us ) ), what does it actually do?
<Yorlik> The auto chunker attempts to size your chunks such that they run 2000us
<Yorlik> So if one iteration takes 1us it would do 2000 iterations per chunk
<nikunj97> but with 10% overheads
<Yorlik> IIRC yes. hkaiser should tell exactly.
<nikunj97> ok no, this isn't what I wanted. I want to time the grain size
<Yorlik> You can use an executor, which has a start and a stop function
<nikunj97> not make the runtime change iterations to make it that grain size
<Yorlik> So you can put your measurement hooks there
<nikunj97> you mean performance counter?
<Yorlik> Yes
<Yorlik> However you want to measure
<nikunj97> yea, that's what I think I'll do. Thanks!
<Yorlik> hkaiser recently made this example I posted. I'm actually using it
<nikunj97> aah! so you were asking about the other day?
<Yorlik> Yes
<Yorlik> I needed it for a different purpose
<Yorlik> Lemme make a snippet for you
<nikunj97> this doesn't look like performance counter to me
<Yorlik> It's the executor which has the start and stop hook
<nikunj97> yeah, basically add your stuff on start and stop
<Yorlik> This is a function from my codebase using it: https://gist.github.com/McKillroy/f81277abc7685832f785c73decfeeb20
<Yorlik> You can put whatever you want into the start and stop lambdas
<Yorlik> You put the executor in the parloop and give it the start and stop lambdas where you can place your measurement or perfcounters.
<Yorlik> I use it to limit task creation to a max
<Yorlik> It gives you control over a chunk.
<Yorlik> And in the loop you can use the auto chunker or a static chunker
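A minimal sketch of how the two chunkers plug into a parallel loop, assuming the HPX 1.4-era names hpx::parallel::execution::auto_chunk_size / static_chunk_size and the umbrella header; the loop body is only a placeholder:

    #include <chrono>
    #include <cstddef>
    #include <vector>
    #include <hpx/hpx.hpp>

    void run(std::vector<double>& data)
    {
        using namespace hpx::parallel::execution;

        // auto_chunk_size: measures a fraction of the iterations and combines as many
        // iterations per chunk as needed so each chunk runs for roughly 2000us.
        hpx::parallel::for_loop(
            par.with(auto_chunk_size(std::chrono::microseconds(2000))),
            std::size_t(0), data.size(),
            [&](std::size_t i) { data[i] *= 2.0; });

        // static_chunk_size: a fixed number of iterations per chunk, no measurement overhead.
        hpx::parallel::for_loop(
            par.with(static_chunk_size(1000)),
            std::size_t(0), data.size(),
            [&](std::size_t i) { data[i] *= 2.0; });
    }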
<nikunj97> got it
<nikunj97> it's again not what I was looking for btw. But it's a nice idea
<Yorlik> Maybe I misunderstood - what exactly do you need?
<nikunj97> so basically I want to measure the time of my execution of a task. I previously used to use high resolution timers to measure the grain size
<Yorlik> Do you want to measure your chunks or do you want to size them?
<nikunj97> like start the timer on invoking the function and calculate the time elapsed in the end
<nikunj97> I want a better way to get the time
<nikunj97> I neither want to measure them nor change the size
<Yorlik> Just measure a chunk and divide by the number?
<nikunj97> but that won't give me a right measure coz they have 10% overheads
<Yorlik> Not the static chunker
<Yorlik> Only the auto chunker
<Yorlik> The static chunker has zero overhead
<Yorlik> (Or almost zero)
<nikunj97> static chunker simply measures the grain size?
<Yorlik> No
<nikunj97> or does it try to do something similar to auto chunker?
<Yorlik> You set the size of a grain
<Yorlik> Its fixed
<nikunj97> aah got it, your snippet
<nikunj97> simply tell the number of iterations you want
<Yorlik> Do you want auto chunking or not?
<Yorlik> Yes
<nikunj97> nope, I do not want auto chunker
<nikunj97> static chunker looks better
<Yorlik> Then use the static one. It's cheap
<nikunj97> I do not want the runtime to optimize things, I'm optimizing according to an underlying hardware ;-)
<Yorlik> So you make measurements and then you know your grain size?
<Yorlik> But you do it only once?
<Yorlik> You could simply do a calibration phase then, before the real heavy duty job starts.
<Yorlik> As long as your item load is constant that should work nicely.
<nikunj97> my item load is constant
<nikunj97> I just want to keep the load at optimal
<Yorlik> So basically you just want to measure your hardware.
<nikunj97> that's why I wanted to measure the grain size
<Yorlik> And adjust to it
<nikunj97> Yorlik, precisely
<Yorlik> static chunker then
<nikunj97> got it!
<Yorlik> Or just run a shorter loop
<nikunj97> what do you mean?
<Yorlik> Like a short calibration loop and then adjust the chunks
<Yorlik> But maybe it's better to have the chunking in - then you have measured all overheads.
<nikunj97> yea
<Yorlik> But that's a detail question.
<Yorlik> Good luck!
<nikunj97> I think I got it, let me try
<nikunj97> thanks for the help!
<Yorlik> Cheers! :)
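One possible shape for the calibration phase suggested above, under the same assumptions as the previous sketch: run the loop once with a fixed chunk size, time it with std::chrono, and divide by the iteration count.

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <hpx/hpx.hpp>

    // Returns the measured time per iteration (seconds); the kernel is a placeholder.
    double calibrate(std::size_t n, std::size_t chunk)
    {
        using namespace hpx::parallel::execution;

        auto const t0 = std::chrono::steady_clock::now();
        hpx::parallel::for_loop(par.with(static_chunk_size(chunk)),
            std::size_t(0), n, [](std::size_t) { /* stencil kernel goes here */ });
        double const elapsed =
            std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();

        std::printf("time per iteration: %g s, per chunk of %zu: %g s\n",
            elapsed / n, chunk, elapsed / n * chunk);
        return elapsed / n;
    }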
<nikunj97> Yorlik, one more thing
<Yorlik> Ya?
<hkaiser> nikunj97: auto-chunker is your friend
<nikunj97> so if a parallel_for loop goes from: 0 100, and you set static_chunk_size=10, you'll have 10 tasks in total, right?
<Yorlik> hkaiser: how large is its overhead again?
<nikunj97> hkaiser, aren't the overheads high though?
<hkaiser> you can do both: measure the best chunksize and then set it (possibly using the static chunker)
<hkaiser> you can also create your own chunker that measures things once and uses the settings for all subsequent uses or somesuch
<hkaiser> Yorlik: overheads of what?
<Yorlik> hkaiser: What's the cost of the autochunker again?
<hkaiser> you tell it
<hkaiser> by default it runs 1% of the iterations to measure
<hkaiser> but you can change that
<Yorlik> Oh I C
<nikunj97> 1% ain't bad
<Yorlik> I wasn't clear about that, just told nikunj97 about it.
<nikunj97> I can use auto chunker then
<hkaiser> nikunj: could be too much
<Yorlik> 1% is 6 minutes in 10 hours :)
<hkaiser> Yorlik: you want to measure it once every now and then and just reuse afterwards
<hkaiser> writing a chunker is trivial, just look at the existing ones
<Yorlik> I will need constant monitoring, since the workloads always change
<Yorlik> I'm happy to use the autochunker
<hkaiser> right, that's what I said
<hkaiser> ok
<Yorlik> nikunj has a different situation. He just wants to measure his hardware to determine the grain size for a constant job
<Yorlik> Mine is horribly dynamic.
<nikunj97> Yorlik, setting auto chunker to find the best grain-size will also do the trick for me
<nikunj97> once I find the best, I'll get rid of the auto chunker to get another 1% speedup
<Yorlik> With 1% it's much smaller than I wrongly remembered
<nikunj97> It's just that I wanted to know a good way where I don't have to time loops and write scripts to know better
<nikunj97> this way I can just run the thing on the chunk size I want
<Yorlik> What kind of job are you running?
<nikunj97> it's a stencil benchmark
<Yorlik> OK
<nikunj97> I want to optimize it
<Yorlik> IC
<nikunj97> to the best I can. If I can show that the cost of HPX's functionality hides in the noise and I get near optimal performance, I can write a paper on it
<nikunj97> which will increase HPX's publicity and help me with my academic journey :D
<Yorlik> Cool :) HPX is a really good piece of tech. You'll have fun :)
<nikunj97> hkaiser, btw Sanjay informally offered my a PhD during the interview
<nikunj97> he was telling me all the PhD deadlines and wanted to know what I wanted to pursue
<nikunj97> *offered me a PhD
<hkaiser> nikunj97: nice
wate123_Jun has quit [Ping timeout: 240 seconds]
karame_ has quit [Remote host closed the connection]
wate123_Jun has joined #ste||ar
<nikunj97> hkaiser, HPX's functionality really does hide in the noise. https://gist.github.com/NK-Nikunj/135c84c72d4ef44a991e200473e777f4
<nikunj97> one of my recent runs
<hkaiser> nice
<nikunj97> I think I can get even closer to the serial versions
<Yorlik> nikunj: You wanna keep the thermal environment constant to really compare ;)
<nikunj97> Yorlik, idk how they handle their clusters. But I believe they're doing their best ;)
* Yorlik is trolling just a wee bit ...
<Yorlik> I'm a bit hyper because jemalloc gave us such a nice boost and everything seems to work today :)
<nikunj97> Must be a really good day!
<Yorlik> Yup.
* Yorlik is coding with Lee "Scratch" Perry music.
weilewei has quit [Ping timeout: 240 seconds]
shahrzad has joined #ste||ar
bita has joined #ste||ar
wate123_Jun has quit [Remote host closed the connection]
shahrzad has quit [Ping timeout: 256 seconds]
bita has quit [Ping timeout: 240 seconds]
wate123_Jun has joined #ste||ar