<hkaiser>
pls let me know how it works, never used it
<Yorlik>
How do I point to the library / header
<Yorlik>
Or do I just place the dll in the path?
<hkaiser>
Yorlik: I think they have cmake support
<Yorlik>
Still working on it - I'll figure it out
<Yorlik>
Something interfered.
<Yorlik>
Need to continue a bit later
<hkaiser>
Yorlik: we just do a find_package(mimalloc), so you can probably use the standard variables, like MIMALLOC_DIR to point to the cmake config files
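A minimal consumer-side CMake sketch of what hkaiser describes, assuming mimalloc was installed with its CMake package files; the target and path names here are the upstream defaults, not taken from this conversation:

```cmake
# Point CMake at the installed package files, e.g.:
#   cmake -Dmimalloc_DIR=<install-prefix>/lib/cmake/mimalloc ..
find_package(mimalloc CONFIG REQUIRED)

# Link the imported target; mimalloc-static is the static variant.
target_link_libraries(my_app PRIVATE mimalloc)
```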
<Yorlik>
Yes.
<Yorlik>
It's already built - but I can't go further right now.
wate123 has joined #ste||ar
wate123 is now known as wate123_Jun
<Yorlik>
hkaiser: Finished the first compile with mimalloc, doing the others now
<Yorlik>
I had to install it with cmake to get everything right
wate123_Jun has quit [Remote host closed the connection]
wate123_Jun has joined #ste||ar
wate123_Jun has quit [Remote host closed the connection]
wate123__ has joined #ste||ar
bita has quit [Quit: Leaving]
<zao>
Yorlik: Symbols in executables or dynamic libraries only satisfy lookups explicitly made against those modules or indirectly by redirection to those modules.
<zao>
There's none of the symbol soup that you get on Linux and other libdl-like systems where symbols can be overridden from heaven knows where.
<Yorlik>
IC. So it'll hopefully work - recompiling the server - all three builds worked.
<zao>
There might be hooks in the CRT or suchlike that can be leveraged to impact DLLs using the same CRT, but it's likely that you need to consider it on a per-module basis.
akheir1_ has quit [Read error: Connection reset by peer]
akheir1_ has joined #ste||ar
<hkaiser>
zao: they runtime patch the standard allocator
<hkaiser>
so having it linked to one module affects all
<hkaiser>
kinda like weak symbols on linux
<zao>
Allocator as in the C++ one from the CRT?
<hkaiser>
yes
<zao>
And not HeapAlloc and friends?
<hkaiser>
and malloc/free
<Yorlik>
The test compile that CMake always does breaks
<Yorlik>
@SET "PATH=%PATH%;
<Yorlik>
Woops
<Yorlik>
1> [CMake] LINK : error LNK2001: unresolved external symbol mi_version
<hkaiser>
it misses the library then
<Yorlik>
mimalloc gets found and everything and is in the path
<hkaiser>
as I said, I never tried it...
<Yorlik>
I'll figure it out
<hkaiser>
is the library on the command line?
<hkaiser>
Yorlik: I can try tomorrow, too tired now
<Yorlik>
no
<Yorlik>
NP
<Yorlik>
I might have overlooked something. But the mimalloc_DIR is set and the path too.
<Abhishek09>
hkaiser: How many slots does ste||ar generally get allocated for GSoC?
<ibalampanis>
hkaiser by the end of the weekend I will update the proposal, including the link to the GitHub repo of the MM with HPX
<hkaiser>
5-7
<ibalampanis>
Now I'm working on this, hkaiser
<hkaiser>
sure, good luck!
<ibalampanis>
Thank you for all your support!
<Abhishek09>
hkaiser: Does selection depend only on the quality of the proposal in Ste||ar?
<hkaiser>
yes
<ibalampanis>
hkaiser Could I ask you a question?
<hkaiser>
quality of proposal, general activity of the student, what is the quality of the example code we ask to write, etc.
<hkaiser>
ibalampanis: you already did ;-)
<hkaiser>
sure go ahead
<ibalampanis>
Why don't you use Slack for chat? Many organizations use it as their medium.
<Abhishek09>
What do you mean by example code? hkaiser
<hkaiser>
ibalampanis: historic reasons - is there a big difference?
<ibalampanis>
No, no, just a question!
<ibalampanis>
Thanks!
<hkaiser>
Abhishek09: we ask our students to implement a small example matrix multiplication using HPX to get an idea of what you know
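As a rough illustration of the kind of exercise described, a minimal serial matrix multiply; the actual GSoC task would parallelize it with HPX (e.g. over the outer loop), so this plain C++ sketch is only a starting point:

```cpp
#include <array>
#include <cstddef>

// Naive dense matrix multiply C = A * B for square N x N matrices.
// The i-k-j loop order keeps the innermost accesses contiguous.
template <std::size_t N>
std::array<std::array<double, N>, N>
matmul(std::array<std::array<double, N>, N> const& a,
       std::array<std::array<double, N>, N> const& b)
{
    std::array<std::array<double, N>, N> c{};    // zero-initialized
    for (std::size_t i = 0; i != N; ++i)
        for (std::size_t k = 0; k != N; ++k)
            for (std::size_t j = 0; j != N; ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}
```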
<hkaiser>
ibalampanis: ;-)
<Abhishek09>
hkaiser: Why not use Gitter rather than IRC? Many orgs have started using it
<hkaiser>
ibalampanis: one of the main reasons is that slack does not provide us with a full history of the conversations
<Yorlik>
hkaiser: Can HPX executors be combined, like proposed in the isocpp proposal? I'd like to try out jbjnr's limiting_executor. Alternatively I'd add its functionality to your hooking thing.
<Abhishek09>
gitter is far better than slack
<hkaiser>
Abhishek09: there are many options, we have not been able to agree on something else than irc so far - this channel here exists for more than 10 years, people got used to it
<heller1>
Also, Slack operates under US regulations and bans certain foreign nationals
<hkaiser>
Yorlik: our executors are not conforming to p0443 at this point, they represent an older version of it from 3-4 years ago
<heller1>
Simply put: it's a for profit organization where you are the product, not the customer
<Yorlik>
We just need to rewrite Discord and use HPX under the hood ;)
<hkaiser>
Yorlik: however, I showed you in the example how you can create wrappers that can rely on other executors and add/remove things
wate123_Jun has joined #ste||ar
<Yorlik>
Yes - I'll look into it.
<Yorlik>
Gotta study the code better now.
<ibalampanis>
Yorlik Are you interested in gsoc?
<Yorlik>
I'm not a student
<ibalampanis>
Ok! :D
<ibalampanis>
:) *
<Yorlik>
We're a group of hobbyists writing a game server with HPX.
<ibalampanis>
Wow! After GSoC I would like to contribute to it
<Yorlik>
Sure - contact me. But we're not open source.
<Abhishek09>
zao nikunj Hi
<Yorlik>
But we use Lua !!
<ibalampanis>
Yorlik How come you are not?
<hkaiser>
Yorlik: I still hope to get a free license for your game, though ;-)
<Yorlik>
There are several reasons. It's a long-winded discussion I'd rather avoid today.
<ibalampanis>
Ok! Understood!
wate123_Jun has quit [Ping timeout: 256 seconds]
<Yorlik>
We don't really have commercial ambitions; we might monetize just to cover the costs. E.g. an option could be a dual license later, but giving out server code gives a great advantage to hackers, so we'd do that rather late in the process, if ever.
<Yorlik>
At the moment it's a small, puny exercise made mostly by me anyway.
<Yorlik>
With some grains of awesomeness ;)
<ibalampanis>
Sure, I didn't have hacking in mind
<ibalampanis>
Hah, you're great!
<Yorlik>
MMOs are always under massive attack from hackers.
<Yorlik>
LOL - No - just a little crazy :)
<ibalampanis>
Crazy is a more suitable word than great, I admit
<ibalampanis>
;)
<Yorlik>
You need to be a little crazy to do what we do.
<Yorlik>
When hobbyists start saying "We want to write an MMO", the reaction is usually negative.
<ibalampanis>
Could you give me your email or a social media account in order not to spam here?
<Yorlik>
For good reason: it's quite a task. But with libraries like HPX and Lua it's much more feasible than it used to be.
<Yorlik>
Sure: mckillroy lives at gmail with the dot of com.
<ibalampanis>
hah thanks
<Yorlik>
:)
<ibalampanis>
I just sent you an ack message, Yorlik
<Yorlik>
It arrived
<ibalampanis>
It's ok!
Hashmi has joined #ste||ar
<ibalampanis>
hkaiser Are you in the USA? I'm asking to keep the local time difference in mind
<hkaiser>
yes, I'm in the US central time zone
<ibalampanis>
What about Bita? If you know.
<hkaiser>
same
<ibalampanis>
So, your local time is 9:04 (24hr)
<Yorlik>
hkaiser: Is the default executor essentially the interface description when writing an executor? Or is there a more overview writeup on the concept somewhere?
<Yorlik>
I see there is a bit in examples though.
wate123_Jun has joined #ste||ar
<ibalampanis>
hkaiser Is it a problem if my repo is a JetBrains CLion project? This IDE has CMake as an integrated tool and code can be executed via a button.
nikunj97 has joined #ste||ar
<Yorlik>
ibalampanis: Last time I looked at CLion, its CMake integration only supported the make generator, not Ninja or MSBuild. Not sure if it's still like that.
<ibalampanis>
I don't follow. What do you mean by generation?
<Yorlik>
CMake is not a build system, but a generator for various build systems, like make, MSBuild or Ninja.
<Yorlik>
IIRC CLion does not support all of them, just makefile generation.
<Yorlik>
But that might be outdated info. If it still is true and you want something other than makefiles from CMake, you might run into issues with CLion.
<ibalampanis>
Yeah, I understand now. I don't know if it supports something other than make these days
<Yorlik>
I just thought I'd tell you for consideration.
* Yorlik
is just a hobbyist with some half-baked semi-knowledge which ~just works enough :)
Pranavug has quit [Client Quit]
<ibalampanis>
Sorry if I asked a bad question
<Yorlik>
I think it's a good question. You wanna use the tools that work for your project.
<ibalampanis>
It's true
<ibalampanis>
Yorlik: Which tools/IDEs do you suggest?
<Yorlik>
I can only say what I use - but what's best for you might be totally different. I work mostly on Windows and am using MSVC Community edition. On Linux I use VSCode
<ibalampanis>
Ok, VSCode is a safe pick. Works with everything!
<Yorlik>
It's a good editor. And it works nicely with CMake and testing frameworks.
<Yorlik>
You can also use intellisense with it
<ibalampanis>
Yeah, have it in my mind
<ibalampanis>
Thanks!
<Yorlik>
VSCode also allows you to work remotely via SSH
<ibalampanis>
Yeah I knew it! Thanks for your time! Time to go out for duties! Cheers!
<Yorlik>
It has a remote server - so that's a pretty slick solution, though I heard you shouldn't use it over the internet because of security issues. But locally it's a nice thing.
ibalampanis has quit [Remote host closed the connection]
shahrzad has joined #ste||ar
<zao>
Yorlik: vscode's ssh remote now only listens on localhost on the remote, so it's safe on single-user machines or where you trust the other users
<Yorlik>
Ah OK - so they fixed it. Thanks for the heads-up.
<zao>
I have yet to get any response on whether it has any token auth or not, idiots seem to be reluctant to just say.
<Yorlik>
Instrumentation will soon be a thing for us, when we have Milestone 1 and go into a polishing phase.
<Yorlik>
jbjnr: I just stole three lines of code from your limiting executor and put them into the start hook of the executor I'm using - I already had task counting implemented. And it seems to work nicely.
<Yorlik>
Essentially I needed just this:
<Yorlik>
if ( ( ++task_counter<I> ) > upper_threshold_ ) {
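The idea behind that fragment can be sketched as self-contained C++; this is a hypothetical reconstruction, not the actual limiting_executor code (names like `upper_threshold_` follow the excerpt, the templated `task_counter<I>` is simplified to a plain atomic, and a real HPX executor hook would yield the HPX thread rather than the OS thread):

```cpp
#include <atomic>
#include <thread>

// Task limiter in the spirit of the three-liner above: the start hook
// bumps an in-flight counter and backs off while too many tasks run;
// the stop hook decrements it when a task finishes.
struct task_limiter {
    std::atomic<long> task_counter_{0};
    long upper_threshold_;

    explicit task_limiter(long max_in_flight)
      : upper_threshold_(max_in_flight) {}

    void on_start() {
        // spin (politely) until we are under the threshold
        while (++task_counter_ > upper_threshold_) {
            --task_counter_;
            std::this_thread::yield();  // HPX would suspend the HPX thread here
        }
    }

    void on_stop() { --task_counter_; }
};
```

The `on_start`/`on_stop` bodies are what would go into the start/stop lambdas of the hooked executor discussed below.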
<nikunj97>
hkaiser, the only way to calculate grain size that I know of is putting up a high resolution clock and measuring the task execution time. Is there any other way? I hear APEX is used for performance measurement in HPX. Would that help?
<hkaiser>
nikunj97: yes, that's what the auto-chunker does
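The manual clock-based approach nikunj97 describes can be sketched like this; it is a simplified stand-in for illustration, not the auto-chunker's actual implementation:

```cpp
#include <chrono>
#include <cstddef>

// Time a batch of iterations and divide by the count to estimate the
// per-iteration cost (the grain size) in nanoseconds. `work` stands in
// for the real loop body.
template <typename F>
double measure_grain_ns(F&& work, std::size_t iterations)
{
    auto t0 = std::chrono::high_resolution_clock::now();
    for (std::size_t i = 0; i != iterations; ++i)
        work(i);
    auto t1 = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() /
        static_cast<double>(iterations);
}
```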
<Yorlik>
hkaiser: I just put the above three-liner in the start() lambda of your hooked executor and it seems to work nicely
<hkaiser>
right
<hkaiser>
as expected
<Yorlik>
Working on jemalloc now :) Can't wait to see the effects.
<hkaiser>
shahrzad: you need to use Boost.Context on the Pi
<Yorlik>
A quick and dirty excerpt from my use of the auto chunker
<hkaiser>
nikunj97: there is also a new website (just created a while back): hpx.stellar-group.org
<hkaiser>
we will use it for all things HPX in the future
<nikunj97>
Yorlik, thanks!
<nikunj97>
hkaiser, aah the one you were talking about the other day
<nikunj97>
I'll go through the blogs. Thanks
<Yorlik>
nikunj: Made a cleanup of the gist - it's shorter now and clearer.
<hkaiser>
nikunj97: I'd be more than happy to give you access if you'd like to write blog posts
<nikunj97>
hkaiser, I'm making myself familiar with the functionalities that I've rarely/never used
<nikunj97>
I'm sure to write a post on it. HPX needs more publicity and a bigger user base
<hkaiser>
perfect opportunity to document things as you go
<hkaiser>
;-)
<nikunj97>
hkaiser, yes, I'm already writing the smallest code snippets for the things I'm learning
<hkaiser>
blog posts can be short, no reason to write a novel
<nikunj97>
I believe it's always better to start with the easiest code you can write. And then you can show its usage in a real application
<hkaiser>
simple snippets are enough
<nikunj97>
I'll be writing these blogs over the summer
<nikunj97>
coz I don't think my internship is happening lol
<hkaiser>
nikunj97: yah, there is that
wate123_Jun has quit [Remote host closed the connection]
wate123_Jun has joined #ste||ar
wate123_Jun has quit [Ping timeout: 256 seconds]
Abhishek09 has quit [Remote host closed the connection]
shahrzad has quit [Ping timeout: 240 seconds]
wate123_Jun has joined #ste||ar
<Yorlik>
hkaiser: since HPX is using jemalloc - would I have to link my app again against jemalloc, or can I just use the symbols and only include the header where needed?
wate123_Jun has quit [Ping timeout: 240 seconds]
Hashmi has quit [Quit: Connection closed for inactivity]
nikunj97 has quit [Read error: Connection reset by peer]
nikunj97 has joined #ste||ar
wate123_Jun has joined #ste||ar
shahrzad has joined #ste||ar
ibalampanis has joined #ste||ar
<hkaiser>
Yorlik: if you reference the symbols from your code you need to link against it, HPX might however re-export the library so that this might not require any actions on your side
<hkaiser>
I think the allocator is target_link_library'd to hpx publicly
<hkaiser>
so you should be fine
<Yorlik>
It's just working - just started a test :)
<hkaiser>
ok
<Yorlik>
Because of the task limiter I now simply let the lua states explode as needed, but I delete above a threshold on return to the pool
<Yorlik>
It looks like my test runs ~40-60% faster - I'll let it run a bit
<Yorlik>
:D
<Yorlik>
Big Win
<Yorlik>
From ~40000 messages /sec to ~60000
<Yorlik>
With 10000 calls into Lua
<Yorlik>
100 messages per call
<hkaiser>
good
<Yorlik>
That's much more than we ever had with the old crappy project.
<hkaiser>
:D
<hkaiser>
HPX for the win!
shahrzad has quit [Ping timeout: 240 seconds]
<Yorlik>
I think I can now focus on a demo game and finalizing the default events and stuff.
<hkaiser>
that's on a 4 core machine?
<Yorlik>
Yes
<Yorlik>
4790k
<Yorlik>
So - on a decent modern server this would surely be much better
<Yorlik>
OFC, the Lua load will be higher.
<Yorlik>
Still I'm happy with this result for now.
<hkaiser>
Yorlik: we should give you access to our test cluster; there you could do better benchmarks
<Yorlik>
That would be cool
<Yorlik>
But I need to go finish the milestone first
<Yorlik>
I want a basic fox-rabbit-grass population dynamics testcase
<hkaiser>
Yorlik: talk to akheir here (once he's back), he manages the cluster
<Yorlik>
And we are not yet distributed
<Yorlik>
I need to get the location and load balancing system done for that
<Yorlik>
The spatial partitioning
<hkaiser>
no idea what a fox-rabbit-grass population is ;-)
<zao>
A single node of like 14-28 cores still tends to give some insights, particularly around NUMA junk.
<Yorlik>
Simple
<Yorlik>
Grass Grows
<Yorlik>
Rabbits eat grass
<Yorlik>
Foxes eat rabbits
<Yorlik>
Chaotic numbers in the subpopulations
<hkaiser>
ok
<Yorlik>
It's an easy way to create many objects.
<Yorlik>
I'll make a very simplistic AI for that
<ibalampanis>
hkaiser: Do you know if Bita is taking the day off today?
<hkaiser>
Yorlik: interesting
<hkaiser>
ibalampanis: it's weekend
<ibalampanis>
Such a good note! Thanks!
<ibalampanis>
hkaiser: Why aren't you taking the day off, since it's the weekend?
<hkaiser>
ibalampanis: I'm just lurking here ;-)
<ibalampanis>
Hahah it's ok!
<ibalampanis>
In order to double check, is your local time 12:35 ? (24hr)
<hkaiser>
Yorlik: I merged #4462 just now
<hkaiser>
ibalampanis: yes
<Yorlik>
OK. stable it is then :)
<ibalampanis>
Thanks!
<zao>
ibalampanis: Local time in New Orleans should be accurate.
<Yorlik>
I'll wait until the tag is there.
<hkaiser>
Yorlik: you need to wait for the CI to cycle for stable to be updated to this
<Yorlik>
Yup
<Yorlik>
I have a working build after all.
<ibalampanis>
zao: Thank you
<Yorlik>
I think I'll use commit hashes in the future, so I have pinned stables
<hkaiser>
Yorlik: I'd prefer you using the stable tag - nice way for us to discover problems ;-)
* Yorlik
just joined the CI union
<nikunj97>
zao, is a swap storage really important?
<zao>
nikunj97: How elaborate of an explanation do you want? :)
<nikunj97>
I've got 16gb ram
<nikunj97>
and I'm always on 12-13gb
<nikunj97>
it's when I'm not running any compilation/linking stuff
<nikunj97>
it's with chrome, vs code, hexchat, spotify and a few other applications open
<nikunj97>
like discord or slack
<zao>
So your OS has a virtual memory system, yeah? Memory is split up into pages (4 KiB typically) and is backed either by physical RAM, physical storage, or just requested but not committed yet, having no backing storage.
<zao>
Memory for things like loaded libraries or memory mapped files is backed by files on your disk, and can, as long as it's not modified, be dropped from memory and re-read from the files when needed.
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
<zao>
Memory for data allocated from a process or modified pages cannot be spilled to disk if there's no backing storage, it's stuck in RAM.
<zao>
That is, unless you have a swap file/partition. It serves as an off-load area for less used pages from RAM, which the OS attempts to have somewhat pre-populated in case there's a burst in memory usage and it needs to evict something.
<zao>
It helps free up and compact the contents of physical RAM that may have been spuriously loaded or not used at all.
<nikunj97>
I see
<nikunj97>
I should add one. Should help relieve the stress on memory
shahrzad has joined #ste||ar
<hkaiser>
Yorlik: I approve
<Yorlik>
hkaiser: Erm .. what?
<hkaiser>
Yorlik: you joining the CI union ;-)
<Yorlik>
:) lol
<Yorlik>
They told me to take a nap to keep my overwhelming beauty - which I will do now - BBL :)
<nikunj97>
heller1, why do my stream results look so different from the other day, executing the same binary?
<nikunj97>
I'm getting about 70-80GB/s bandwidth on x86 node all of a sudden
<hkaiser>
nikunj97: different phase of moon ;-)
Abhishek09 has joined #ste||ar
<nikunj97>
lol, it can make such a difference?
<hkaiser>
sure!
<hkaiser>
din't you know?
<nikunj97>
it doubled!
<nikunj97>
not like 10% or 20%
<hkaiser>
so you changed something
<nikunj97>
all my calculations are no good atm with these new results
<nikunj97>
I didn't change anything. I executed an already compiled file that I used previously to record exactly the same thing
<hkaiser>
yaya
<hkaiser>
something else was going on on your machine when you tried before?
<nikunj97>
it was executed on a node allocated by slurm
<hkaiser>
rostam?
<nikunj97>
no the cluster is at jsc
<hkaiser>
exactly the same node?
<nikunj97>
not sure about that
<hkaiser>
see
<nikunj97>
but they're all same x86
<hkaiser>
you had that effect before when you were with us, remember?
<hkaiser>
*sure*
wate123__ has joined #ste||ar
<nikunj97>
yes, I remember the effect
<hkaiser>
even on rostam equivalent the nodes are different
<nikunj97>
so I choose a single node to record and benchmark everything
<nikunj97>
btw which one will have higher memory bandwidth, float or double?
<nikunj97>
peak memory bandwidth i.e.
wate123_Jun has quit [Ping timeout: 240 seconds]
<heller1>
The data type is irrelevant
<nikunj97>
heller1, 60GB/s with float and 80GB/s
<nikunj97>
with double
<nikunj97>
that's why I was wondering what gives
<nikunj97>
on arm
<heller1>
Well, you might need to increase the array size
<heller1>
Float is half the size, might mess up the measurements
<nikunj97>
aah that makes sense
<heller1>
Also, x86 != x86
<nikunj97>
I meant they're all Xeon E5 2660 v3
<nikunj97>
same frequency as well
<nikunj97>
and now arm is only giving 20GB/s for some reason
<nikunj97>
it's exactly the same node but giving way less memory bandwidth. What am I doing wrong?
ibalampanis has quit [Remote host closed the connection]
<nikunj97>
heller1, just read the array size rule on stream. It's definitely an array size issue. Arm has a 64MB cache while the array size is 10M
<nikunj97>
L3 cache ^^
<heller1>
See
<nikunj97>
I should read things more carefully :/
<nikunj97>
I changed the array size to 128M so I should see consistent rates both on hisilicon and e5
<nikunj97>
heller1, yup, very consistent now. I don't see difference in float and double as well
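The array-size rule being referenced is STREAM's guideline that each array be at least 4x the aggregate last-level cache. A quick check of the numbers discussed (64 MiB L3, 8-byte doubles), as a small sketch:

```cpp
#include <cstddef>

// Minimum STREAM array length: 4x the last-level cache, expressed in
// elements of the given size.
constexpr std::size_t min_stream_elems(std::size_t llc_bytes,
                                       std::size_t elem_bytes)
{
    return 4 * llc_bytes / elem_bytes;
}
// 64 MiB L3 with doubles needs >= 33.5M elements, so 10M was far too
// small (everything cached) while 128M is safely above the threshold.
```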
nk__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 240 seconds]
weilewei has joined #ste||ar
shahrzad has quit [Ping timeout: 256 seconds]
<weilewei>
May I ask, what is the status of concurrent data structure support in the GSoC project? I am interested in participating as a mentor (and learning as well)
<weilewei>
Will we aim at implementing concurrent_unordered_set?
wate123__ has quit [Remote host closed the connection]
wate123_Jun has joined #ste||ar
wate123_Jun has quit [Ping timeout: 256 seconds]
nk__ has quit [Read error: Connection reset by peer]
nikunj97 has joined #ste||ar
<nikunj97>
Abhishek09, hey
<nikunj97>
just saw your text from afternoon
<nikunj97>
what is it that you want to talk about?
<Abhishek09>
nikunj97: Does installing phylanx require the library files to work properly, or is it fine with just the devel and header files?
<zao>
Abhishek09: Hi there, did you want anything particular this morning? I saw you highlighted me but didn't say what it was about.
<zao>
(the above, I guess)
<Abhishek09>
zao: i have lost that thing
<nikunj97>
Abhishek09, I didn't get you
<zao>
If you're talking about which one of `hpx` and `hpx-devel` you need from the OS, an answer is that `hpx-devel` depends on `hpx`.
<zao>
So you're going to either have `hpx` or `hpx` + `hpx-devel` installed if you're working with OS packages.
<zao>
This is customary.
<zao>
A development package contains an additional set of files on top of the base set of files in the regular package.
<zao>
Both packages are needed to form a development environment.
<zao>
Is this what you wondered about? :D
<Abhishek09>
zao: Yes, that means hpx and hpx-devel are both mandatory for phylanx
<Abhishek09>
installation
<Abhishek09>
?
<Abhishek09>
hpx-devel depends on hpx
<zao>
You could reason it out from what you know that the Phylanx build needs too.
<zao>
It builds a Python package with a native extension.
<zao>
The native extension when built needs the headers to compile and the library to link.
<Abhishek09>
As I have built phylanx by installing hpx from source, not dnf
<zao>
The only case you can get away with not having libraries installed is when you have a dependency that doesn't _have_ libraries, for example header-only dependencies like Eigen and Blaze
<zao>
Great.
<Abhishek09>
That means I have to ensure that all deps work fine (library + built files + headers) before building phylanx, zao
<zao>
I've found in the past that as long as the install step has worked, the dependencies are usable.
<nikunj97>
hkaiser, so auto_chunk_size essentially makes sure that parallel_for_loop creates tasks such that their grain size is the time specified to auto_chunk_size?
<nikunj97>
"This executor parameters type makes sure that as many loop iterations are combined as necessary to run for the amount of time specified." - Just making sure if I got this right
<Abhishek09>
zao: Soon I will draft a proposal. I will build everything (lib + headers + built files) from source. Any tips you want to give me?
<Abhishek09>
nikunj97 ^
<nikunj97>
nope, looks good
<nikunj97>
you'll have to build most of things from source
<zao>
Abhishek09: I think I've said this in the past, but make a habit of installing things into a non-system directory, so that it's easy to remove and control.
<zao>
Also take good notes on what commands you run so it's reproducible.
wate123_Jun has joined #ste||ar
<nikunj97>
Yorlik, ^^
<Yorlik>
Ya?
<nikunj97>
about the auto_chunk_size thing
<Abhishek09>
zao: Are you also participating this year as a student in GSoC?
<nikunj97>
I said before, is it correct?
<zao>
Abhishek09: Nope, I've never done GSoC. I'm a professional sysadmin and application expert at a HPC site.
<zao>
My primary job is to build and install software for researchers :D
<Yorlik>
nikunj: The auto chunker does measurements. I think you pay like 10% for it, IIRC what hkaiser said. He knows better.
<nikunj97>
aah so I invoke get_chunk_size to know what the grain size was?
<Yorlik>
I never used that function, but probably yes
<nikunj97>
wait if auto chunker does measurements, there should be a way to report it back right?
<nikunj97>
I'm asking that
<nikunj97>
so you use executor.with( auto_chunk_size( 2000us ) ), what does it actually do?
<Yorlik>
The auto chunker attempts to size your chunks such that they run 2000us
<Yorlik>
So if one iteration takes 1us it would do 2000 iterations per chunk
<nikunj97>
but with 10% overheads
<Yorlik>
IIRC yes. hkaiser should tell exactly.
<nikunj97>
ok no, this isn't what I wanted. I want to time the grain size
<Yorlik>
You can use an executor, which has a start and a stop function
<nikunj97>
not make the runtime change iterations to make it that grain size
<Yorlik>
So you can put your measurement hooks there
<Yorlik>
You can put whatever you want into the start and stop lambdas
<Yorlik>
You put the executor in the parloop and give it the start and stop lambdas where you can place your measurements or perfcounters.
<Yorlik>
I use it to limit task creation to a max
<Yorlik>
It gives you control over a chunk.
<Yorlik>
And in the loop you can use the auto chunker or a static chunker
<nikunj97>
got it
<nikunj97>
it's again not what I was looking for btw. But it's a nice idea
<Yorlik>
Maybe I misunderstood - what exactly do you need?
<nikunj97>
so basically I want to measure the time of my execution of a task. I previously used to use high resolution timers to measure the grain size
<Yorlik>
Do you want to measure your chunks or do you want to size them?
<nikunj97>
like start the timer on invoking the function and calculate the time elapsed in the end
<nikunj97>
I want a better way to get the time
<nikunj97>
I neither want to measure them nor change the size
<Yorlik>
Just measure a chunk and divide by the number?
<nikunj97>
but that won't give me a right measure coz they have 10% overheads
<Yorlik>
Not the static chunker
<Yorlik>
Only the auto chunker
<Yorlik>
The static chunker has zero overhead
<Yorlik>
(Or almost zero)
<nikunj97>
static chunker simply measures the grain size?
<Yorlik>
No
<nikunj97>
or does it try to do something similar to auto chunker?
<Yorlik>
You set the size of a grain
<Yorlik>
Its fixed
<nikunj97>
aah got it, your snippet
<nikunj97>
simply tell the number of iterations you want
<Yorlik>
Do you want auto chunking or not?
<Yorlik>
Yes
<nikunj97>
nope, I do not want auto chunker
<nikunj97>
static chunker looks better
<Yorlik>
Then use the static one. It's cheap
<nikunj97>
I do not want the runtime to optimize things, I'm optimizing according to an underlying hardware ;-)
<Yorlik>
So you make measurements and then you know your grain size?
<Yorlik>
But you do it only once?
<Yorlik>
You could simply do a calibration phase then, before the real heavy-duty job starts.
<Yorlik>
As long as your item load is constant that should work nicely.
<nikunj97>
my item load is constant
<nikunj97>
I just want to keep the load at optimal
<Yorlik>
So basically you just want to measure your hardware.
<nikunj97>
that's why I wanted to measure the grain size
<Yorlik>
And adjust to it
<nikunj97>
Yorlik, precisely
<Yorlik>
static chunker then
<nikunj97>
got it!
<Yorlik>
Or just run a shorter loop
<nikunj97>
what do you mean?
<Yorlik>
Like a short calibration loop, and then adjust the chunks
<Yorlik>
But maybe it's better to have the chunking in - then you have measured all overheads.
<nikunj97>
yea
<Yorlik>
But thats a detail question.
<Yorlik>
Good luck!
<nikunj97>
I think I got it, let me try
<nikunj97>
thanks for the help!
<Yorlik>
Cheers! :)
<nikunj97>
Yorlik, one more thing
<Yorlik>
Ya?
<hkaiser>
nikunj97: auto-chunker is your friend
<nikunj97>
so if a parallel_for loop goes from: 0 100, and you set static_chunk_size=10, you'll have 10 tasks in total, right?
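The arithmetic behind that question, assuming static chunking simply divides the iteration range into fixed-size pieces (a sketch of the general rule, not HPX's internals):

```cpp
#include <cstddef>

// With a fixed chunk size, the number of chunks (and hence tasks) is
// ceil(iterations / chunk_size).
constexpr std::size_t num_chunks(std::size_t iterations,
                                 std::size_t chunk_size)
{
    return (iterations + chunk_size - 1) / chunk_size;
}
```

So 100 iterations with a chunk size of 10 gives 10 chunks, and any remainder adds one partial chunk.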
<Yorlik>
hkaiser: how large is its overhead again?
<nikunj97>
hkaiser, aren't the overheads high though?
<hkaiser>
you can do both: measure the best chunk size and then set it (possibly using the static chunker)
<hkaiser>
you can also create your own chunker that measures things once and uses the settings for all subsequent uses or somesuch
<hkaiser>
Yorlik: overheads of what?
<Yorlik>
hkaiser: What's the cost of the autochunker again?
<hkaiser>
you tell it
<hkaiser>
by default it runs 1% of the iterations to measure
<hkaiser>
but you can change that
<Yorlik>
Oh I C
<nikunj97>
1% ain't bad
<Yorlik>
I wasn't clear about that, just told nikunj97 about it.
<nikunj97>
I can use auto chunker then
<hkaiser>
nikunj: could be too much
<Yorlik>
1% is 6 minutes in 10 hours :)
<hkaiser>
Yorlik: you want to measure it once every now and then and just reuse afterwards
<hkaiser>
writing a chunker is trivial, just look at the existing ones
<Yorlik>
I will need constant monitoring, since the workloads always change
<Yorlik>
I'm happy to use the autochunker
<hkaiser>
right, that's what I said
<hkaiser>
ok
<Yorlik>
nikunj has a different situation. He just wants to measure his hardware to determine the grain size for a constant job
<Yorlik>
Mine is horribly dynamic.
<nikunj97>
Yorlik, setting auto chunker to find the best grain-size will also do the trick for me
<nikunj97>
once I find the best, I'll get rid of the auto chunker to get another 1% speedup
<Yorlik>
With 1% it's much smaller than I wrongly remembered
<nikunj97>
It's just that I wanted a good way where I don't have to time loops and write scripts to find out
<nikunj97>
this way I can just run the thing on the chunk size I want
<Yorlik>
What kind of job are you running?
<nikunj97>
it's a stencil benchmark
<Yorlik>
OK
<nikunj97>
I want to optimize it
<Yorlik>
IC
<nikunj97>
to the best I can. If I can show that HPX's overheads hide in the noise and I get near-optimal performance, I can write a paper on it
<nikunj97>
which will increase HPX's publicity and help me with my academic journey :D
<Yorlik>
Cool :) HPX is a really good piece of tech. You'll have fun :)
<nikunj97>
hkaiser, btw Sanjay informally offered my a PhD during the interview
<nikunj97>
he was telling me all the PhD deadlines and wanted to know what I wanted to pursue
<nikunj97>
*offered me a PhD
<hkaiser>
nikunj97: nice
wate123_Jun has quit [Ping timeout: 240 seconds]
karame_ has quit [Remote host closed the connection]