hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
nan11 has quit [Remote host closed the connection]
nikunj97 has joined #ste||ar
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 265 seconds]
<Yorlik> hkaiser: You're right - It was a bug in my build system not updating to the latest stable. Sorry for that.
<hkaiser> Yorlik: no worries, glad that's resolved
hkaiser has quit [Quit: bye]
<Yorlik> I still do not understand why it didn't update the git. I have to manually delete stuff or manually update for the moment.
<Yorlik> Maybe it has problems with the wandering stable tag
<Yorlik> Maybe I just need to add a manual tag update or sth.
akheir has quit [Quit: Leaving]
Nikunj__ has quit [Read error: Connection reset by peer]
<simbergm> tiago.fg: I'd say 100% cpu usage is a bug if the only thing you're doing is waiting for a future
<simbergm> the key here would be to check if HPX_WITH_THREAD_MANAGER_IDLE_BACKOFF is on or off
<simbergm> it should be on by default, and should lead to close to 0% utilization when no tasks are running
<simbergm> it's a cmake variable btw
<Yorlik> How would I start a local member function as an hpx task correctly? This is obviously wrong: auto frameless_fut = hpx::async( &update_frameless, this );
<Yorlik> It's not an action - just a local member function
Yorlik_ has joined #ste||ar
Yorlik has quit [Disconnected by services]
Yorlik_ has quit [Client Quit]
Yorlik has joined #ste||ar
<Yorlik> Had to fall back to: auto frameless_fut = hpx::async( [&](){ return this->update_frameless(); }); No way it accepted auto frameless_fut = hpx::async( &update_frameless );
Nikunj__ has joined #ste||ar
nikunj97 has joined #ste||ar
Nikunj__ has quit [Ping timeout: 246 seconds]
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 264 seconds]
<heller1> Yorlik: &class::update_frameless
hkaiser has joined #ste||ar
hkaiser has quit [Ping timeout: 240 seconds]
<Yorlik> heller1: Even when called inside the class from another member function?
<heller1> Yes
<Yorlik> IC - that explains it - but I'm getting an error even for this: auto frameless_fut = hpx::async( &controller::update_frameless );
<Yorlik> C2672: 'hpx::async': no matching overloaded function found and error C2893: Failed to specialize function template 'unknown-type hpx::async(F &&,Ts &&...)
<Yorlik> and ... hpx/async_base/async.hpp(23): note: see declaration of 'hpx::async'
<Yorlik> No other errors before that in the compile output
<heller1> You need to pass the this pointer as well
<Yorlik> OK
<Yorlik> And it's happy :) Thanks a lot !
nikunj97 has joined #ste||ar
<Yorlik> I find it a bit confusing - sometimes you need the & operator (for methods), sometimes not (free functions), and then you add this for methods, but I have seen C++ libraries not requiring it in certain situations. Is there something I could read to get a bit more solid ground under my feet here?
<Yorlik> I also don't understand why I need the class::, even from a call inside the class, where the member is known.
Nikunj__ has quit [Ping timeout: 256 seconds]
nikunj97 has quit [Remote host closed the connection]
nikunj97 has joined #ste||ar
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 265 seconds]
<heller1> as a rule of thumb: always qualify to the maximum
<heller1> the class name is a qualifier as well
<heller1> and if you deal with functions, always use the address of operator
<heller1> the thing with functions is very strange... I always forget the rules as well ... there are function types and pointers to those, function types automatically decay to function pointers in most circumstances, so they are mostly equivalent. MSVC is a bit picky there, so adding the & is almost always a safe bet
<K-ballo> a member pointer can only be formed via a qualified name, so if it's a non static member you are taking the address of you need to put the class name in there even if from within the class
hkaiser has joined #ste||ar
<Nikunj__> hkaiser, yt?
Nikunj__ is now known as nikunj97
<hkaiser> here
kale_ has joined #ste||ar
<hkaiser> nikunj97: ^^
<nikunj97> hkaiser, there's something wrong with the block allocator when allocating floats on thunderX2
<nikunj97> it allocates them such that the access is much much slower
<hkaiser> why should it do something wrong for a particular type?
<hkaiser> alignment?
<nikunj97> I tested a normal vector and an hpx::compute::vector without specifying it
<nikunj97> most likely alignment problems
<nikunj97> coz simd floats/doubles and doubles were perfectly fine
<nikunj97> and the performance loss is significant, more like a 2-2.5 times slow down
<nikunj97> and I'm sure it's with block allocator coz otherwise things are running nicely
<hkaiser> nikunj97: ok, so the block-allocator needs alignment support
<hkaiser> at least on that platform
<nikunj97> hkaiser, yes precisely
<nikunj97> I thought I should bring it up with you as it's an interesting revelation
<nikunj97> and the slow down is significant as well
<hkaiser> sure, expected for some platforms
<nikunj97> do you want me to work on it? I think I know what to do
<hkaiser> nikunj97: do you use some allocator library?
<nikunj97> hkaiser, no
<hkaiser> k
<nikunj97> just plain hpx allocators
kale_ has quit [Ping timeout: 272 seconds]
kale_ has joined #ste||ar
<hkaiser> nikunj97: I meant: do you use jemalloc or friends?
<nikunj97> ohh yeah
<nikunj97> I use jemalloc for memory allocation
<hkaiser> ok, normally the allocator knows about required alignment for types
<nikunj97> you think it's something else?
kale_ has quit [Ping timeout: 256 seconds]
<hkaiser> nikunj97: if vector works, then it's not jemalloc
<hkaiser> except if vector itself does aligned allocation
<nikunj97> hkaiser, vector works
<nikunj97> so does hpx::compute::vector
<nikunj97> without using the allocator i.e.
<nikunj97> if I make use of that allocator, then I see a slowdown for floats
<hkaiser> nod, compute::vector definitely doesn't do anything about alignment, iirc
<hkaiser> so it must be jemalloc
<hkaiser> can you look at std::vector to see how they do allocations?
<nikunj97> hkaiser, sure
<tiagofg[m]> ms: yes the only thing I'm doing is waiting for a future from a channel, and yes the HPX_WITH_THREAD_MANAGER_IDLE_BACKOFF is on
<hkaiser> tiagofg[m]: could you share a small example that reproduces your issue?
<tiagofg[m]> yes sure
<hkaiser> thanks
<tiagofg[m]> may I put the code here? it's an example with a few lines of code
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 250 seconds]
nikunj97 has joined #ste||ar
Nikunj__ has quit [Ping timeout: 246 seconds]
nikunj97 has quit [Read error: Connection reset by peer]
Nikunj__ has joined #ste||ar
nikunj97 has joined #ste||ar
Nikunj__ has quit [Ping timeout: 265 seconds]
hkaiser has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
hkaiser has quit [Ping timeout: 252 seconds]
Nikunj__ has joined #ste||ar
nikunj97 has quit [Read error: Connection reset by peer]
nikunj97 has joined #ste||ar
Nikunj__ has quit [Ping timeout: 256 seconds]
hkaiser has joined #ste||ar
<nikunj97> hkaiser, `memset(_Unfancy(_First), static_cast<unsigned char>(_Val), static_cast<size_t>(_Count));` vector allocates using memset converting _Count which is allocator_traits::size_type to size_t and _Val which is allocator_traits::value_type to unsigned char
<nikunj97> essentially filling it in an aligned manner using 1 word size at a time
<nikunj97> _First is the address to start from btw
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 260 seconds]
nikunj97 has joined #ste||ar
Nikunj__ has quit [Ping timeout: 250 seconds]
<hkaiser> nikunj97: that's not the allocaton, that's the initialization
<nikunj97> hkaiser, what do you want to know exactly?
Nikunj__ has joined #ste||ar
<hkaiser> there should be a new T[] somewhere
<hkaiser> or T aligned*
nikunj97 has quit [Ping timeout: 250 seconds]
Nikunj__ has quit [Quit: Leaving]
nikunj97 has joined #ste||ar
<hkaiser> nikunj97: or some alignas - something
<nikunj97> hkaiser, trying to see it by debugging
<nikunj97> I couldn't see it anywhere in the vector header file
<nikunj97> in the header file, the constructor took the number of initializations that the user wants and used memset to initialize it
<nikunj97> in libc++ I see it using std::fill_n
<nikunj97> hkaiser, I don't see new T[] or T aligned* anywhere
<hkaiser> the std::fill is filling the allocated memory
<hkaiser> the alignment happens at allocation time
<nikunj97> aah, it's also creating storage
<nikunj97> I see _M_create_storage(__n)
<nikunj97> hkaiser, I think I've found what you're asking for
<hkaiser> well, that doesn't explain why it aligns things for doubles, but not floats
<nikunj97> yea the implementation is not aligning for floats
<nikunj97> aligns only for data types with size more than default alignment
<hkaiser> what's the default alignment?
<nikunj97> 16UL in my case
<hkaiser> k
<nikunj97> let me check it for the thunderx2 in case it has a different value
<nikunj97> hkaiser, 16UL is the default for 64-bit systems
<nikunj97> so it's 16UL at x2 nodes as well
<hkaiser> ok
<hkaiser> no ideas here
<nikunj97> hkaiser, I'm clueless. In the end we make use of hwloc_alloc. Can there be a problem with that?
<hkaiser> I did ask you what is used for allocation, you said it's jemalloc
<hkaiser> well hwloc_alloc does not align
<nikunj97> the block allocator in HPX uses hwloc_alloc
<hkaiser> no, sorry
<hkaiser> it tries to allocate page-aligned memory from the OS.
<hkaiser> that makes even less sense
<hkaiser> right
<hkaiser> that should be well aligned
<nikunj97> then why is it that I'm getting performance for all other types
<nikunj97> except for floats?
<nikunj97> hkaiser, is there a test or benchmark or anything that can help me with finding why it lacks performance?
<nikunj97> std::allocator works perfectly fine but block_allocator does not
<hkaiser> std::allocator uses new, block_allocator uses hwloc_alloc
<hkaiser> and new uses std::malloc
<hkaiser> nikunj97: try adding an assert that makes sure the value returned from hwloc_alloc is sufficiently aligned
<nikunj97> hkaiser, how do I check for sufficiently aligned?
<hkaiser> ask google
<nikunj97> hkaiser, gotcha :D
<hkaiser> nikunj: alignof(T) gives you the alignment for T
<hkaiser> divide the pointer value hwloc_alloc returns by that and assert that the remainder == 0 (or something similar)
<nikunj97> ok let me try it
detan has joined #ste||ar
<detan> Hiya! Does any one know if I can use libgeodecomp with a mesh and particles? How would load balancing work in that case and is there an example for such setup?
<nikunj97> hkaiser, they're exactly the same
<nikunj97> sizeof a hwloc_alloc for a 2-float allocation is 2 * alignof(float)
<nikunj97> hkaiser, what could it be if not alignment?
<hkaiser> shrug
<hkaiser> detan: uhh, heller1 might be able to help
<nikunj97> hkaiser, sounds like I'll have to write hardware inconsistencies in my report then :/
bita has joined #ste||ar
nikunj97 has quit [Quit: Leaving]
<detan> hkaiser Thanks I will wait for his feedback.
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar