hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
nikunj97 has quit [Remote host closed the connection]
hkaiser has joined #ste||ar
<heller__>
hkaiser: finally got around to analyzing my profiles
<hkaiser>
ok
<heller__>
the main reasons for the regression seem to be 1) clearing/allocating a page and 2) TLS access
<heller__>
which is interesting
<hkaiser>
I suspected 1) to be a reason
<heller__>
and of course, there's some significant overhead for small allocations due to the header
<hkaiser>
I didn't expect the tls to get in the way
<heller__>
I haven't looked into it
<heller__>
it just shows up in the profile
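[For context: a minimal sketch of the kind of thread-local page cache being discussed; illustrative only, not HPX's allocator, and the names (sketch::page_cache, local_cache) are invented.]

    // Illustrative sketch only. It shows why both costs named above can
    // appear in a profile: a fresh page has to be fetched (and typically
    // cleared for reuse) on the slow path, and every allocation starts
    // with a thread_local lookup (the TLS access).
    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    namespace sketch {

        constexpr std::size_t page_size = 4096;

        struct page_cache
        {
            std::vector<void*> free_pages;

            void* get_page()
            {
                if (!free_pages.empty())
                {
                    void* p = free_pages.back();    // reuse, no fresh page needed
                    free_pages.pop_back();
                    return p;
                }
                // slow path: fetch a fresh page from the system allocator;
                // this (plus clearing it before handing it out) is cost 1) above
                return std::aligned_alloc(page_size, page_size);
            }

            void put_page(void* p)
            {
                free_pages.push_back(p);
            }

            ~page_cache()
            {
                for (void* p : free_pages)
                    std::free(p);
            }
        };

        inline page_cache& local_cache()
        {
            // the thread_local access here is cost 2) above
            thread_local page_cache cache;
            return cache;
        }
    }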
<hkaiser>
any allocator has 8 bytes of overhead per allocation, I'm sure of that
<hkaiser>
alternatively one would have to linearly search for the page the allocation came from
<heller__>
right ...
<heller__>
or do the IT Hare trick
<hkaiser>
I don't like that
<hkaiser>
but sure, if it helps...
<hkaiser>
I don't have to like it
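[For context: a hypothetical sketch of the trade-off just discussed, assuming a 64-bit platform; alloc_header and page are invented names, not HPX types.]

    // The ~8 bytes of per-allocation overhead buy an O(1) way back to the
    // owning page on free; without the header one would have to search the
    // pages linearly (or rely on page alignment, shown last).
    #include <cstddef>
    #include <cstdint>
    #include <new>

    namespace sketch {

        struct page;    // stands in for whatever owns the small allocations

        struct alloc_header
        {
            page* owner;    // 8 bytes on a 64-bit platform
        };

        // carve a user object out of raw page memory, prefixed by the header
        inline void* allocate_from(page* pg, void* raw)
        {
            auto* hdr = new (raw) alloc_header{pg};
            return hdr + 1;    // user data starts right after the header
        }

        // O(1) on free: step back over the header
        inline page* owning_page(void* p)
        {
            return (static_cast<alloc_header*>(p) - 1)->owner;
        }

        // the header-free alternative: align pages to their size and mask
        // the pointer (page_size must be a power of two)
        inline std::uintptr_t owning_page_by_mask(void* p, std::size_t page_size)
        {
            return reinterpret_cast<std::uintptr_t>(p) & ~(page_size - 1);
        }
    }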
<heller__>
I'll keep digging
<hkaiser>
heller__: I will disentangle the allocator related changes from everything else in this PR
<hkaiser>
the unrelated changes are valuable on their own
<heller__>
I agree
<heller__>
hkaiser: as suspected, for the multithreaded tests, we drown in scheduling overheads :/
<hkaiser>
heller__: sure, too little work
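[For context: a generic sketch of the grain-size effect referred to here ("too little work"), written with plain std::async rather than HPX APIs so it stays self-contained; both helper names are invented.]

    // With one task per element, spawn/scheduling overhead dominates the
    // measurement; with a few coarse chunks the same work amortizes it.
    #include <cstddef>
    #include <future>
    #include <numeric>
    #include <vector>

    // fine-grained: one task per element -- overhead-bound
    double sum_fine_grained(std::vector<double> const& v)
    {
        std::vector<std::future<double>> fs;
        fs.reserve(v.size());
        for (double x : v)
            fs.push_back(std::async(std::launch::async, [x] { return x; }));

        double sum = 0.0;
        for (auto& f : fs)
            sum += f.get();
        return sum;
    }

    // coarse-grained: a handful of chunks -- overhead amortized
    double sum_chunked(std::vector<double> const& v, std::size_t chunks)
    {
        std::vector<std::future<double>> fs;
        std::size_t const n = v.size();
        for (std::size_t c = 0; c != chunks; ++c)
        {
            std::size_t const b = c * n / chunks;
            std::size_t const e = (c + 1) * n / chunks;
            fs.push_back(std::async(std::launch::async, [&v, b, e] {
                return std::accumulate(v.begin() + b, v.begin() + e, 0.0);
            }));
        }

        double sum = 0.0;
        for (auto& f : fs)
            sum += f.get();
        return sum;
    }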
<heller__>
hkaiser: another problem with the test is that the numbers are barely comparable since they operate on different values ;)
<hkaiser>
well, fix it ;-)
<hkaiser>
it was meant to be a test to begin with, not a performance test
<heller__>
sure
mcopik has quit [Ping timeout: 260 seconds]
nikunj97 has joined #ste||ar
<heller__>
hkaiser: which test do I want to run for phylanx?
mcopik has joined #ste||ar
<hkaiser>
the simple_loop one
<hkaiser>
tests/performance/...
eschnett has joined #ste||ar
hkaiser has quit [Quit: bye]
bibek has joined #ste||ar
aserio has joined #ste||ar
akheir has quit [Quit: Leaving]
akheir has joined #ste||ar
hkaiser has joined #ste||ar
V|r has quit [Ping timeout: 240 seconds]
jaafar has joined #ste||ar
RostamLog has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
patgr has joined #ste||ar
<patgr>
Hello
<patgr>
heller__: hkaiser jbjnr__ Do you remember why we used GNU instead of intel for the GB runs
<hkaiser>
patgr: gcc was generating much faster code
<patgr>
hkaiser: was that the only reason? I thought there were some incompatibilities at that time
<hkaiser>
not as far as I remember
aserio has joined #ste||ar
david_pfander has quit [Ping timeout: 246 seconds]
aserio has quit [Ping timeout: 240 seconds]
<heller__>
patgr: yes, the Intel compiler is able to compile most of it
<heller__>
hkaiser: alright. Will have a look at how that behaves for me
aserio has joined #ste||ar
aserio has quit [Ping timeout: 260 seconds]
nikunj97 has quit [Remote host closed the connection]
nikunj97 has joined #ste||ar
aserio has joined #ste||ar
khuck has joined #ste||ar
<khuck>
heller__: I still can't build the thread_local_allocator branch of phylanx. HPX built, but phylanx won't.
mcopik has quit [Ping timeout: 252 seconds]
khuck has quit [Remote host closed the connection]
<zao>
I ran into some shady behaviour of our tests with our intel/2018b toolchain... I wonder if it's not quite compatible with GCC 7.3.0 in C++17 mode.