hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoD: https://developers.google.com/season-of-docs/
<hkaiser> heller: TSS is still failing for the execution_context branch
<hkaiser> heller: master is now failing with the same TSS failures, btw
diehlpk has joined #ste||ar
diehlpk has quit [Remote host closed the connection]
hkaiser has quit [Quit: bye]
K-ballo has quit [Quit: K-ballo]
Coldblackice has quit [Ping timeout: 268 seconds]
Coldblackice has joined #ste||ar
Coldblackice has quit [Ping timeout: 276 seconds]
Coldblackice has joined #ste||ar
Coldblackice_ has joined #ste||ar
Coldblackice has quit [Ping timeout: 240 seconds]
jaafar_ has joined #ste||ar
jbjnr_ has joined #ste||ar
jaafar_ has quit [Ping timeout: 268 seconds]
Coldblackice_ has quit [Ping timeout: 265 seconds]
Coldblackice has joined #ste||ar
Coldblackice has quit [Ping timeout: 264 seconds]
Coldblackice has joined #ste||ar
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
<heller> Oh no
<heller> hkaiser: saw it. Accidentally pushed to master, sorry
<heller> You were right, the TSS reset needed to happen before the result binding
<hkaiser> heller: let's see if this fixes things now
<hkaiser> and thanks for looking into it
<heller> I tested it locally at least
<heller> Should have been more careful
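A minimal sketch of the ordering issue fixed here; all names (`thread_data`, `reset_tss`, `bind_result`) are illustrative stand-ins, not HPX's actual identifiers. Binding the result may run a continuation inline on the same thread, so the thread-specific storage has to be torn down first:

```cpp
#include <functional>

struct thread_data
{
    std::function<void()> tss_cleanup;      // pending thread-local destructors
    std::function<void(int)> continuation;  // fires once the result is bound

    void reset_tss()
    {
        if (tss_cleanup)
            tss_cleanup();                  // destroy thread-local values
        tss_cleanup = nullptr;
    }

    void bind_result(int result)
    {
        if (continuation)
            continuation(result);           // may run inline, on this thread
    }
};

void finalize(thread_data& thrd, int result)
{
    // Reversing these two calls would let a continuation run while the
    // old thread-local state is still alive -- the bug described above.
    thrd.reset_tss();
    thrd.bind_result(result);
}
```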
<hkaiser> heller: how did the talk go?
<hkaiser> is it available somewhere?
<heller> I think it went well
<heller> It will be uploaded next week, I think
<heller> It was recorded
<heller> We might have hit a nerve there
<heller> Adapting everything to p0443 will be time consuming though
<hkaiser> absolutely
<hkaiser> this involves redoing all of our executor stuff
<heller> Yes
<hkaiser> possibly even the future implementation
<heller> Yeah
<hkaiser> that's for after the refactoring
<heller> Having something like that out there quickly is important
<heller> There are quite a few use cases which could benefit from such an infrastructure
<heller> And people would like to use it if it were there
<heller> So we need to find a middle ground between getting something out, keeping the API stable, and starting the migration to P0443
<hkaiser> sure, needs to be in a coordinated way and step by step
<heller> We can probably ignore the sender/receiver part and start with the executors
<hkaiser> hmmm
<heller> And keep our executors implementation
<heller> At first
<hkaiser> I think the sender/receiver part is the lowest level everything else is built upon
<heller> Right, which also means that our stuff needs to be converted as well
<hkaiser> indeed
<heller> The biggest appeal is to just use our future-based infrastructure
<heller> Having to reimplement that based on the sender receiver stuff will take some time
<hkaiser> it definitely will take time
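For context, a bare-bones sketch of the P0443 sender/receiver protocol under discussion: a sender is connect()ed to a receiver, producing an operation state whose start() eventually calls exactly one of set_value/set_error/set_done on the receiver. The types here are toy examples, not proposed or HPX APIs:

```cpp
#include <cstdio>
#include <exception>
#include <type_traits>
#include <utility>

// A sender that, once started, immediately delivers the value 42.
struct just_42_sender
{
    template <typename Receiver>
    struct operation
    {
        Receiver r;
        void start() noexcept
        {
            try {
                std::move(r).set_value(42);
            } catch (...) {
                std::move(r).set_error(std::current_exception());
            }
        }
    };

    template <typename Receiver>
    auto connect(Receiver&& r) &&
    {
        return operation<std::decay_t<Receiver>>{std::forward<Receiver>(r)};
    }
};

// A receiver that prints the delivered value.
struct print_receiver
{
    void set_value(int v) && { std::printf("%d\n", v); }
    void set_error(std::exception_ptr) && noexcept { std::terminate(); }
    void set_done() && noexcept {}
};

int main()
{
    auto op = just_42_sender{}.connect(print_receiver{});
    op.start();    // prints 42
}
```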
<heller> Getting the execution context stuff to run with our current executors interface is less intrusive
<heller> And I hope that this transition will make it easier to port to the new design
<hkaiser> ok
<heller> The risk is that we might need to break the API
<hkaiser> I'd like to keep the variadic API (see the sketch after this exchange)
<heller> For the executor dispatch?
<hkaiser> I think they're making a mistake without it
<heller> Ok, let's get some implementation experience there...
<heller> And usage experience
<hkaiser> right
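To illustrate the difference at stake: P0443's execute takes a nullary callable, so arguments have to be captured into a closure first, while HPX's existing variadic executor API forwards them through. Both executor types below are illustrative sketches, not real HPX or P0443 types:

```cpp
#include <cstdio>
#include <utility>

struct p0443_style_executor
{
    template <typename F>
    void execute(F&& f) const
    {
        std::forward<F>(f)();                          // nullary callable only
    }
};

struct variadic_executor
{
    template <typename F, typename... Ts>
    void execute(F&& f, Ts&&... ts) const
    {
        std::forward<F>(f)(std::forward<Ts>(ts)...);   // arguments forwarded
    }
};

int main()
{
    auto work = [](int a, int b) { std::printf("%d\n", a + b); };

    // P0443 style: the arguments must first be captured into a closure.
    p0443_style_executor{}.execute([=] { work(1, 2); });

    // Variadic style: the executor forwards the arguments itself.
    variadic_executor{}.execute(work, 1, 2);
}
```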
<hkaiser> heller: the stackfull threads on the stackless branch are now on par with master, btw
<hkaiser> stackless is still ~7% faster
<heller> Cool, what did you do?
<hkaiser> removed the virtual function for the operator()()
<heller> 7% is still not a lot. Where did it come from?
<hkaiser> less code executed; this corresponds to about 100 ns per thread
<heller> So it's just the context switch
<hkaiser> and the stack allocation
<hkaiser> and related costs
<heller> The stack allocation costs should have been mitigated
<hkaiser> it's a start, even if it's not much
<heller> As we reuse them
<hkaiser> sure
<heller> Get rid of the atomic state change
<heller> That should do wonders
<hkaiser> will introduce virtual functions again
<hkaiser> but it's worth a try
<heller> Yes
<heller> But those should be insignificant
<hkaiser> the virtual functions themselves are insignificant; it's the optimization barrier introduced by using them that causes the slowdown
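A toy illustration of that point, with hypothetical names: the indirect call through the vtable is cheap in itself, but it is opaque to the optimizer, which can no longer inline the body or reason about what the call clobbers; a direct (template) call keeps everything visible:

```cpp
// Virtual dispatch: the call target is only known at run time, so the
// compiler must treat it as an opaque call -- no inlining across it.
struct task_base
{
    virtual ~task_base() = default;
    virtual void run() = 0;
};

// Direct dispatch: the callable's type is part of the task type, so the
// call is fully visible to the optimizer and can be inlined.
template <typename F>
struct task
{
    F f;
    void run() { f(); }
};
```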
<heller> What does the profiler say where the costs currently are?
<hkaiser> no real hotspot
<heller> Can you show the top 10?
<hkaiser> don't have anything right now I could show
<hkaiser> I can reproduce it for you - it's mostly the thread stealing, as always
<hkaiser> I'll try the non-atomic state, but not today
<heller> Sure
<heller> I'll enjoy the train ride home as well
<heller> We should talk some time next week
<hkaiser> ok
<zao> A small heads up btw, the stellar-group website is mixed-content thanks to the group logo being fetched over HTTP regardless of the protocol used to access the page.
<hkaiser> zao: ok, I'll have a look
<hkaiser> easy enough to fix
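For reference, the usual fix for this kind of mixed-content warning is to serve the image over https (or via a protocol-relative URL) so it matches the page's scheme; the path below is a placeholder, not the site's actual logo URL:

```html
<!-- before: pulled over plain http, breaks on https pages -->
<img src="http://example.org/images/stellar-logo.png" alt="STE||AR logo">
<!-- after: scheme matches the page's -->
<img src="https://example.org/images/stellar-logo.png" alt="STE||AR logo">
```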
<zao> I've never used any other allocator than `system`. Which one ought I pick for packaging if I can get both jemalloc and tcmalloc?
<hkaiser> zao: both are fine
<zao> Alright. Making an EasyBuild config and had to flip a coin to pick a malloc impl.
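For the record, HPX selects the allocator at configure time via the HPX_WITH_MALLOC CMake option (values include system, tcmalloc, and jemalloc), so a packaging config only needs to pass something like:

```
cmake -DHPX_WITH_MALLOC=jemalloc <hpx-source-dir>
```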
jaafar_ has joined #ste||ar
jaafar_ has quit [Quit: Konversation terminated!]
jaafar has joined #ste||ar
Coldblackice has quit [Ping timeout: 268 seconds]
Coldblackice has joined #ste||ar
Coldblackice has quit [Ping timeout: 240 seconds]
Coldblackice has joined #ste||ar
hkaiser has quit [Ping timeout: 245 seconds]
hkaiser has joined #ste||ar
Coldblackice has quit [Ping timeout: 240 seconds]
Coldblackice has joined #ste||ar
hkaiser has quit [Ping timeout: 245 seconds]
jbjnr_ has quit [Ping timeout: 245 seconds]
hkaiser has joined #ste||ar
weilewei has joined #ste||ar