hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
weilewei has joined #ste||ar
Amy1 has quit [Ping timeout: 246 seconds]
Amy1 has joined #ste||ar
Yorlik has quit [Ping timeout: 265 seconds]
hkaiser has quit [Quit: bye]
zao[m] has quit [*.net *.split]
richard[m]3 has quit [Ping timeout: 246 seconds]
diehlpk_mobile[m has quit [Ping timeout: 240 seconds]
mdiers[m] has quit [Ping timeout: 240 seconds]
smith[m]1 has quit [Ping timeout: 256 seconds]
gdaiss[m] has quit [Ping timeout: 260 seconds]
gonidelis[m] has quit [Ping timeout: 244 seconds]
Guest40323 has quit [Ping timeout: 244 seconds]
kordejong has quit [Ping timeout: 244 seconds]
rori has quit [Ping timeout: 256 seconds]
heller1 has quit [Ping timeout: 256 seconds]
jbjnr has quit [Ping timeout: 256 seconds]
noise[m] has quit [Ping timeout: 260 seconds]
ms[m] has quit [Ping timeout: 260 seconds]
Yorlik has joined #ste||ar
richard[m]3 has joined #ste||ar
Yorlik has quit [Ping timeout: 256 seconds]
smith[m]1 has joined #ste||ar
gonidelis[m] has joined #ste||ar
kordejong has joined #ste||ar
Guest40323 has joined #ste||ar
heller1 has joined #ste||ar
jbjnr has joined #ste||ar
rori has joined #ste||ar
mdiers[m] has joined #ste||ar
diehlpk_mobile[m has joined #ste||ar
gdaiss[m] has joined #ste||ar
ms[m] has joined #ste||ar
zao[m] has joined #ste||ar
noise[m] has joined #ste||ar
nikunj has quit [Ping timeout: 260 seconds]
nikunj has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]
nikunj has joined #ste||ar
hkaiser has joined #ste||ar
ibalampanis has joined #ste||ar
<ibalampanis> Good morning! I wish the best to all participants in GSoC! Have a nice open-source trip!
<ibalampanis> Have the best start!
ibalampanis has quit [Remote host closed the connection]
Yorlik has joined #ste||ar
nan111 has joined #ste||ar
diehlpk_work has joined #ste||ar
karame_ has joined #ste||ar
rtohid has joined #ste||ar
<gonidelis[m]> thnx ilias
<hkaiser> gonidelis[m]: happy coding! ;-)
bita_ has joined #ste||ar
<gonidelis[m]> Couldn't resist: https://ibb.co/1Tc6s2D
<Yorlik> hkaiser: YT ?
akheir has joined #ste||ar
<hkaiser> Yorlik: here
<Yorlik> I think I finally solved my crashes.
<Yorlik> A big problem was the task migration inside my alloc function.
<hkaiser> <quote>I fixed the last bug</quote> ;-)
<Yorlik> Just realized writing something thread safe can be even more tricky in a task environment, depending on the situation.
<hkaiser> indeed - everything is moving at the same time
<Yorlik> The good thing is, idle times went down to ~8% when I did large numbers of objects
<hkaiser> very nice
<Yorlik> The system gets more efficient, the more work it has
<Yorlik> Still tweaking.
<Yorlik> At some point I'd like to talk to you about the parameters over voice.
<Yorlik> It seems under certain circumstances work is accumulating on certain threads
<Yorlik> And other threads seem underused.
<Yorlik> I can see that from the number of spare engines they have
<Yorlik> I guess it's an artifact of a suboptimal autochunker target time combined with a low workload
<Yorlik> amongst others.
<Yorlik> So - I believe tweaking is becoming an art form here.
<hkaiser> could be
<hkaiser> Yorlik: we need to find a way to make it runtime adaptive
<Yorlik> it wouldn't be an issue to make the autochunker target time a dynamic parameter
<Yorlik> But what logic?
<Yorlik> To me that actually looks like an AI problem
<Yorlik> Finding an optimum in a system with many chaotic parameters
<hkaiser> the target time can be specified already
<Yorlik> Where the most chaotic thing is the dynamic change of the workload in my case
<Yorlik> Yes
<Yorlik> I'll have to figure out some things, means I have to really learn how the system functions dynamically
K-ballo has quit [Quit: K-ballo]
akheir has quit [Ping timeout: 260 seconds]
<Yorlik> Wow - bumping my autochunker target time from 400 to 5000 microseconds made my framerate something like 10x faster - it seems the overhead I have per chunk is still too big, and having fewer chunks helped. Mostly it's the retrieval of the Lua engine
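The effect Yorlik describes is overhead amortization: a longer target time per chunk means fewer chunks, so a fixed per-chunk cost (here, fetching a Lua engine) is paid fewer times. The sketch below is a hypothetical back-of-the-envelope model, not HPX's auto-chunker; the function name `estimate` and all input numbers are made up for illustration, and it ignores scheduling and load-balancing effects, so it does not by itself explain the full 10x speedup.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a frame's work is split into chunks of roughly
// `target_us` microseconds of useful work each, and every chunk pays a
// fixed `overhead_us` (e.g. retrieving a Lua engine from a pool).
struct chunk_cost {
    std::uint64_t chunks;      // number of chunks needed for the frame
    double overhead_fraction;  // share of total time spent on per-chunk overhead
};

chunk_cost estimate(std::uint64_t work_us, std::uint64_t target_us,
                    std::uint64_t overhead_us) {
    // Round up: a partial chunk still pays the full per-chunk overhead.
    std::uint64_t chunks = (work_us + target_us - 1) / target_us;
    double overhead = static_cast<double>(chunks * overhead_us);
    return {chunks, overhead / (overhead + static_cast<double>(work_us))};
}
```

With a made-up 100 ms of work and 100 us of overhead per chunk, a 400 us target gives 250 chunks and about 20% overhead, while a 5000 us target gives 20 chunks and about 2% overhead.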
K-ballo has joined #ste||ar
karame_ has quit [Remote host closed the connection]
<hkaiser> fair
rtohid has quit [Remote host closed the connection]
rtohid has joined #ste||ar
<gonidelis[m]> And does anyone know what it is used for in this file?
<gonidelis[m]> ahhh thnx... wasn't aware of that dir
<Yorlik> There is a "Find file" button on GitHub you can use to find files.
<Yorlik> See the buttons above the file list
<Yorlik> beside "Clone or download"
<gonidelis[m]> 0.0 WOW
<gonidelis[m]> That is surely something I was in need of
<gonidelis[m]> Thanks a lot...
<gonidelis[m]> !!!!
<Yorlik> Cheers! :)
<gonidelis[m]> So I reckon it's just for macro declarations
nikunj97 has joined #ste||ar
karame_ has joined #ste||ar
<gonidelis[m]> I am building a new branch... could someone remind me how to specify the installation folder on `cmake`?
jbjnr has quit [*.net *.split]
smith[m]1 has quit [*.net *.split]
nan111 has quit [Remote host closed the connection]
noise[m] has quit [Ping timeout: 240 seconds]
zao[m] has quit [Ping timeout: 260 seconds]
diehlpk_mobile[m has quit [Ping timeout: 260 seconds]
Guest40323 has quit [Ping timeout: 256 seconds]
gonidelis[m] has quit [Ping timeout: 256 seconds]
ms[m] has quit [Ping timeout: 246 seconds]
mdiers[m] has quit [Ping timeout: 244 seconds]
rori has quit [Ping timeout: 244 seconds]
heller1 has quit [Ping timeout: 244 seconds]
kordejong has quit [Ping timeout: 244 seconds]
gdaiss[m] has quit [Ping timeout: 260 seconds]
richard[m]3 has quit [Ping timeout: 260 seconds]
<K-ballo> -DCMAKE_PREFIX_PATH ?
Nikunj__ has joined #ste||ar
<zao> CMAKE_INSTALL_PREFIX, surely?
nikunj97 has quit [Ping timeout: 260 seconds]
weilewei has quit [Remote host closed the connection]
gonidelis[m] has joined #ste||ar
weilewei has joined #ste||ar
<Yorlik> I guess because Iter is used as a forwarded template parameter here. But I always get confused about the use of typename too.
<gonidelis[m]> Yorlik: yeah that makes sense. the `typename` keyword remains a mystery though...
gdaiss[m] has joined #ste||ar
<Yorlik> In general I think it is used if you use the parameter as a type parameter and not to create an instance of that type.
<weilewei> I got an error, https://gist.github.com/weilewei/6a30b94184a16f3a5715b50a94a931a8, and I am not sure what is happening...
<weilewei> seems like a deadlock
sayefsakin has joined #ste||ar
<Yorlik> Is it a debug build?
jbjnr has joined #ste||ar
<Yorlik> It looks a bit like this thing which struck me a while ago too
<Yorlik> When you hold a lock and the task yields, there's a built-in check for that in debug mode
<Yorlik> You might wanna check if it goes away in a release build
<Yorlik> You can block it by creating a special object.
<Yorlik> weilewei ^^
<weilewei> Yorlik ok, I haven't tried Release build
<weilewei> I will give it a try
<Yorlik> If it works I can give you a trick to make it work together with a caveat
<K-ballo> gonidelis[m]: the compiler can't know whether `std::decay<Iter>::type` is a value or a type, so it assumes it is a value unless you say otherwise
nan111 has joined #ste||ar
<K-ballo> sometimes the compiler can figure it out by looking at the context in which it is used (where only a type would be valid), and in those cases you don't need to say anything (but you still can)
<Yorlik> I think that has confused me in the past a bit - "typename" wasn't always used, just sometimes.
Nikunj__ has quit [Quit: Leaving]
<Yorlik> But it makes sense that the compiler sometimes needs it and sometimes not
<K-ballo> and that changes with standard version, making it even more confusing
<Yorlik> Argh ... hell ..
<K-ballo> I tend to still put it anywhere C++98 would have required it
<Yorlik> I like verbose code - if you get too terse it gets confusing easily.
<K-ballo> which means every time it is dependent, except in the base classes list
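A small illustration of K-ballo's rule: whenever a nested name depends on a template parameter, `typename` marks it as a type, except in a base-class list, where only a type can appear and `typename` is not allowed. The names `first_element`, `tag` and `derived` are made up for this example.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

// `std::decay<Iter>::type` depends on Iter, so while parsing the template the
// compiler cannot tell whether `::type` is a type or a value; `typename` says
// it is a type (both in the return type and inside the nested traits use).
template <typename Iter>
typename std::iterator_traits<typename std::decay<Iter>::type>::value_type
first_element(Iter&& it)
{
    // Required before C++20; C++20 made `typename` optional in alias
    // declarations like this one, which is part of why the rules feel fuzzy.
    using iter_type = typename std::decay<Iter>::type;
    iter_type copy = it;
    return *copy;
}

struct tag { int value = 42; };

// Base-class list: only a type is valid here, and `typename` must be omitted.
template <typename T>
struct derived : std::decay<T>::type {};
```

Writing `typename` everywhere C++98 required it, as K-ballo does, compiles under every later standard as well.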
richard[m]1 has joined #ste||ar
noise[m] has joined #ste||ar
smith[m] has joined #ste||ar
rori has joined #ste||ar
kordejong has joined #ste||ar
mdiers[m] has joined #ste||ar
diehlpk_mobile[m has joined #ste||ar
ms[m] has joined #ste||ar
heller1 has joined #ste||ar
<hkaiser> weilewei: it is what it says - you're holding a lock (mutex) while the thread holding it is suspending
<hkaiser> weilewei: this on its own is not always a problem, but it can lead to nasty deadlocks so we diagnose it
<weilewei> hkaiser is there any way to diagnose it? Apparently I have no idea what the libcds stree test is doing
<weilewei> at least at this point
parsa[m] has joined #ste||ar
zao[m] has joined #ste||ar
<hkaiser> weilewei: run it in a debugger, stop at the thrown exception and go up in the stack backtrace to see if you can spot the frame that holds the lock
parsa[m] is now known as Guest21318
<weilewei> hkaiser ok
<hkaiser> then figure out whether you need to hold the lock while suspending (could be the case) or find a workaround by unlocking it for the duration of the suspension
<hkaiser> if it's necessary, add code that tells hpx to ignore the lock
<weilewei> hkaiser what does it mean tells hpx to ignore the lock? Is there any example?
<Yorlik> weilewei: Create this object: hpx::util::ignore_all_while_checking ignore_lock_checks;
<Yorlik> As long as it exists these checks are ignored
<hkaiser> weilewei: ignoring can be done by putting something like hpx::util::ignore_while_checking<Lock> il(&lock); on the stack before suspending (Lock is your lock-type, e.g. unique_lock<mutex>) and lock is your lock instance
<hkaiser> right, alternatively ignore all locks like Yorlik suggested
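A minimal, self-contained model of the check hkaiser describes, in plain C++ so it compiles without HPX. Every name here (`registered_lock`, `ignore_one`, `ignore_all`, `check_before_suspend`) is a hypothetical stand-in, not HPX's implementation; in HPX the real tools are `hpx::util::ignore_while_checking<Lock>` and `hpx::util::ignore_all_while_checking` as quoted above. It shows why ignoring one specific lock needs the lock object to exist already, while ignoring all locks can be switched on at any point. The registry is `thread_local` per OS thread, which real task-based runtimes cannot simply rely on because tasks may migrate between worker threads.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <stdexcept>

// Hypothetical per-thread registry of held locks (illustration only).
thread_local std::set<const void*> held_locks;     // locks currently held
thread_local std::set<const void*> ignored_locks;  // locks exempt from the check
thread_local int ignore_all_depth = 0;             // > 0 disables the check

// A scheduler would call this right before suspending the current task.
void check_before_suspend() {
    if (ignore_all_depth > 0) return;
    for (const void* l : held_locks)
        if (ignored_locks.count(l) == 0)
            throw std::runtime_error(
                "suspending thread while at least one lock is being held");
}

// Locks a mutex and registers it for as long as it is held.
class registered_lock {
public:
    explicit registered_lock(std::mutex& m) : lk_(m) { held_locks.insert(&m); }
    ~registered_lock() { held_locks.erase(lk_.mutex()); }  // then lk_ unlocks
    const void* id() const { return lk_.mutex(); }
private:
    std::unique_lock<std::mutex> lk_;
};

// Exempts one specific, already-held lock (cf. ignore_while_checking<Lock>).
class ignore_one {
public:
    explicit ignore_one(const registered_lock& l) : id_(l.id()) {
        ignored_locks.insert(id_);
    }
    ~ignore_one() { ignored_locks.erase(id_); }
private:
    const void* id_;
};

// Disables the check entirely while alive (cf. ignore_all_while_checking).
struct ignore_all {
    ignore_all() { ++ignore_all_depth; }
    ~ignore_all() { --ignore_all_depth; }
};
```

Ignoring a single lock keeps the diagnostic active for every other lock, which fits hkaiser's advice below not to disable the checks blindly.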
<Yorlik> Didn't know you could do it specifically - nice !
<weilewei> Yorlik hkaiser Thanks! I will give it a try
<Yorlik> We really should have a knowledge base or something or a text search for the entire IRC log to find these gems.
<hkaiser> Yorlik: irclog has a search
<Yorlik> hkaiser: Over the entire time?
<hkaiser> yes
<Yorlik> I think I missed that. Thanks !
<Yorlik> Oh - there are two different searches - I see now
<Yorlik> I always used the filter - totally not sufficient - but the real search is nice.
<weilewei> after adding ignoring lock, I still get this error: https://gist.github.com/weilewei/6a30b94184a16f3a5715b50a94a931a8#gistcomment-3326238
<weilewei> this time the error message did not indicate which function causes the error... I added the lock-ignoring object everywhere I could
<K-ballo> Yorlik: are you planning to work on the new thread_mapper thing?
<weilewei> hkaiser Yorlik
<Yorlik> Honestly - I do not feel qualified to touch that.
<weilewei> ^^^
<Yorlik> But what I see is that without proper thread info support, per-thread object pools easily open a can of worms or two
nikunj has quit [Ping timeout: 260 seconds]
<weilewei> Yorlik what do you mean
nikunj has joined #ste||ar
<Yorlik> I was answering K-Ballo
<weilewei> ok...
<Yorlik> weilewei: I think there was an issue with that blocking of the lock check if the task migrates between threads.
<Yorlik> Not sure if it is already fixed or just marked for fixing. hkaiser did something on it.
<weilewei> hmmm
<Yorlik> weilewei: You have the same error: what(): suspending thread while at least one lock is being held, stack backtrace: 12 frames:
<Yorlik> So for some reason it is not correctly disabled - either by user error or by a bug.
<weilewei> Yorlik it is the same error, but this time it shows which function in my app causes the problem
<weilewei> Yorlik do you think there might be an error in my code that causes the thread to fail?
<Yorlik> Did you find out where the lock is coming from?
<Yorlik> I'd suggest just blindly following the procedure hkaiser suggested.
<weilewei> ok...
<Yorlik> You want to put your object in before the lock is taken
<weilewei> like before line 96?
<weilewei> or before line 87?
<Yorlik> Before 87
<Yorlik> Just before the lock is created
<Yorlik> In theory you need the object just before the thread yields
<Yorlik> But I think it's safer to put it right before the lock creation
<Yorlik> So - both locations should work.
<Yorlik> But after all you have that while loop ...
<Yorlik> You don't usually want to create/destroy it all the time
<Yorlik> Also the while loop body would then need braces for scope
<weilewei> do you mean like this?
<Yorlik> Yes
<Yorlik> That should do
<weilewei> unfortunately, it is still failing
<Yorlik> Maybe you have another lock further up the call stack
<Yorlik> Then you would have to move it up as well.
<Yorlik> Or that dreaded bug struck you where the task migration nullifies the effect of the object
<hkaiser> weilewei: no, add it after the lock is taken
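The placement question can be illustrated with a tiny, HPX-free sketch that only records events. `fake_lock`, `fake_ignore_guard` and `fake_suspend` are hypothetical stand-ins for a real lock, for an ignore object such as `hpx::util::ignore_while_checking<Lock>`, and for a point where the task may suspend. It shows why the `while` body needs braces (the objects are then created and destroyed once per iteration) and why declaring the guard after the lock works: C++ destroys locals in reverse order of construction, so the guard is active around the suspension and is gone before the lock is released. That `ignore_while_checking` takes a pointer to the lock, so the lock object has to exist first, is one plausible reason for hkaiser's advice; this has not been checked against HPX's implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins that only record what happens, to show ordering.
std::vector<std::string> events;

struct fake_lock {
    fake_lock() { events.push_back("lock"); }
    ~fake_lock() { events.push_back("unlock"); }
};

struct fake_ignore_guard {
    fake_ignore_guard() { events.push_back("ignore on"); }
    ~fake_ignore_guard() { events.push_back("ignore off"); }
};

void fake_suspend() { events.push_back("suspend"); }

void run(int iterations) {
    int i = 0;
    while (i++ < iterations) {    // braces give each iteration its own scope
        fake_lock lk;             // take the lock first ...
        fake_ignore_guard guard;  // ... then mark it as ignored
        fake_suspend();           // the task may suspend here
    }                             // guard is destroyed first, then lk
}
```

For two iterations the recorded sequence is "lock", "ignore on", "suspend", "ignore off", "unlock", repeated twice.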
<weilewei> hkaiser ok, add hpx::util::ignore_all_while_checking ignore_lock_checks; after lock is created?
<Yorlik> hkaiser: any specific reason for this? I always created it before I took the lock.
parsa has quit [Read error: Connection reset by peer]
parsa has joined #ste||ar
<Yorlik> weilewei: Concerning the bug I mentioned before it seems hkaiser fixed it and merged already: https://github.com/STEllAR-GROUP/hpx/pull/4610
<Yorlik> So we can rule that out, if you have a recent master
<weilewei> hkaiser ok, so I place the ignore lock after the lock is created, and I got another error with gdb log: https://gist.github.com/weilewei/0985bf7f442f5c2b1722cd5d5a2ea20e
<weilewei> Yorlik I see, thanks
<hkaiser> K-ballo: would you mind checking #4693 whether the compile time improves?
<hkaiser> weilewei: you have another lock
<hkaiser> I would suggest that you don't blindly disable the checks, however
<Yorlik> hkaiser: I'll compile and check.
<Yorlik> Not the times - just functionality
<K-ballo> hkaiser: sure
<hkaiser> sure
<hkaiser> thanks
<weilewei> hkaiser ok... it does not indicate where the lock is
<hkaiser> Yorlik: let me know if you need more/something else
<hkaiser> weilewei: somewhere up the stack
<Yorlik> I'll give you feedback asap
<K-ballo> github fails to split the diff after a certain number of files :@
<weilewei> hkaiser ok, but the backtrace stops at frame 13, let me check
<K-ballo> is it me, or is github pr experience regressing?
<Yorlik> I just checked it out
<Yorlik> hkaiser: What did you change? Just make get_thread_id work on windows? (Would suffice)
<Yorlik> Oh - there's thread_data ... need to look more closely now :)
<K-ballo> hkaiser: bimap went all the way down to #130
<K-ballo> curiously #1 is an mpl::eval_if that I didn't see on my test builds
<weilewei> it turns out that there is a lock in thread suspend function in hpx thread.cpp and here is gdb bt https://gist.github.com/weilewei/5096a1bb557c4f2fb316217356a1aaba
<K-ballo> responsible for 3.27s wall time :| (6801 instantiations)
<weilewei> so essentially, the code creates a vector of empty threads and, at the end, joins each of them. Each thread runs some specific tasks
<weilewei> vector of hpx::thread
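The pattern weilewei describes (a vector of empty threads that are later started and joined) is sketched below with `std::thread`, assuming `hpx::thread` offers the same default construction, move assignment, `joinable()` and `join()` used here. `run_workers` and the squared-index work are made up for illustration. One hedged observation: joining from inside an HPX task may suspend that task, so calling `join()` while holding a lock could be one way to trigger the check discussed above; that is a guess, not a diagnosis of the gist.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

std::vector<int> run_workers(std::size_t n) {
    std::vector<int> results(n, 0);
    std::vector<std::thread> threads(n);  // n empty (not yet started) threads
    for (std::size_t i = 0; i != n; ++i) {
        // Move-assign a running thread into the empty slot. Each thread writes
        // only its own element, so no lock is needed.
        threads[i] = std::thread([&results, i] {
            results[i] = static_cast<int>(i * i);
        });
    }
    for (auto& t : threads)
        if (t.joinable()) t.join();  // wait for all threads before reading results
    return results;
}
```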
<Yorlik> hkaiser: I am now getting the correct thread IDs. This will allow me a more reliable setup of thread local structures, like pools. Thanks a lot !
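A per-thread pool of the kind Yorlik mentions can be sketched as one free list per worker, indexed by a worker number. In HPX the index would come from the runtime (such as the thread IDs Yorlik is now getting); here it is passed in explicitly so the code stays standard C++. `per_worker_pool` and its methods are hypothetical. The important caveat, matching the earlier point about tasks migrating between threads, is that the index must be the *current* worker's at each call and must be re-read after any point where the task may suspend, because it can resume on a different worker.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical per-worker object pool: one free list per worker thread.
// Only the worker that owns slot `worker` may touch it, so no lock is needed,
// provided callers always pass the index of the worker they are running on.
template <typename T>
class per_worker_pool {
public:
    explicit per_worker_pool(std::size_t num_workers) : free_(num_workers) {}

    std::unique_ptr<T> acquire(std::size_t worker) {
        auto& list = free_[worker];
        if (list.empty()) return std::make_unique<T>();  // pool empty: allocate
        std::unique_ptr<T> obj = std::move(list.back());
        list.pop_back();
        return obj;
    }

    // Re-read the current worker index before calling this: the task may
    // have moved to another worker since acquire().
    void release(std::size_t worker, std::unique_ptr<T> obj) {
        free_[worker].push_back(std::move(obj));
    }

    std::size_t cached(std::size_t worker) const { return free_[worker].size(); }

private:
    std::vector<std::vector<std::unique_ptr<T>>> free_;
};
```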
<Yorlik> hkaiser - I commented on the PR. Github is doing funny things with my text though - I'll not edit it :D
<K-ballo> taking std::result_of out of the way also helps significantly
<K-ballo> msvc's INVOKE machinery is... not good... but luckily we don't even need it
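One common way to take `std::result_of` out of the picture, when callers never pass pointers to members, is to compute the result type from a plain call expression with `decltype`. This is a sketch of that general technique, not the actual change made in #4693; `call_result_t` and `call` are hypothetical names. `std::result_of` was deprecated in C++17 and removed in C++20; its standard successor `std::invoke_result` still goes through the full INVOKE rules.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Result type of an ordinary call f(args...). Unlike INVOKE, this does not
// support pointers to members, which is exactly what lets it skip that machinery.
template <typename F, typename... Args>
using call_result_t = decltype(std::declval<F>()(std::declval<Args>()...));

template <typename F, typename... Args>
call_result_t<F, Args...> call(F&& f, Args&&... args) {
    return std::forward<F>(f)(std::forward<Args>(args)...);
}
```

Usage: with `auto add = [](int a, long b) { return a + b; };`, `call_result_t<decltype(add)&, int, long>` is `long` and `call(add, 2, 3L)` returns `5L`.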
sayef_ has joined #ste||ar
sayefsakin has quit [Ping timeout: 252 seconds]
sayefsakin has joined #ste||ar
sayef_ has quit [Ping timeout: 240 seconds]