K-ballo changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
K-ballo has quit [Quit: K-ballo]
diehlpk_work has quit [Remote host closed the connection]
hkaiser has quit [Quit: bye]
<gnikunj[m]> ms: yt?
<gnikunj[m]> ms: when you see this msg, look into: https://gist.github.com/NK-Nikunj/a47d9555360b347a574e4dced9aed5a4
<gnikunj[m]> turns out the above code segfaults sometimes while working nicely other times. Do you know what could cause it? On further inspection, it turned out to come from https://github.com/STEllAR-GROUP/hpx/blob/master/libs/parallelism/futures/include/hpx/futures/detail/future_data.hpp#L124
<gnikunj[m]> so it's because of trying to access a shared state that doesn't exist anymore
<gnikunj[m]> as of now, I could only reproduce this error on rostam (any compiler) but couldn't reproduce the same on my laptop, so things may vary for you too
<gnikunj[m]> also, it is coming from within HPX backend in Kokkos on line: https://github.com/kokkos/kokkos/blob/4b97a22ff7be7635116930bb97173058d6079202/core/src/Kokkos_HPX.hpp#L223
<ms[m]> gnikunj: interesting... your code looks sane, but it's a good test if that indeed does fail every now and then
<ms[m]> the line in the backend that you say segfaults is definitely an odd place for it to segfault :/
<ms[m]> I'll try to find some time to reproduce it
<ms[m]> release or debug?
<gnikunj[m]> both
<ms[m]> 👍️
<ms[m]> fails pretty quickly or does it usually take a long time?
<gnikunj[m]> it fails right on the spot
<ms[m]> deterministically?
<ms[m]> first time around?
<gnikunj[m]> the test shouldn't take more than a few ms to execute anyway
<gnikunj[m]> no, it's not deterministic. Sometimes it executes to completion, sometimes it segfaults
<ms[m]> ok, thanks!
<gnikunj[m]> that Kokkos::Cuda::finalize() error was linked to code similar to the above
<gnikunj[m]> so if we solve this, things should work for my resilience library too
<ms[m]> ok, sounds good
<ms[m]> no promises on when I can have a look but I'll try to do it soon
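The segfault discussed above is attributed to accessing a shared state that no longer exists. The log does not identify the actual cause, so the following is only a minimal sketch of the general failure mode, written against `std::future` rather than HPX or Kokkos. It assumes that a future can be checked with `valid()` before it is read (`hpx::future` offers a `valid()` member with the same meaning). The helper names `has_shared_state` and `get_or` are hypothetical and do not come from either library.

```cpp
#include <future>
#include <utility>

// Returns true only while the future still refers to a shared state.
// Calling get() on a std::future for which this is false is undefined behaviour.
template <typename T>
bool has_shared_state(std::future<T> const& f)
{
    return f.valid();
}

// Hypothetical helper: reads the value only if the shared state exists,
// otherwise returns the caller-supplied fallback.
template <typename T>
T get_or(std::future<T>& f, T fallback)
{
    if (!f.valid())
        return fallback;
    return f.get();    // get() releases the shared state afterwards
}
```

A moved-from future and a future that has already been read with `get()` both report `valid() == false`, so a guard like this turns one class of "state is gone" accesses into a checked condition. It does not help if the state is destroyed concurrently by another thread, which may well be what happens in the nondeterministic case reported here.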
<ms[m]> zao: since you seem to have followed this freenode drama a bit more closely (and I realized only later that this is on freenode): would you recommend we move over to libera.chat? I get the impression that's the only sane thing to do
<gnikunj[m]> thanks!
<zao> ms[m]: Long-term, it feels like the reasonable thing to do. One may want to wait a bit for the dust to settle.
<ms[m]> sounds good, thanks
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
hkaiser has joined #ste||ar
nanmiao has joined #ste||ar
<pedro_barbosa[m]> is there a way to print specific fields from the get_cuda_info function in HPXCL?
<pedro_barbosa[m]> for example, the function returns the name of the device, memory size, cache and whatnot, and I wanted to print only the name of the device
<rachitt_shah[m]> ms hkaiser can I create a google drive to consolidate pdfs/notes?
<rachitt_shah[m]> This would be alongside the wiki, but would help me out with personal tracking. I'll add everyone to the folder.
<ms[m]> rachitt_shah: yeah, that's ok
<gnikunj[m]> hkaiser: could you point me to the timepoint code? I'll add in cuda asm code and use that instead.
<hkaiser> pedro_barbosa[m]: not sure I understand your problem, care to elaborate?
<pedro_barbosa[m]> if I execute the function get_cuda_info() I get the following output (GIST: https://gist.github.com/PADBarbosa/7816d706ec53382cf7d21ef2757c20fb), I wanted to know if there's a way to only print the name of the device
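The question above goes unanswered in this log, and whether HPXCL exposes per-field accessors is not stated here. As a workaround sketch only, assuming the output of get_cuda_info() has been captured as text and that each field sits on its own line in a "key: value" layout (an assumption, since the gist contents are not reproduced in the log), a single field could be picked out like this. The helpers `trim` and `extract_field` and the key "Device name" are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Strips leading and trailing whitespace from a string.
std::string trim(std::string const& s)
{
    char const* ws = " \t\r\n";
    std::size_t const first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return {};
    std::size_t const last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: scans a "key: value" text dump line by line and
// returns the value of the first line whose key matches, or "" if none does.
std::string extract_field(std::string const& info, std::string const& key)
{
    std::istringstream lines(info);
    std::string line;
    while (std::getline(lines, line))
    {
        std::size_t const colon = line.find(':');
        if (colon == std::string::npos)
            continue;    // not a "key: value" line, skip it
        if (trim(line.substr(0, colon)) == key)
            return trim(line.substr(colon + 1));
    }
    return {};
}
```

With a captured dump in `info`, `std::cout << extract_field(info, "Device name")` would print only that one value. The key string has to match whatever label the real output uses.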
<nanmiao> hkaiser meet in your zoom or mine?
<ms[m]> rachitt_shah: this is btw the old documentation: http://stellar.cct.lsu.edu/files/hpx-1.0.0/html/index.html
<hkaiser> nanmiao: --hpx:ini=hpx.lock_detection!=0
hkaiser has quit [Ping timeout: 258 seconds]
ghosthell has joined #ste||ar
hkaiser has joined #ste||ar
nanmiao has quit [Quit: Connection closed]
parsa| has joined #ste||ar
parsa has quit [Ping timeout: 260 seconds]
parsa| is now known as parsa
diehlpk_work has joined #ste||ar
<gonidelis[m]> hkaiser yt???
K-ballo has quit [Quit: K-ballo]
<hkaiser> gonidelis[m]: here
<gonidelis[m]> Do you know of a good, quick-to-use profiler for my runs?
<hkaiser> what platform?
<hkaiser> rostam? your local machine?
<hkaiser> gonidelis[m]: well, generally Intel VTune
<hkaiser> I think you can download it for free nowadays
<gonidelis[m]> Locally
<gonidelis[m]> Linux
<gonidelis[m]> Oh ok
<gonidelis[m]> Let me try it
<hkaiser> download Intel VTune, then
<gonidelis[m]> Thanks !
k-ballo[m] has left #ste||ar ["User left"]