#ste||ar on 2018-03-01 — irc logs at irclog.cct.lsu.edu

2017-05-17 13:54 aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/

00:01 <parsa> copy_assgn: 0, copy ctors: 4, move_assign: 4, move ctors: 65

00:02 <hkaiser> ok, put a breakpoint on the copy constructor of node_data and see where it comes from

00:09 <parsa[w]> hkaiser: comes from here: https://github.com/STEllAR-GROUP/phylanx/blob/5c17ac0f7efdb88671c62d38306569bef1e43457/src/execution_tree/primitives/base_primitives.cpp#L358

00:10 <parsa[w]> ^ scratch that

00:19 EverYoung has joined #ste||ar

00:23 EverYoung has quit [Ping timeout: 245 seconds]

00:24 EverYoung has joined #ste||ar

00:28 parsa has quit [Quit: Zzzzzzzzzzzz]

00:28 parsa has joined #ste||ar

00:33 parsa has quit [Ping timeout: 265 seconds]

00:38 EverYoung has quit [Remote host closed the connection]

00:39 EverYoung has joined #ste||ar

00:40 EverYoung has quit [Remote host closed the connection]

00:40 EverYoung has joined #ste||ar

00:41 EverYoung has quit [Remote host closed the connection]

00:42 EverYoung has joined #ste||ar

00:51 EverYoung has quit [Remote host closed the connection]

00:51 EverYoung has joined #ste||ar

01:07 vamatya has quit [Ping timeout: 256 seconds]

01:10 EverYoun_ has joined #ste||ar

01:10 EverYoung has quit [Read error: Connection reset by peer]

01:18 EverYoun_ has quit [Read error: Connection reset by peer]

01:19 EverYoung has joined #ste||ar

01:30 EverYoung has quit [Ping timeout: 252 seconds]

01:34 EverYoung has joined #ste||ar

01:35 parsa has joined #ste||ar

01:50 gentryx has quit [Quit: No Ping reply in 180 seconds.]

01:51 gentryx has joined #ste||ar

01:52 EverYoung has quit [Ping timeout: 245 seconds]

02:19 sharonhsl has joined #ste||ar

02:28 parsa has quit [Quit: Zzzzzzzzzzzz]

02:32 parsa has joined #ste||ar

02:40 parsa has quit [Quit: Zzzzzzzzzzzz]

02:51 CaptainRubik has joined #ste||ar

03:08 K-ballo has quit [Quit: K-ballo]

03:10 anushi has joined #ste||ar

03:17 hkaiser has quit [Quit: bye]

03:18 anushi has quit [Ping timeout: 265 seconds]

03:21 CaptainRubik has quit [Ping timeout: 260 seconds]

03:32 anushi has joined #ste||ar

03:41 <Zwei_> Is there any difference between happens-before relationship and sequence-before?

03:42 anushi has quit [Read error: Connection reset by peer]

03:42 <Zwei_> Oh, sequence-before applies strictly within a single thread.

03:42 <Zwei_> Correct me if I'm wrong.

03:43 anushi has joined #ste||ar

03:48 Anushi1998 has joined #ste||ar

03:51 anushi has quit [Ping timeout: 245 seconds]

03:58 Anushi1998 has left #ste||ar ["AndroIRC"]

03:59 anushi has joined #ste||ar

04:02 Anushi1998 has joined #ste||ar

04:06 anushi has quit [Ping timeout: 245 seconds]

04:15 vamatya has joined #ste||ar

04:15 nanashi55 has quit [Ping timeout: 240 seconds]

04:16 nanashi55 has joined #ste||ar

04:24 sharonhsl has quit [Quit: sharonhsl]

06:03 vamatya has quit [Ping timeout: 256 seconds]

06:06 nikunj has joined #ste||ar

06:12 sharonhsl has joined #ste||ar

06:46 Anushi1998 has quit [Remote host closed the connection]

06:46 Anushi1998 has joined #ste||ar

07:08 nikunj has quit [Quit: Page closed]

07:08 Guest42771 has quit [Ping timeout: 260 seconds]

07:36 david_pfander has joined #ste||ar

07:39 david_pfander has quit [Remote host closed the connection]

07:56 david_pfander has joined #ste||ar

08:09 david_pfander has quit [Remote host closed the connection]

08:09 sharonhsl has quit [Quit: sharonhsl]

08:09 david_pfander has joined #ste||ar

08:19 jaafar has quit [Ping timeout: 240 seconds]

08:59 heller_ has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

09:01 heller_ has joined #ste||ar

09:23 jakub_golinowski has joined #ste||ar

09:29 <github> [hpx] Anushi1998 reopened pull request #3204: Changing std::rand() to a better inbuilt PRNG generator. (master...master) https://git.io/vAPXG

09:33 simbergm has quit [Ping timeout: 248 seconds]

09:39 Anushi1998 has quit [Ping timeout: 265 seconds]

09:43 sharonhsl has joined #ste||ar

09:43 sharonhsl has quit [Client Quit]

09:47 parsa has joined #ste||ar

09:49 Anushi1998 has joined #ste||ar

09:52 sharonhsl has joined #ste||ar

09:56 sharonhsl has quit [Client Quit]

09:56 sharonhsl has joined #ste||ar

09:56 sharonhsl has quit [Client Quit]

10:06 <github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/vAMZC

10:06 <github> hpx/gh-pages 14de79a StellarBot: Updating docs

10:16 anushi has joined #ste||ar

10:16 Anushi1998 has quit [Ping timeout: 265 seconds]

10:38 simbergm has joined #ste||ar

11:02 parsa has quit [Quit: Zzzzzzzzzzzz]

11:15 K-ballo has joined #ste||ar

11:27 parsa has joined #ste||ar

11:32 anushi has quit [Remote host closed the connection]

11:34 anushi has joined #ste||ar

12:13 jakub_golinowski has quit [Ping timeout: 268 seconds]

12:19 hkaiser has joined #ste||ar

12:37 <parsa[w]> hkaiser: i didn't get anywhere on shuffle

12:37 <hkaiser> parsa[w]: ok

12:37 <hkaiser> let's look together today

12:38 <parsa[w]> if i have node_data<double> x with a dynamicvector<double> in it, do you expect auto& a = x.vector() and auto& b = x.vector() to refer to the same spot in memory?

12:39 <hkaiser> the references will screw things up as those will be bound to rvalues (possible with msvc only, btw)

12:40 <hkaiser> if the references were const, then yes, they woul drefer to the same matrix data

12:45 anushi has quit [Ping timeout: 245 seconds]

12:46 anushi has joined #ste||ar

13:04 jakub_golinowski has joined #ste||ar

13:08 <parsa[w]> hkaiser: should i create a separate PR for the remainder of the primitives after #261?

13:09 <parsa[w]> or should i put more in it

13:15 <hkaiser> could you leave it separate, pls?

13:51 <simbergm> hrm, unlocking a mutex from an os thread which is different from the one it was locked on seems to be undefined behaviour, or am I reading the std::mutex docs wrongly?

13:51 <hkaiser> simbergm: why would you ever want to do that?

13:52 <simbergm> wondering if the yield_whiles cause problems because they suspend the hpx thread and then hpx thread might get moved to another os thread

13:52 <hkaiser> does that involve a kernel mutex?

13:52 <simbergm> yup, compat::mutex

13:53 <hkaiser> uhh ohh

13:53 <hkaiser> that could be causing the problems zao was reporting

13:53 <simbergm> has someone actually run the suspension tests on windows? because I do plenty of that stuff there :/

13:54 <hkaiser> simbergm: zao has run hello_world many thousand times recently on windows

13:54 <hkaiser> that has exposed the issues

13:54 <simbergm> yeah, I saw that, that's why I'm asking

13:54 <hkaiser> nod

13:54 <simbergm> if that's the cause the suspension tests should fail much faster

13:54 <hkaiser> I have not done any stress testing of suspension

13:55 <hkaiser> simbergm: we generally try not to suspend hpx threads with a lock held

13:55 <zao> simbergm: Not sure if you saw, but I failed much faster when running two localities with six threads each.

13:55 <hkaiser> zao: which would make sense

13:55 <zao> Blowing up with unlocking an unheld mutex instead of blowing up on destroying a busy mutex.

13:55 <hkaiser> zao: that's a different problem, I believe

13:55 <zao> Ok.

13:56 <hkaiser> simbergm: do you have to hold that lock while suspending the thread?

13:56 <zao> These suspension tests, are there specific test programs for those?

13:56 <simbergm> zao: most tests in tests.unit.resource

13:56 <hkaiser> anyways, gotta run

13:57 hkaiser has quit [Quit: bye]

13:57 <simbergm> hkaiser: I guess I could find other ways, I hold it because I don't want to have multiple threads trying to suspend or resume simultanously

13:58 <simbergm> zao: actually more specifically tests.unit.resource.throttle should blow up if the mutexes are the problem

13:59 david_pfander has quit [Ping timeout: 245 seconds]

14:06 <zao> I'll see if I can rig that up to run.

14:07 <simbergm> zao: thanks, that would be very helpful

14:21 hkaiser has joined #ste||ar

14:25 <github> [hpx] hkaiser force-pushed fixing_3182 from 7b7c183 to b8b3862: https://git.io/vAaOj

14:25 <github> hpx/fixing_3182 b8b3862 Hartmut Kaiser: Fixing return type calculation for bulk_then_execute....

14:25 <zao> Bah, initial CMake was without tests and no amount of re-running can get it to change its mind.

14:32 jakub_golinowski has quit [Ping timeout: 260 seconds]

14:38 <zao> simbergm: throttle_test.exe exploded immediately.

14:38 parsa[w] has quit [Ping timeout: 276 seconds]

14:38 <simbergm> zao: bad, but very good

14:38 <zao> note that I still have hello_world.exe running, but that should hopefully not affect it.

14:38 <simbergm> some kind of similar error message I suppose?

14:39 <zao> up-front, an abort()

14:39 <zao> unlock of unowned mutex.

14:39 <zao> 0x00007ffa9e60de70 "f:\\dd\\vctools\\crt\\crtw32\\stdcpp\\thr\\mutex.c(173): unlock of unowned mutex"

14:40 <simbergm> perfect, that's it then

14:40 <simbergm> thanks for checking

14:40 parsa[w] has joined #ste||ar

14:40 <simbergm> seems linux does something sane in that case, or it just happened to never fail

14:40 <zao> https://gist.github.com/zao/6439a8d6646dbee2a6d7b5d8d5b23d40

14:41 <zao> _THREAD_ASSERT(1 <= mtx->count && mtx->thread_id == static_cast<long>(GetCurrentThreadId()), "unlock of unowned mutex");

14:41 <zao> VS runtime explicitly validates that you're on the right thread.

14:42 <zao> I need to hurry out, but I hope that this data helps.

14:42 <hkaiser> nod, good thing it does

14:42 <simbergm> zao: definitely does

14:43 <simbergm> hkaiser: I'll work on this, sorry for not realizing earlier

14:44 <simbergm> for the shutdown we have an unlock_guard when we explicitly call yield, but not for the suspend called by yield_while

14:44 <simbergm> looks like it could be enough to just protect the stopeed_ boolean, or would we want to protect more?

14:46 <hkaiser> simbergm: couldn't you make it atomic in this case?

14:47 aserio has joined #ste||ar

14:48 <simbergm> hkaiser: yeah, or that

14:48 <simbergm> it's used in a few other places where we check stopped_ and/or terminated_

14:48 <simbergm> was mainly wondering if it needs to protect something else I'm not realizing

14:49 <hkaiser> simbergm: nobody knows better than you ;)

14:49 <simbergm> I'll have a look anyway

14:49 <hkaiser> thanks!

14:49 <simbergm> suspension I agree, shutdown I hope but I

14:49 <simbergm> 'm not certain :)

14:52 nikunj has joined #ste||ar

14:55 eschnett has quit [Quit: eschnett]

15:01 <nikunj> @hkaiser: While running the init_globally, I meet with an unexpected run-time error: terminate called after throwing an instance of 'std::invalid_argument'

15:02 <nikunj> what(): hpx::resource::get_partitioner() can be called only after the resource partitioner has been allowed to parse the command line options.

15:03 <hkaiser> nikunj: heh

15:03 <hkaiser> nikunj: so the example throws?

15:03 <nikunj> Yes this is what the init_globally example throws at runtime

15:04 <hkaiser> nikunj: ok, let me investigate - would you mind creating a ticket for this?

15:05 <nikunj> @hkaiser: Surely, I will. According to what I have investigated about this issue, it seems to come from a misbehaving thread

15:06 <hkaiser> nikunj: I think we just have not updated the example after the recent changes to the resource partitioner in hpx

15:06 <hkaiser> simbergm: you copy ^^ ?

15:08 jakub_golinowski has joined #ste||ar

15:11 jaafar has joined #ste||ar

15:14 jakub_golinowski has quit [Read error: Connection reset by peer]

15:35 <nikunj> @hkaiser: Regarding implementing the GSoC project, I think util will consume most of the time followed by runtime ( mainly in hpx::thread and hpx::resource). performance_counter and lcos should take about the same time and parallel should come next in order of time consumption.

15:35 <nikunj> @hkaiser: Is this conclusion correct?

15:37 parsa[[w]] has joined #ste||ar

15:39 parsa[w] has quit [Ping timeout: 245 seconds]

15:39 aserio1 has joined #ste||ar

15:41 aserio has quit [Ping timeout: 240 seconds]

15:41 aserio1 is now known as aserio

15:43 <simbergm> hkaiser: yep, I will also investigate

15:57 <diehlpk_work> To all GSoC srudents, I heard that some of you want to submit the proposal withour showing it to us. I definitely do not recommend this. Please share the proposal before submission with us

16:00 <K-ballo> I'm curious where one would hear such a thing

16:02 <diehlpk_work> Had some chats with students and they wanted to do it like this. I did not aks why the plan to do it

16:04 tushar1997 has quit [Quit: The Lounge - https://thelounge.github.io]

16:15 <github> [hpx] msimberg opened pull request #3209: Fix locking problems during shutdown (master...fix-shutdown-locks) https://git.io/vAMpJ

16:16 <simbergm> zao: there's now #3209 with an additional fix for the windows shutdown problem, if you have time to stress test hello_world again that would be greatly appreciated

16:16 <simbergm> next: suspension locks...

16:51 akheir has joined #ste||ar

17:02 EverYoung has joined #ste||ar

17:02 EverYoung has quit [Remote host closed the connection]

17:03 EverYoung has joined #ste||ar

17:08 aserio1 has joined #ste||ar

17:12 aserio has quit [Ping timeout: 265 seconds]

17:12 aserio1 is now known as aserio

17:15 Smasher has joined #ste||ar

17:21 parsa has quit [Read error: Connection reset by peer]

17:21 parsa| has joined #ste||ar

17:23 <zao> "Failed to connect to github.com port 443: Timed out"

17:23 <zao> Welp, not gonna test that right now then :D

17:39 vamatya has joined #ste||ar

17:43 parsa| has quit [Quit: Zzzzzzzzzzzz]

17:49 EverYoung has quit [Remote host closed the connection]

17:49 EverYoung has joined #ste||ar

17:50 parsa has joined #ste||ar

17:56 parsa has quit [Quit: Zzzzzzzzzzzz]

17:58 EverYoung has quit [Remote host closed the connection]

17:58 EverYoung has joined #ste||ar

18:00 EverYoung has quit [Remote host closed the connection]

18:01 EverYoung has joined #ste||ar

18:04 eschnett has joined #ste||ar

18:10 victor_ludorum has joined #ste||ar

18:12 EverYoung has quit [Remote host closed the connection]

18:14 EverYoung has joined #ste||ar

18:26 EverYoun_ has joined #ste||ar

18:27 EverYoun_ has quit [Remote host closed the connection]

18:27 EverYoun_ has joined #ste||ar

18:28 EverYoung has quit [Ping timeout: 245 seconds]

18:29 <zao> Finally got build through, running hello t2 l2 now.

18:31 aserio has quit [Ping timeout: 245 seconds]

18:43 EverYoun_ has quit [Remote host closed the connection]

18:43 EverYoung has joined #ste||ar

18:46 <zao> https://gist.github.com/zao/82050012af5fe7120342e296a8d05195

18:46 <zao> Something's not quite right.

18:46 <zao> No actual crashes yet.

18:46 <hkaiser> lol

18:51 <zao> Woo, crash.

18:51 <zao> 0x00007ffa9e60ddf0 "f:\\dd\\vctools\\crt\\crtw32\\stdcpp\\thr\\mutex.c(51): mutex destroyed while busy"

18:52 <zao> Main thread remaining, in teardown after main.

18:52 <zao> Original problem.

18:55 <zao> The thread that supposedly holds the mutex is another ID than the main thread, btw.

18:56 <zao> Noteworthy is that the other process exited with "Process 1 failed with an unexpected error code of -1073741819 (expected 0)" when this assert fired.

18:56 <zao> So this case is triggered by the other locality bailing out at some point?

18:58 <zao> That's 0xc0000005 btw, access violation.

18:59 <zao> The double printout in the non-crashing case is curious too.

19:01 EverYoun_ has joined #ste||ar

19:03 EverYoung has quit [Ping timeout: 245 seconds]

19:05 aserio has joined #ste||ar

19:08 jakub_golinowski has joined #ste||ar

19:13 parsa has joined #ste||ar

19:22 nanashi55 has quit [Ping timeout: 256 seconds]

19:22 nanashi55 has joined #ste||ar

19:34 aserio has quit [Ping timeout: 276 seconds]

19:35 aserio has joined #ste||ar

19:36 nanashi55 has quit [Ping timeout: 265 seconds]

19:38 nanashi55 has joined #ste||ar

20:07 victor_ludorum has quit [Quit: Page closed]

20:12 nikunj has quit [Quit: Page closed]

20:22 <zao> This is neat. If I kill the second process, the first process may just wedge forever or crash.

20:23 <zao> hkaiser: https://gist.github.com/zao/25cc314553d7d8b6888f8ddccd9c05e9

20:23 <zao> I may have accidentally killed the first process there.

20:24 <hkaiser> zao: now you're getting inventive ;)

20:24 <zao> It seems that if the process is told to shut down, we try to look up a config entry which relies on having a working get_partitioner?

20:24 <hkaiser> nod

20:24 <hkaiser> I thought I fixed that

20:25 <hkaiser> zao: ahh, it's #3202, not merged yet

20:26 <zao> Ah yes, it's trying to throw std::invalid_argument there it seems.

20:26 <zao> Was confused by the control flow.

20:26 <hkaiser> nod

20:27 <hkaiser> that's unrelated (and fixed)

20:27 <zao> My hope would be to get to a point where I could provoke locality 0 into reliably breaking so I could record a time-travel trace of it.

20:29 <jbjnr> hkaiser: heller_ After a very long day battling the snow over here in the uk (there's hardly any where I am), I have just returned from Warwick Uni where I had my PhD Defense. Passed. yay \o/

20:29 <hkaiser> jbjnr: congrats!

20:29 <hkaiser> Dr. John!

20:29 <jbjnr> One of the examiners really like the HPX serialization paper - but said - what a shame it was in such a shit conference!

20:29 <jbjnr> thanks hkaiser

20:30 <hkaiser> I'm glad for you that everything ended well

20:30 <aserio> hkaiser: meeting time!

20:30 <hkaiser> jbjnr: https://www.youtube.com/watch?v=d1K9nyxI5eY

20:30 <jbjnr> yup. I am totally exhausted now and am going to have a few more drinks and relax for a bit.

20:31 <aserio> Congrats jbjnr!!!

20:31 parsa has quit [Quit: Zzzzzzzzzzzz]

20:31 <jbjnr> lol

20:31 <jbjnr> Thanks Adrian

20:32 <jbjnr> just need to fix that cholesky small block size now to make everything right with the world ...

20:32 <aserio> lol

20:32 parsa has joined #ste||ar

20:34 hkaiser_ has joined #ste||ar

20:35 <diehlpk_work> jbjnr, Concrats

20:36 <zao> jbjnr: YAY!

20:37 hkaiser has quit [Ping timeout: 276 seconds]

20:47 mcopik has joined #ste||ar

20:51 tushar1997 has joined #ste||ar

21:10 mcopik has quit [Ping timeout: 256 seconds]

21:11 parsa has quit [Quit: Zzzzzzzzzzzz]

21:12 nikunj has joined #ste||ar

21:12 <heller_> jbjnr: awesome! Congrats!

21:29 hkaiser_ has quit [Quit: bye]

21:30 EverYoung has joined #ste||ar

21:33 EverYoun_ has quit [Ping timeout: 255 seconds]

21:40 aserio has quit [Ping timeout: 260 seconds]

21:48 aserio has joined #ste||ar

21:52 Smasher has quit [Remote host closed the connection]

22:02 EverYoung has quit [Remote host closed the connection]

22:02 EverYoung has joined #ste||ar

22:09 jakub_golinowski has quit [Quit: Leaving]

22:10 parsa has joined #ste||ar

22:21 parsa has quit [Quit: Zzzzzzzzzzzz]

22:31 EverYoun_ has joined #ste||ar

22:35 EverYoung has quit [Ping timeout: 245 seconds]

22:38 aserio has quit [Quit: aserio]

22:40 akheir has quit [Remote host closed the connection]

22:45 hkaiser has joined #ste||ar

22:54 EverYoun_ has quit [Remote host closed the connection]

22:55 EverYoung has joined #ste||ar

23:17 zombieleet has joined #ste||ar

23:33 ct_ has joined #ste||ar

23:40 zombieleet has quit [Quit: Leaving]

23:52 parsa has joined #ste||ar

23:53 <parsa[[w]]> hkaiser: clang hates my iterators and is upset about this line: https://github.com/STEllAR-GROUP/phylanx/blob/bff32319e9363de4651aff47c250870b49172c1b/src/execution_tree/primitives/shuffle_operation.cpp#L67 -> errors: https://paste.fedoraproject.org/paste/AiAhwBEQrRLzf~sY2JrKzg/raw

23:54 parsa[[w]] is now known as parsa[w]

23:55 <K-ballo> parsa[w]: what do your iterators return on dereference?

23:56 <parsa[w]> blaze::row<T>, in this case blaze::Row<CustomMatrix<many things>>

23:56 <K-ballo> literally, blaze::row<T>? as in, not a reference to a row?

23:56 <K-ballo> that's what the error message suggest

23:56 <K-ballo> that would put you in the input iterator category, and one can't shuffle input ranges

23:56 <parsa[w]> i have bad habit of using lower case. sorry the type is blaze::Row<T>

23:57 <parsa[w]> also, this isn't total nonsense this code works on windows

23:57 <K-ballo> windows happily binds rvalues to non-const lvalue references

23:57 <K-ballo> s/windows/msvc

23:57 nikunj has quit [Ping timeout: 260 seconds]

23:58 <K-ballo> "works" might be too generous though

23:58 <K-ballo> I'd just say it compiles and runs, and does something :)

23:58 <parsa[w]> why is a reference needed? blaze::Row is a proxy object and you can write to it

23:59 <K-ballo> yawza, then it might actually "work" work

23:59 <K-ballo> because those are the standard rules