00:00
weilewei has quit [Remote host closed the connection]
01:20
K-ballo has quit [Quit: K-ballo]
01:51
hkaiser has quit [Ping timeout: 264 seconds]
01:56
Guest65867 has quit [Ping timeout: 240 seconds]
01:56
Guest65867 has joined #ste||ar
02:57
Guest65867 has quit [Ping timeout: 265 seconds]
02:57
Guest65867 has joined #ste||ar
04:01
Coldblackice_ has joined #ste||ar
04:03
Coldblackice has quit [Ping timeout: 240 seconds]
05:50
Guest65867 has quit [Ping timeout: 252 seconds]
05:51
Guest65867 has joined #ste||ar
07:09
<simbergm> hkaiser, jbjnr, heller I may not be able to join the call today but please go ahead in any case, I'll try to join as soon as I'm free
09:18
rori has joined #ste||ar
09:37
Coldblackice has joined #ste||ar
09:39
Coldblackice_ has quit [Ping timeout: 240 seconds]
10:41
K-ballo has joined #ste||ar
11:09
hkaiser has joined #ste||ar
11:09
Guest65867 has quit [Ping timeout: 276 seconds]
11:10
Guest65867 has joined #ste||ar
12:34
Guest65867 has quit [Ping timeout: 240 seconds]
12:36
Guest65867 has joined #ste||ar
14:25
hkaiser has quit [Quit: bye]
14:52
weilewei has joined #ste||ar
14:53
hkaiser has joined #ste||ar
15:00
aserio has joined #ste||ar
16:32
nikunj has quit [Remote host closed the connection]
16:33
nikunj has joined #ste||ar
16:54
aserio has quit [Ping timeout: 245 seconds]
16:55
jaafar has quit [Ping timeout: 264 seconds]
16:57
aserio has joined #ste||ar
17:03
<weilewei> hkaiser without --hpx:threads=1, I got the following error
17:32
<zao> Ah, gist has a comment with pictures. Clever.
17:50
<weilewei> zao right, I just found it today
17:51
aserio has quit [Ping timeout: 264 seconds]
17:53
aserio has joined #ste||ar
17:57
aserio has quit [Ping timeout: 250 seconds]
17:59
<hkaiser> weilewei: that's not too helpful :/
17:59
<hkaiser> apparently happens inside CUDA
17:59
<weilewei> yea, I know, I cannot find any helpful info
17:59
<hkaiser> I tend to think that this is not our problem
17:59
<weilewei> Agreed...
18:00
rori has quit [Quit: WeeChat 1.9.1]
18:00
<weilewei> But I am not sure how to reproduce the problem with a smaller example either
18:01
<hkaiser> weilewei: I'd try to get in contact with the nvidia guys at your place
18:01
<weilewei> Ok, I will talk to Ronnie, and he will start the conversation among us
18:02
<weilewei> I will meet Ronnie this afternoon
18:03
aserio has joined #ste||ar
18:45
weilewei has quit [Remote host closed the connection]
19:12
aserio has quit [Ping timeout: 264 seconds]
19:29
aserio has joined #ste||ar
19:34
weilewei has joined #ste||ar
20:03
<heller> weilewei: what's the actual error you are getting?
20:04
<weilewei> the assertion failed, number of expected finished walkers does not match the number of actual workers
20:04
<heller> and does it happen in the exceptional code path or in the normal one?
20:06
<weilewei> exceptional code path
20:06
<heller> where is the exception being thrown, and which exception is being thrown?
20:08
<weilewei> This line is being called, then the following assert failed too
20:08
<heller> that's not what I meant
20:08
<heller> this is happening because one of the tasks threw an exception
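A minimal sketch of what heller describes, using the standard library's `std::async`/`std::future` as stand-ins for `hpx::async`/`hpx::future` (assumption: the HPX versions store and rethrow task exceptions the same way). `run_walkers`, `WalkerReport` and the counter are hypothetical, not the application's code. The failing task never reaches its completion bookkeeping, and `get()` rethrows its exception in the caller, so the caller sees both the exception and a walker count that no longer matches:

```cpp
#include <atomic>
#include <future>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical summary of one run; names are illustrative only.
struct WalkerReport {
    int expected = 0;         // walkers that were started
    int finished = 0;         // walkers that reached the end of their body
    int failures = 0;         // exceptions rethrown by future::get()
    std::string first_error;  // what() of the first rethrown exception
};

// Starts num_walkers tasks; the task whose id == failing_id throws before it
// can report completion (pass -1 to let every task succeed).
WalkerReport run_walkers(int num_walkers, int failing_id) {
    std::atomic<int> finished{0};
    std::vector<std::future<void>> futures;
    for (int id = 0; id < num_walkers; ++id) {
        futures.push_back(std::async(std::launch::async, [id, failing_id, &finished] {
            if (id == failing_id)
                throw std::runtime_error("walker " + std::to_string(id) + " failed");
            ++finished;  // only reached on the normal path
        }));
    }

    WalkerReport report;
    report.expected = num_walkers;
    for (auto& f : futures) {
        try {
            f.get();  // rethrows the exception stored by a failed task
        } catch (std::exception const& e) {
            if (report.failures == 0)
                report.first_error = e.what();
            ++report.failures;
        }
    }
    // Every task has completed once all get() calls have returned or thrown.
    report.finished = finished.load();
    return report;
}
```

For example, `run_walkers(4, 2)` reports 4 expected, 3 finished and one rethrown error, `walker 2 failed`. The rethrown exception by itself says nothing about where inside the task it originated, which is why heller keeps asking for the throw site.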
20:08
<weilewei> what do you mean exactly? Sorry for the misunderstanding
20:08
<heller> which task threw the exception?
20:09
<heller> Where was that exception thrown?
20:09
<weilewei> hmm, I need to run it again and check which task
20:11
weilewei has quit [Remote host closed the connection]
20:12
<heller> that's what we discussed a while back: figure out where that exception came from. This will bring you closer to a possible solution
20:13
<heller> Back then I said: "I bet this is a lock held during suspension error"
20:20
weilewei has joined #ste||ar
20:24
<weilewei> heller while I am running, each async task is associated with a thread id, so do you want to know which async id causes the problem?
20:25
<heller> no. I want to know in which file and at which line the exception was thrown
20:25
<heller> or rather, you should want to know that
20:26
<weilewei> ok, let me check it when it hits the assertion failure
20:26
<heller> we already know where it hits that
20:26
<heller> find the location that throws the exception in the task that makes future::get throw the error
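When the throwing code is code you control, one way to answer "which file, which line" without a debugger is to record the throw site inside the exception itself, so the information survives the trip through `future::get`. This is a sketch under assumptions: `located_error`, `THROW_LOCATED`, `checked_divide` and `where_did_it_throw` are hypothetical names, and `std::async` stands in for `hpx::async`. It does not help for exceptions thrown inside third-party code such as the CUDA runtime, where the debugger approach is still needed.

```cpp
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical exception type that records where it was thrown.
struct located_error : std::runtime_error {
    located_error(std::string const& msg, char const* file, int line)
      : std::runtime_error(std::string(file) + ":" + std::to_string(line) + ": " + msg),
        throw_file(file), throw_line(line) {}
    char const* throw_file;  // points at the __FILE__ string literal
    int throw_line;
};

// A macro rather than a function, so that __FILE__ and __LINE__ expand at the throw site.
#define THROW_LOCATED(msg) throw located_error((msg), __FILE__, __LINE__)

// Hypothetical task body.
int checked_divide(int a, int b) {
    if (b == 0)
        THROW_LOCATED("division by zero");
    return a / b;
}

// Runs the task asynchronously; get() rethrows the same located_error,
// so the caller can still report the original file and line.
std::string where_did_it_throw(int a, int b) {
    auto f = std::async(std::launch::async, checked_divide, a, b);
    try {
        (void)f.get();
        return "no exception";
    } catch (located_error const& e) {
        return std::string(e.throw_file) + ":" + std::to_string(e.throw_line);
    }
}
```

Calling `where_did_it_throw(1, 0)` returns the file and line of the `THROW_LOCATED` inside `checked_divide`, not the line of the `get()` call.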
20:26
<weilewei> Yea, I was thinking that when I hit the assertion failure, I can find out where the failure comes from
20:27
<heller> that's too late
20:27
<weilewei> oh... so I should put a breakpoint on the future::get?
20:28
<heller> "catch throw"
20:28
<weilewei> this line?
20:29
<heller> make the debugger stop whenever an exception is being thrown
20:29
<heller> in gdb, you do this by typing "catch throw"
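A small sketch for trying out `catch throw` before using it on the real application. The call chain (`parse_config`, `setup`, `try_setup`) is hypothetical. The gdb commands in the comments (`catch throw`, `catch catch`, `run`, `bt`, `continue`) are standard gdb; the DDT "stop at throw" option mentioned next presumably corresponds to `catch throw`, and "stop at catch" to `catch catch`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical call chain for practicing `catch throw`.
// Build with debug info (e.g. g++ -g -O0) and call try_setup("") from main, then in gdb:
//   (gdb) catch throw   # stop whenever any C++ exception is thrown
//   (gdb) run
//   (gdb) bt            # frame #0 is __cxa_throw; the next frames are the
//                       # throw site (parse_config) and its callers
//   (gdb) continue      # resume until the next throw
// `catch catch` instead stops where the exception is caught, which is later
// and no longer shows the frames that threw.
void parse_config(std::string const& value) {
    if (value.empty())
        throw std::invalid_argument("empty config value");  // gdb stops here
}

void setup(std::string const& value) { parse_config(value); }

bool try_setup(std::string const& value) {
    try {
        setup(value);
        return true;
    } catch (std::invalid_argument const&) {
        // By the time control reaches here, parse_config's frame is gone.
        return false;
    }
}
```

The catch handler plays the role of `future::get` in the real program: it learns that something failed, but only a stop at the throw shows where.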
20:30
<weilewei> oh, how about arm-forge, let me search a bit
20:30
<heller> ddt most likely has a similar option in its gui somewhere
20:32
<weilewei> I found it! There are two: stop at catch, stop at throw
20:32
<weilewei> which one should I choose?
20:33
<weilewei> I can select multiple choices, so I just ticked them both
20:33
<weilewei> let me run the program again
20:34
<weilewei> heller sorry for the misunderstanding, I was not aware that gdb has catch throw, so I did not understand you correctly
20:36
<heller> weilewei: I recommended that on October 30th already ;)
20:38
<weilewei> oops, it stops at the very beginning when I start the program
20:38
<heller> hit continue
20:38
<heller> or inspect the backtrace
20:39
<weilewei> a lot of continues
20:49
<weilewei> it seems that I need to run to the line before hpx async starts, otherwise it stops at every exception
20:51
<weilewei> heller do you mean by saying inspect the backtrace
20:51
<weilewei> heller what do you mean by saying inspect the backtrace?
20:52
<heller> in your gist above, you only see `__cxa_throw`, which is coming from somewhere in your C++ standard library. What you want to see, however, is the line in *your* code
20:52
<heller> each function call generates what is usually called a stack frame
20:53
<heller> that is, you can see the entire call chain by inspecting the trace of function calls
20:53
<heller> aka backtrace
20:53
<heller> by inspecting I mean: display it and look at it.
20:53
<heller> and see if it is of any interest to you
20:55
<heller> sometimes it is also called a stack trace
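To make "one stack frame per function call" concrete, here is a sketch that prints the call chain from inside a program. It assumes Linux with glibc, whose `<execinfo.h>` provides `backtrace` and `backtrace_symbols`; it is not portable, and the helper names `current_backtrace`, `inner` and `outer` are hypothetical. In practice the debugger's `bt` command is the easier way to get the same information.

```cpp
#include <execinfo.h>  // glibc-specific: backtrace, backtrace_symbols
#include <cstdlib>
#include <string>
#include <vector>

// Returns one string per stack frame, innermost frame (this function) first.
// Function names of the main executable only appear when linked with -rdynamic;
// otherwise many entries show just module names and addresses.
std::vector<std::string> current_backtrace(int max_frames = 64) {
    std::vector<void*> frames(max_frames);
    int n = backtrace(frames.data(), max_frames);
    std::vector<std::string> result;
    char** symbols = backtrace_symbols(frames.data(), n);
    if (symbols == nullptr)
        return result;
    for (int i = 0; i < n; ++i)
        result.emplace_back(symbols[i]);
    std::free(symbols);  // one malloc'd block holds both the array and the strings
    return result;
}

// Hypothetical two-level call chain: the backtrace taken in inner() contains
// frames for current_backtrace, inner, outer and their callers.
int inner() { return static_cast<int>(current_backtrace().size()); }
int outer() { return inner(); }
```

Printing the strings from `current_backtrace()` inside `inner()` would list the frames innermost first, the same order gdb's `bt` uses.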
20:55
<weilewei> is this an interesting error?
20:55
<weilewei> sorry, need to run to a meeting
20:56
<heller> how should I know?
20:56
<heller> please understand and apply what I said above
20:56
<weilewei> let me inspect later
20:57
<heller> NB: lots of threads are throwing an exception at this point in time
21:38
aserio has quit [Ping timeout: 245 seconds]
21:41
aserio has joined #ste||ar
21:42
weilewei has quit [Remote host closed the connection]
21:45
jaafar has joined #ste||ar
21:52
aserio1 has joined #ste||ar
21:56
aserio has quit [Ping timeout: 250 seconds]
21:56
aserio1 is now known as aserio
22:02
weilewei has joined #ste||ar
22:17
weilewei has quit [Remote host closed the connection]
22:18
Coldblackice has quit []
22:18
Coldblackice has joined #ste||ar
22:28
aserio has quit [Quit: aserio]
22:31
<simbergm> what commit of hpx are you running?
23:25
jaafar has quit [Ping timeout: 245 seconds]
23:38
hkaiser has quit [Quit: bye]
23:45
weilewei has joined #ste||ar
23:56
<weilewei> wei2303030426
23:56
<weilewei> oops, wrong message
23:58
<weilewei> simbergm I used this one: 414380e50e55ed1f4ebfde57f3bda7018d6d1cf0