aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
Guest83749 has quit [Read error: Connection reset by peer]
patg has joined #ste||ar
patg is now known as Guest65618
eschnett has joined #ste||ar
hkaiser has quit [Quit: bye]
zbyerly has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
ajaivgeorge has quit [Read error: Connection reset by peer]
K-ballo has quit [Quit: K-ballo]
<Guest65618>
heller_: on the SC site it says notifications in July
Guest65618 is now known as patg
<patg>
heller_: sorry I was Guest...
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
zbyerly has quit [Remote host closed the connection]
zbyerly has joined #ste||ar
eschnett has quit [Quit: eschnett]
patg has quit [Quit: See you later]
zbyerly has quit [Remote host closed the connection]
zbyerly has joined #ste||ar
pree has joined #ste||ar
jaafar has quit [Quit: Konversation terminated!]
zbyerly has quit [Ping timeout: 240 seconds]
zbyerly has joined #ste||ar
zbyerly has quit [Ping timeout: 255 seconds]
shoshijak has joined #ste||ar
bikineev has joined #ste||ar
EverYoung has joined #ste||ar
david_pfander has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
bikineev has quit [Remote host closed the connection]
david_pfander has quit [Ping timeout: 260 seconds]
Matombo has joined #ste||ar
david_pfander has joined #ste||ar
shoshijak has quit [Ping timeout: 240 seconds]
Matombo has quit [Remote host closed the connection]
shoshijak has joined #ste||ar
Matombo has joined #ste||ar
david_pfander has quit [Ping timeout: 268 seconds]
<jbjnr_>
heller_: have you received a HiHat invitation?
<jbjnr_>
I was supposed to forward something to you a few weeks ago. Forgot. Now I've got an invite - hoping you did too
david_pfander has joined #ste||ar
<heller_>
HiHat?
<heller_>
jbjnr_: which email address did you send it to?
<jbjnr_>
heller_: did you receive the forwarded HiHat msg - the one I sent to hartmut keeps bouncing back
<heller_>
jbjnr_: I got it, yes
<heller_>
subscribed already
<jbjnr_>
via the website, or did you just "accept" the invite?
<heller_>
both
Matombo has joined #ste||ar
<ABresting>
heller_: I need libsigsegv by default, and for this the user needs to enter a path in a config file - now which config file should it be? one used at build time when we use cmake?
<heller_>
yes
<heller_>
as said, it should be optional
<heller_>
that is, you only need it when the user sets the option
<heller_>
for example: HPX_WITH_STACKOVERFLOW_DETECTION
<heller_>
then you search for libsigsegv and include all the rest
<ABresting>
at cmake time the user is gonna enter -DHPX_WITH_STACKOVERFLOW_DETECTION="<path-to-libsigsegv>"?
<heller_>
no.
<heller_>
HPX_WITH_STACKOVERFLOW_DETECTION=On
<heller_>
by default, it would be off
<ABresting>
but I need path as well
<heller_>
find_package(LibSigSegv) then
<heller_>
that's handled by the find module
<zao>
--with-foo=/bar/baz is a very autotools/configure thing.
<ABresting>
I am a little confused - are we using "LibSigSegv" here, or is it a path given by the user, or is the find module going to find all occurrences of "LibSigSegv"?
<heller_>
it is not necessary for everyday usage, not everyone has it installed
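A minimal sketch of the wiring heller_ is describing, assuming an option named as above and a find module called FindLibSigSegv.cmake (the variable and target names below are illustrative, not HPX's actual build code):

    option(HPX_WITH_STACKOVERFLOW_DETECTION
           "Detect stack overflows via libsigsegv" OFF)

    if(HPX_WITH_STACKOVERFLOW_DETECTION)
      # the find module, not the user, locates the library; a hint such as
      # LIBSIGSEGV_ROOT (assumed name) can point it at a custom prefix
      find_package(LibSigSegv REQUIRED)
      include_directories(${LIBSIGSEGV_INCLUDE_DIR})
      # link it in and enable the feature in the code, e.g.:
      # target_link_libraries(hpx ${LIBSIGSEGV_LIBRARY})
      add_definitions(-DHPX_HAVE_STACKOVERFLOW_DETECTION)
    endif()

    # user side:
    #   cmake -DHPX_WITH_STACKOVERFLOW_DETECTION=On [-DLIBSIGSEGV_ROOT=/opt/libsigsegv] ..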
<ABresting>
the reason is it will be functional only if the user initiates it in the main() module. If they have initiated it then the system will try to include it at compile time, and if it's not found it's gonna throw an error
<zao>
Do we have any other features that are off-by-default that turn on in the presence of a library on the system?
<zao>
Would it be something that would be possible to obscure from the end-user in say the HPX init functions, or is it exactly-once-per-program?
<ABresting>
it's more of a user-initiated thing, as the user should know that they need to trigger it through an API call. Even if the user doesn't have it installed, it's just gonna throw a warning: install it if you want to use this feature
david_pfander has joined #ste||ar
<heller_>
zao: no, there is nothing that gets magically turned on
<heller_>
ABresting: why is it initiated by the user
<heller_>
in the user program even?
<heller_>
why is it not possible to activate the stack overflow handling like any other (possibly optional) component?
<heller_>
you also have the command line and HPX configuration utilities
<ABresting>
because it needs to be initiated from the main module, else it doesn't work in a multithreaded environment
<heller_>
so?
<heller_>
hpx::init is called from within the main module
<heller_>
(usually)
<ABresting>
so without it the signal handler is not installed for the entire process
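For context, a hedged sketch in plain POSIX (libsigsegv wraps this kind of machinery portably; nothing below is HPX code) of why installation has to happen process-wide before worker threads exist: sigaction() is per-process, but each thread that should survive an overflow needs its own sigaltstack():

    #include <signal.h>   // sigaction, sigaltstack, SIGSTKSZ
    #include <stdlib.h>   // malloc, _Exit
    #include <unistd.h>   // write

    static void on_segv(int sig)
    {
        // only async-signal-safe calls are allowed in here
        static const char msg[] = "stack overflow / segfault caught\n";
        write(2, msg, sizeof(msg) - 1);
        _Exit(1);
    }

    int main()
    {
        // per-thread: give the handler somewhere to run once the normal
        // stack is exhausted (worker threads would each do this too)
        stack_t ss;
        ss.ss_sp = malloc(SIGSTKSZ);
        ss.ss_size = SIGSTKSZ;
        ss.ss_flags = 0;
        sigaltstack(&ss, NULL);

        // per-process: install the handler once, from the main module
        struct sigaction sa;
        sa.sa_handler = on_segv;
        sa.sa_flags = SA_ONSTACK;   // run the handler on the alternate stack
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        // ... hpx::init / the rest of the program would follow ...
        return 0;
    }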
<jbjnr_>
heller_: if an application allocates a thing using new blah(...) and then passes it to the runtime, and it is deleted by the runtime - are there any cases where we can get the wrong allocator (like jemalloc etc.) freeing the object, i.e. one different from the one that allocated it?
<jbjnr_>
I have a callback in the thread pool to return a user allocated scheduler - this bombs out in destruction
shoshijak has quit [Ping timeout: 240 seconds]
<jbjnr_>
how can I force my code (that uses HPX) to use the jemalloc allocator too. I assumed it would be doing that already - but this looks suspicious
shoshijak has joined #ste||ar
pree has joined #ste||ar
<heller_>
jbjnr_: I would assume the same
<heller_>
jbjnr_: what does "bombing out" mean? do you destruct the scheduler twice, maybe?
<jbjnr_>
destruct once, segfault once
<heller_>
are you sure?
<heller_>
where does the segfault happen?
<jbjnr_>
yes. sure
<jbjnr_>
in the destructor of my custom scheduler
<jbjnr_>
when I delete stuff that gdb says is lovely
shoshijak has quit [Client Quit]
shoshijak has joined #ste||ar
<heller_>
jbjnr_: if it were jemalloc vs. some other malloc, you'd see a different segfault though
<jbjnr_>
like what?
<jbjnr_>
bad_alloc
<heller_>
somewhere inside of jemalloc or system malloc
<heller_>
if you get a segfault inside of your destructor, that can only mean that the object has already been destructed, that the pointer doesn't point to the original object anymore, or that there is a problem with the destructor
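On the allocator question itself, a hedged sketch (my_scheduler and runtime_adopt are illustrative stand-ins, not HPX API) of the usual way to make "allocated by the app, deleted by the runtime" safe: a shared_ptr stores its deleter in the control block at construction, so the free always uses the allocator that did the new, whichever side drops the last reference:

    #include <memory>
    #include <utility>

    struct my_scheduler {};

    // stand-in for the runtime taking ownership of a user-created scheduler
    void runtime_adopt(std::shared_ptr<my_scheduler> s)
    {
        s.reset();  // deletion runs the deleter captured at construction
    }

    int main()
    {
        // control block (and deleter) created on the application side
        std::shared_ptr<my_scheduler> sched(new my_scheduler());
        runtime_adopt(std::move(sched));
        return 0;
    }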
<heller_>
show me
<jbjnr_>
the segfault happens when deleting one of the lockfree queues inside the scheduler
EverYoung has joined #ste||ar
K-ballo has joined #ste||ar
<heller_>
have they not been allocated correctly beforehand, maybe?
<jbjnr_>
it's an effing queue that tasks have been running on all day, and then it dies in the destructor - the queues are ok
<jbjnr_>
does anything there look interesting to you?
<jbjnr_>
that's the same error when we do not use our custom scheduler
<heller_>
hmmm
<heller_>
delete nullptr should be fine
<heller_>
scoped_ptr?
<heller_>
doesn't look wrong
<heller_>
jbjnr_: where can I look at your files?
<jbjnr_>
you can't. but do not worry. I just wanted to ask if the malloc thing was a possibility. We clearly have a bug, and it may be connected to the tss stuff ...
<heller_>
write an isolated testcase
<heller_>
that would be my advice
denis_blank has joined #ste||ar
<zao>
It's not occurring during the process shutdown phase, I hope. There be destruction order dragons.
bikineev has quit [Remote host closed the connection]
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
bikineev has quit [Ping timeout: 255 seconds]
bikineev has joined #ste||ar
eschnett has joined #ste||ar
jaafar has joined #ste||ar
denis_blank has quit [Quit: denis_blank]
zbyerly has quit [Remote host closed the connection]
zbyerly has joined #ste||ar
eschnett has quit [Quit: eschnett]
pree has quit [Ping timeout: 240 seconds]
bikineev has quit [Ping timeout: 260 seconds]
eschnett has joined #ste||ar
<jbjnr_>
hkaiser: we are seeing the runtime TSS and thread_num TSS initialized and deinitialized multiple times throughout the code - is this normal? (or might we have broken something).
aserio has joined #ste||ar
bikineev has joined #ste||ar
eschnett has quit [Quit: eschnett]
EverYoung has joined #ste||ar
pree has joined #ste||ar
ajaivgeorge_ has joined #ste||ar
ajaivgeorge has quit [Read error: Connection reset by peer]
EverYoung has quit [Ping timeout: 276 seconds]
ajaivgeorge has joined #ste||ar
ajaivgeorge_ has quit [Ping timeout: 268 seconds]
<hkaiser>
jbjnr_: hmmm
<hkaiser>
for each os-thread this should happen once, I think
<jbjnr_>
I mean that at start, the runtime tss is set 2 or 3 times (there is a null check), then on cleanup, it is deinit-ed multiple times
<jbjnr_>
there are no segfaults - but it strikes me as sloppy
<jbjnr_>
I'm testing the master branch at the mo to see if it does the same
<hkaiser>
jbjnr_: even without your new code?
<hkaiser>
that would be unexpected
<jbjnr_>
I'm testing now
<hkaiser>
but everything is possible, I wouldn't be surprised to have a bug there
<jbjnr_>
it may not be a bug, it's just that flags are sometimes set from different places and done twice ...
<hkaiser>
jbjnr_: note that the functions are (supposed to be) called once per os-thread
<jbjnr_>
I see the runtime and applier ptrs TSS being set at least twice on the main thread (on master) - but as I say, this may not be a problem since it says 'if nullptr then ...'
<jbjnr_>
gosh, the main thread calls deinit_tss 3 times :(
<jbjnr_>
you can see when that thread calls init/deinit (we are using just one worker thread to make the trace short)
<hkaiser>
jbjnr_: is 0x7f0e55b60800 the thread id?
<jbjnr_>
yes. the os thread id
<hkaiser>
ok, will have a look
<hkaiser>
that shouldn't happen, really
<jbjnr_>
The only reason I'm worried is that we have random memory corruption in our multi-pool hpx and we moved some TSS code around
<jbjnr_>
so debugging it in case we screwed up
eschnett has joined #ste||ar
<hkaiser>
nod
<jbjnr_>
with multiple pools we might have messed up the thread_num_tss (hence my checking)
eschnett has quit [Client Quit]
zbyerly has quit [Remote host closed the connection]
zbyerly has joined #ste||ar
bikineev has quit [Ping timeout: 240 seconds]
<hkaiser>
jbjnr_: yah, I can confirm that init_tss is called twice for the main-thread
ajaivgeorge has quit [Read error: Connection reset by peer]
<hkaiser>
this is a bug
ajaivgeorge has joined #ste||ar
<jbjnr_>
ok, but not a serious one (hopefully - no real side effects)
bikineev has joined #ste||ar
eschnett has joined #ste||ar
<hkaiser>
jbjnr_: right
<hkaiser>
jbjnr_: the functions are called more than once only for the main thread
<hkaiser>
I can fix that
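A minimal sketch of the per-OS-thread invariant being discussed (not HPX's actual code): init/deinit should each fire exactly once per thread, and an assert makes a second call visible instead of relying on the "if nullptr then ..." check mentioned above:

    #include <cassert>

    struct runtime;  // stand-in for the runtime object kept in TSS

    thread_local runtime* runtime_tss = nullptr;

    void init_tss(runtime* rt)
    {
        assert(runtime_tss == nullptr && "init_tss called twice on this OS thread");
        runtime_tss = rt;
    }

    void deinit_tss()
    {
        assert(runtime_tss != nullptr && "deinit_tss called twice on this OS thread");
        runtime_tss = nullptr;
    }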
ajaivgeorge has quit [Quit: ajaivgeorge]
ajaivgeorge has joined #ste||ar
aserio has quit [Ping timeout: 246 seconds]
ajaivgeorge has quit [Ping timeout: 240 seconds]
aserio has joined #ste||ar
<aserio>
wash, wash[m]: will you be joining us today?
<wash[m]>
Aserio hey
<wash[m]>
Aserio I am in Israel
<wash[m]>
Cannot call in
<hkaiser>
wash[m]: back to the homeland, huh?
david_pfander has quit [Ping timeout: 240 seconds]
ajaivgeorge has joined #ste||ar
<wash[m]>
Yah :)
<aserio>
wash[m]: enjoy your trip
aserio has quit [Ping timeout: 258 seconds]
eschnett has quit [Quit: eschnett]
aserio has joined #ste||ar
zbyerly has quit [Remote host closed the connection]
zbyerly has joined #ste||ar
zbyerly has quit [Ping timeout: 246 seconds]
bikineev has quit [Ping timeout: 268 seconds]
bikineev has joined #ste||ar
bikineev has quit [Ping timeout: 260 seconds]
aserio has quit [Ping timeout: 246 seconds]
shoshijak has quit [Ping timeout: 240 seconds]
hkaiser has quit [Read error: Connection reset by peer]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 246 seconds]
jaafar has quit [Ping timeout: 240 seconds]
shoshijak has joined #ste||ar
zbyerly has joined #ste||ar
Matombo has joined #ste||ar
aserio has joined #ste||ar
Matombo has quit [Remote host closed the connection]
<heller>
aserio: did you get a tutorial notification yet?
<aserio>
heller: yes, let me tell you about it after this meeting
<heller>
aserio: ok
<heller>
aserio: care to share the reviews?
eschnett has joined #ste||ar
eschnett has quit [Client Quit]
aserio has quit [Ping timeout: 255 seconds]
ajaivgeorge has quit [Ping timeout: 260 seconds]
ajaivgeorge has joined #ste||ar
bikineev has joined #ste||ar
aserio has joined #ste||ar
hkaiser has joined #ste||ar
jaafar has joined #ste||ar
ajaivgeorge has quit [Ping timeout: 240 seconds]
ajaivgeorge has joined #ste||ar
hkaiser_ has joined #ste||ar
hkaiser has quit [Read error: Connection reset by peer]
jaafar has quit [Ping timeout: 260 seconds]
zbyerly has quit [Remote host closed the connection]
denis_blank has joined #ste||ar
jaafar has joined #ste||ar
ajaivgeorge has quit [Ping timeout: 240 seconds]
eschnett has joined #ste||ar
ajaivgeorge has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 258 seconds]
shoshijak has quit [Ping timeout: 240 seconds]
aserio has joined #ste||ar
shoshijak has joined #ste||ar
jgoncal has quit [Ping timeout: 258 seconds]
bikineev has quit [Ping timeout: 240 seconds]
jgoncal has joined #ste||ar
bikineev has joined #ste||ar
Matombo has joined #ste||ar
eschnett has quit [Quit: eschnett]
jaafar has quit [Quit: Konversation terminated!]
bikineev has quit [Remote host closed the connection]
<jbjnr_>
hkaiser_: is hpx::async(executor, &fun, ...); not a valid overload of async?
<pree>
what do the continuations in parcels specify?
patg[w]_ has joined #ste||ar
<pree>
I'm confused - continuation means tasks which should occur after an event
<pree>
??
<hkaiser_>
jbjnr_: yah, sure
<pree>
^^
<hkaiser_>
pree: yah, the continuation is usually a global id of a lco which receives the result
<pree>
thanks, but I had thought continuation meant a task which should continue after some event
<hkaiser_>
nod
<hkaiser_>
lcos usually trigger things
<patg[w]_>
Got IRC working at work finally
<hkaiser_>
patg[w]_: great
<pree>
thank you hkaiser_
<pree>
I got confused by its name
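A loose local analogue of what hkaiser_ describes (headers are assumed from roughly contemporary HPX and may differ): at the parcel level the continuation is the global id of an LCO that receives the result; with plain futures the same idea is a task attached to run once the value arrives:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/lcos.hpp>

    int main()
    {
        hpx::future<int> f = hpx::async([]() { return 41; });
        // the "continuation": runs after f's event (its value) occurs
        hpx::future<int> g =
            f.then([](hpx::future<int> r) { return r.get() + 1; });
        return g.get() == 42 ? 0 : 1;
    }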
<patg[w]_>
Now to find some cycles to do that install
<jbjnr_>
soo... hkaiser_ I discovered by accident that hpx::async(executor, &func, ...) works fine, but when I put that inside a lambda - it doesn't work if the executor is captured by value - only works when captured by reference.
<jbjnr_>
is that expected?
<hkaiser_>
jbjnr_: might just not work for const executors
<hkaiser_>
make the lambda mutable
<jbjnr_>
ok
<K-ballo>
is executor copyable?
<jbjnr_>
yup mutable works
<K-ballo>
it is then
<jbjnr_>
it is copyable normally I think
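A sketch of the capture issue just discussed (executor type and header names assumed from HPX of roughly this vintage): a by-value capture is const inside a non-mutable lambda, so an overload needing a non-const executor is rejected; mutable restores it:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/async.hpp>
    #include <hpx/include/parallel_executors.hpp>

    int square(int x) { return x * x; }

    int main()
    {
        hpx::parallel::execution::parallel_executor exec;

        // without `mutable`, the captured copy of exec is const and the
        // call may fail to compile; by-reference capture sidesteps that
        auto run = [exec]() mutable {
            return hpx::async(exec, &square, 7);
        };
        return run().get() == 49 ? 0 : 1;
    }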
eschnett has joined #ste||ar
<patg[w]_>
hkaiser_, see private
bikineev has joined #ste||ar
patg[w]_ has quit [Quit: Leaving]
patg[w] has joined #ste||ar
<heller>
we really need my thesis, the only ParalleX terms I use are AGAS and Parcel ;)
<patg[w]>
heller, we really do!
<patg[w]>
When do you expect to be done?
<heller>
33 days to go
<heller>
it will be very C++ centric
<patg[w]>
heller, hope it's human readable :)
<K-ballo>
for some humans at least
<heller>
patg[w]: yes. the PDF renders fine :P
<patg[w]>
K-ballo, I have a feeling human readable means different things to you and me
pree has quit [Quit: AaBbCc]
<patg[w]>
heller, I'm sure it will be great!
<heller>
patg[w]: the only thing I care about right now is to have it submitted and that I'll pass ;)
<heller>
no one will read it in the end anyway
<hkaiser_>
heller: you should use 'split-phase transaction' as well ;)
<patg[w]>
heller: I can empathize
<heller>
hkaiser_: I am using the C++ Memory Model definitions instead ;)
<hkaiser_>
doesn't sound as cool ;)
patg[w] has quit [Quit: Leaving]
<heller>
makes it more approachable though
<heller>
and it underlines the story of a natural extension of today's C++ programming language ;)
<ABresting>
any advantage of using the alternate stack technique when detecting stack overflow?
<heller>
which ones come to your mind?
<ABresting>
david_pfander wrote a technique using an alternate stack
<ABresting>
meanwhile it can be done without using the alternate stack
<heller>
now I don't remember when I gave this talk ... was it Mardi Gras?
ajaivgeorge has quit [Ping timeout: 240 seconds]
ajaivgeorge has joined #ste||ar
<jbjnr_>
by the wrong thread pool - how do you imagine this happening? we have two pools with N and M os threads - are you saying that a thread is being deleted by pool B when it is owned by pool A?
ajaivgeorge has quit [Client Quit]
ajaivgeorge has joined #ste||ar
<jbjnr_>
question - pool A has 6 threads and they are numbered 0-5 - pool B has 2 threads numbered 6-7 - but inside their own pools they have indices 0-5 and 0-1: do any of the stealing/suspending/resuming functions use the thread numbers (tss) where incorrect indexing might be an issue?
<jbjnr_>
so threadmanager knows threads 0-7 but each pool uses different numbering and a thread offset for each pool is maintained. We get problems only when we use multiple pools - so we suspect issues here. Any suggestions are welcome for where to look for a bad index.
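A sketch of the numbering just described (illustrative, not the HPX thread pool interface), showing where a global/local index mixup would bite: pool A owns global threads 0-5 as local 0-5, pool B owns 6-7 as local 0-1:

    #include <cassert>
    #include <cstddef>

    struct pool
    {
        std::size_t offset;       // first global thread id owned by the pool
        std::size_t num_threads;  // A: offset 0, 6 threads; B: offset 6, 2 threads

        std::size_t to_local(std::size_t global) const
        {
            // passing a global id where a local one is expected (or vice
            // versa) is exactly the kind of bad index being hunted above
            assert(global >= offset && global < offset + num_threads);
            return global - offset;
        }

        std::size_t to_global(std::size_t local) const
        {
            assert(local < num_threads);
            return local + offset;
        }
    };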
<jbjnr_>
falling asleep now. will resume tomorrow.