hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
weilewei has joined #ste||ar
diehlpk_work has quit [Ping timeout: 240 seconds]
weilewei has quit [Quit: Ping timeout (120 seconds)]
jehelset has quit [Ping timeout: 240 seconds]
K-ballo has quit [Quit: K-ballo]
diehlpk has joined #ste||ar
diehlpk has quit [Quit: Leaving.]
jehelset has joined #ste||ar
diehlpk has joined #ste||ar
diehlpk has quit [Quit: Leaving.]
hkaiser has quit [Quit: Bye!]
jehelset has quit [Ping timeout: 260 seconds]
jehelset has joined #ste||ar
jehelset has quit [Ping timeout: 240 seconds]
K-ballo has joined #ste||ar
K-ballo has quit [Ping timeout: 256 seconds]
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
K-ballo1 has joined #ste||ar
K-ballo has quit [Ping timeout: 268 seconds]
K-ballo1 is now known as K-ballo
jbjnr[m] has joined #ste||ar
<jbjnr[m]>
hkaiser: yt?
<jbjnr[m]>
(not sure if this is actually working, over a year since I last connected)
<hkaiser>
jbjnr[m]: hey
<K-ballo>
we read you :)
<jbjnr[m]>
Thought I'd ping you here to chat about BBB rather than wait for meeting arrangement over email
<hkaiser>
jbalint: ok
<hkaiser>
jbjnr[m]: ok
<jbjnr[m]>
I probably only need 10 mins of your time to understand what must happen in BBB in order for AGAS to function
<hkaiser>
jbjnr[m]: BBB is a simple collective operation where each locality registers itself with locality zero by sending a message through the early pp magic
<jbjnr[m]>
libfabric boots up, nodes are connected, but then BBB hangs
<hkaiser>
locality zero responds with a single message to each of the registering localities
<jbjnr[m]>
greetings K-ballo btw. long time no chat
<hkaiser>
locality zero waits for all expected localities to register, all others wait for the response only
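
A minimal single-process sketch of the handshake described above, to make the waiting condition concrete. Everything here is hypothetical: `RegisterRequest`, `RegisterResponse` and `Root` are illustration names, not HPX types, and the real exchange goes through the early parcel path rather than direct function calls. It also assumes that locality zero counts itself as one of the expected registrations and only sends its replies once every locality has checked in; the chat does not state either point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical message shapes; HPX's real early-parcel payload differs.
struct RegisterRequest {
    std::uint32_t locality_id;
    std::uint32_t num_cores;
};

struct RegisterResponse {
    std::uint32_t target_locality;  // who this reply is addressed to
    std::uint32_t num_localities;
    std::uint32_t total_cores;
};

// Toy model of locality zero's side of the exchange.
class Root {
public:
    explicit Root(std::uint32_t expected)
      : seen_(expected, false), cores_(expected, 0), registered_(0) {
        assert(expected > 0);
    }

    // Record one registration; out-of-range or duplicate ids are ignored.
    void on_register(RegisterRequest const& req) {
        if (req.locality_id >= seen_.size() || seen_[req.locality_id])
            return;
        seen_[req.locality_id] = true;
        cores_[req.locality_id] = req.num_cores;
        ++registered_;
    }

    // True once every expected locality has registered.
    bool complete() const { return registered_ == seen_.size(); }

    // One reply per non-zero locality, produced only when complete()
    // (assumption: replies are deferred until everyone has registered).
    std::vector<RegisterResponse> responses() const {
        std::vector<RegisterResponse> out;
        if (!complete())
            return out;
        std::uint32_t total = 0;
        for (std::uint32_t c : cores_)
            total += c;
        auto const n = static_cast<std::uint32_t>(seen_.size());
        for (std::uint32_t id = 1; id < n; ++id)
            out.push_back(RegisterResponse{id, n, total});
        return out;
    }

private:
    std::vector<bool> seen_;
    std::vector<std::uint32_t> cores_;
    std::size_t registered_;
};
```

In this toy model a single lost registration keeps `complete()` false forever, so no replies go out and every other locality keeps waiting. That would look like a hang at scale, though it is only one possible explanation.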
<hkaiser>
jbjnr[m]: are you sure it's the BBB blocking? or could it be that hpx::barrier is (still) flaky?
<jbjnr[m]>
it might be the barrier - what I see is my LF debug output as the system starts. I actually exchange all fabric addresses using PMI or tcp before BBB starts, all that is fine and all localities actually know how to talk to each other, then BBB begins and after that I see no more debug messages from my stuff, so something in there is blocking
<jbjnr[m]>
it happens rarely on small node counts, but once you get to 256 it is 50% of the time
<hkaiser>
the barrier is being used only after BBB returns
<jbjnr[m]>
(on the cray anyway)
<jbjnr[m]>
So I assume something in BBB is faulty
<jbjnr[m]>
but until I know what should and shouldn't happen, I thought I'd avoid poking it
<hkaiser>
unlikely - but - anything is possible
<hkaiser>
BBB is using the early parcel logic
<hkaiser>
could that be something?
<jbjnr[m]>
it sounds like I could skip BBB completely since the libfabric layer builds localities already
<jbjnr[m]>
it could, but if it hits that code, my debug logs would show it
<hkaiser>
jbjnr[m]: BBB exchanges critical information
<hkaiser>
so you don't see any early parcels, not even the ones going to locality zero?
<jbjnr[m]>
apart from localities (addresses), what else?
<hkaiser>
action ids
<hkaiser>
number of cores used
<jbjnr[m]>
a zoom chat would be easier ... aha action ids, ok
<hkaiser>
jbjnr[m]: I'm fully booked starting in 20 minutes until 2pm my time today :/
<jbjnr[m]>
in that case, I will leave this chat window open and when you are free and have a moment, please ping me
<hkaiser>
we could chat over zoom tomorrow 9am CST/16.00 your time
<jbjnr[m]>
ok. 16:00 tomorrow. thanks
Guest1562 has joined #ste||ar
Guest1562 has quit [Client Quit]
<hkaiser>
srinivasyadav227: what I was wondering, and forgot to ask today, would it make sense to have the simd execution policy enabled even if no vectorization is available, simply falling back to using scalar types?
<hkaiser>
gnikunj[m]: ^^ what do you think?
<gnikunj[m]>
hkaiser: I don't think you can do it without actually defining a specialization. But I wanted to mention this in the call that it should technically be the case in HPX.
<hkaiser>
we could simply disable the vectorization backend and map simd-->seq and par_simd-->par
<gnikunj[m]>
Right, that's what I meant with specialization
<gnikunj[m]>
Because simd stuff is only available starting gcc11 (which is fairly new and may not be available everywhere)
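
A rough sketch of the fallback idea discussed above: when no vectorization backend is available, `simd` maps to `seq` and `par_simd` maps to `par` at compile time. The policy tags, the `SKETCH_HAVE_VECTOR_BACKEND` macro and the `effective_policy` trait are all hypothetical names for illustration; they are not HPX's actual types or configuration switches, and HPX may implement this differently.

```cpp
#include <cassert>  // used by the usage example, not by the sketch itself
#include <type_traits>

namespace sketch {

// Hypothetical stand-ins for the four execution policies.
struct sequenced_policy {};
struct parallel_policy {};
struct simd_policy {};
struct par_simd_policy {};

// Hypothetical build switch: 1 if a vectorization backend is present.
#ifndef SKETCH_HAVE_VECTOR_BACKEND
#define SKETCH_HAVE_VECTOR_BACKEND 0
#endif

// By default every policy maps to itself.
template <typename Policy>
struct effective_policy { using type = Policy; };

#if !SKETCH_HAVE_VECTOR_BACKEND
// Without a backend, the simd policies degrade to their scalar siblings.
template <>
struct effective_policy<simd_policy> { using type = sequenced_policy; };
template <>
struct effective_policy<par_simd_policy> { using type = parallel_policy; };
#endif

template <typename Policy>
using effective_policy_t = typename effective_policy<Policy>::type;

// Which code path an algorithm ends up taking.
enum class path { seq, par, simd, par_simd };

inline path run(sequenced_policy) { return path::seq; }
inline path run(parallel_policy) { return path::par; }
inline path run(simd_policy) { return path::simd; }
inline path run(par_simd_policy) { return path::par_simd; }

// Entry point: resolve the policy first, then pick the implementation.
template <typename Policy>
path dispatch(Policy) { return run(effective_policy_t<Policy>{}); }

static_assert(
    std::is_same<effective_policy_t<parallel_policy>, parallel_policy>::value,
    "non-simd policies are never remapped");

}  // namespace sketch
```

With the switch left at 0, `sketch::dispatch(sketch::simd_policy{})` takes the `seq` path, so user code that names the simd policy still compiles and runs on toolchains without a vector backend.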
<dkaratza[m]>
hkaiser: On thursday I have a presentation which just changed time, the new time overlaps with our meeting. Unfortunately I cannot miss it because it is the final presentation of a 3 month work. Could we maybe change our meeting? I can do both tomorrow and Friday
<hkaiser>
sure
<dkaratza[m]>
hkaiser: great, what time fits you?
<hkaiser>
dkaratza[m]: good questions, Friday before lunch over here would work
<dkaratza[m]>
hkaiser: so something like 12:30 for you? I don't know what time you usually have lunch, because there are some differences in the definition of lunch time haha
<hkaiser>
lol
<hkaiser>
yah, before 12pm CST
<dkaratza[m]>
maybe after your lunch then? i have a lecture until 11:45 CST
<gonidelis[m]>
Friday 13.30 CST ?
<dkaratza[m]>
gonidelis[m]: sure
<dkaratza[m]>
is there a standard for hpx::function_ref?
<dkaratza[m]>
about the github ticket..how can i do it?
<hkaiser>
dkaratza[m]: so Friday, I could meet either 9-12 or after 2pm CST
<dkaratza[m]>
hkaiser: yeah im going through it now
<hkaiser>
dkaratza[m]: you have to teach me how to build our docs ;-) I have not tried if the PR actually works - I was hoping you could try that
<gonidelis[m]>
hkaiser: 9.30-10.30 class. but you may as well proceed without me if that suits. otherwise, after 2pm would be awesome
<gonidelis[m]>
hkaiser: yahh..... -_-
<dkaratza[m]>
hkaiser: yeah sure, i can do that
hkaiser_ has joined #ste||ar
<hkaiser_>
after 2pm is very late for Demetra
<gonidelis[m]>
dkaratza: as soon as function_ref gets standardized cppref will add content accordingly ;)
<dkaratza[m]>
gonidelis[m]: well after 2pm is 9 for me
<gonidelis[m]>
2 + 7 = 9....
<gonidelis[m]>
ok whatever dimitra Demetra wants
<dkaratza[m]>
I don't know, if there is no other option we can do it that way
<gonidelis[m]>
you can meet in the morning without me
<dkaratza[m]>
gonidelis[m]: i have a mandatory lecture
<dkaratza[m]>
in your morning
<dkaratza[m]>
so the bad timing is my lectures are during your morning
<gonidelis[m]>
ok so 9.00 PM Netherlands time ?? :D
<hkaiser_>
dkaratza[m]: next week, then?
<dkaratza[m]>
tomorrow i have a class i can skip, if you have any time available
<hkaiser_>
10am would be open, still
hkaiser has quit [Ping timeout: 240 seconds]
<gnikunj[m]>
hkaiser_: you may want to switch to element (matrix) instead of Hexchat :P
<dkaratza[m]>
hkaiser_: yeah it works!
<dkaratza[m]>
so, tomorrow 10 am cst
<gonidelis[m]>
gnikunj: don't pull him out of the 2005 comfort
<gnikunj[m]>
gonidelis: Hahaha. I've been a Hexchat user myself but I switched to element a few months ago when I couldn't find a working IRC client for phone and I had no time to write one myself.
<gnikunj[m]>
hkaiser_: Do we support resource elasticity/malleability (dynamically adding or removing processes at runtime)?
<dkaratza[m]>
hkaiser_: also at the pr you say that the text in the rst file should be: :hpx-header:`base_path,file_name`. what is the base_path?