aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 256 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
katywilliams has joined #ste||ar
katywilliams has quit [Ping timeout: 264 seconds]
daissgr has quit [Ping timeout: 268 seconds]
EverYoung has quit [Ping timeout: 240 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 276 seconds]
katywilliams has joined #ste||ar
katywilliams has quit [Ping timeout: 260 seconds]
EverYoung has joined #ste||ar
diehlpk has quit [Ping timeout: 260 seconds]
parsa has quit [Quit: Zzzzzzzzzzzz]
K-ballo has quit [Quit: K-ballo]
EverYoung has quit [Ping timeout: 240 seconds]
hkaiser has quit [Quit: bye]
katywilliams has joined #ste||ar
mcopik has quit [Ping timeout: 246 seconds]
katywilliams has quit [Ping timeout: 276 seconds]
katywilliams has joined #ste||ar
katywilliams has quit [Ping timeout: 240 seconds]
daissgr has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 240 seconds]
daissgr has quit [Ping timeout: 240 seconds]
katywilliams has joined #ste||ar
katywilliams has quit [Ping timeout: 276 seconds]
quaz0r has quit [Ping timeout: 264 seconds]
quaz0r has joined #ste||ar
nanashi55 has quit [Ping timeout: 264 seconds]
nanashi64 has joined #ste||ar
nanashi64 is now known as nanashi55
Anushi1998 has quit [Quit: Leaving]
katywilliams has joined #ste||ar
katywilliams has quit [Ping timeout: 265 seconds]
katywilliams has joined #ste||ar
katywilliams has quit [Ping timeout: 240 seconds]
katywilliams has joined #ste||ar
EverYoung has joined #ste||ar
katywilliams has quit [Ping timeout: 268 seconds]
katywilliams has joined #ste||ar
EverYoung has quit [Ping timeout: 260 seconds]
katywilliams has quit [Ping timeout: 264 seconds]
katywilliams has joined #ste||ar
katywilliams has quit [Ping timeout: 246 seconds]
Anushi1998 has joined #ste||ar
katywilliams has joined #ste||ar
katywilliams has quit [Ping timeout: 256 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 240 seconds]
Anushi1998 has quit [Quit: Leaving]
katywilliams has joined #ste||ar
katywilliams has quit [Ping timeout: 256 seconds]
<simbergm>
zao: the dataflow_executor test fails often on the remove_wait_or_add branch
<simbergm>
so either the branch is broken or it's triggering the broken dataflow (test) more often, which would be great
<github>
hpx/release e250d80 Mikael Simberg: Merge remote-tracking branch 'origin/master' into release
ASamir has joined #ste||ar
<ASamir>
Hi @heller,
<heller>
hi ASamir
<ASamir>
I need some guiding points here, please. I want to participate in the All-to-All communication project, and I need more information about the libfabric parcelport layer. John told me that I need to map the FFlib algorithms into the HPX parcelport. Any starting guide?
<heller>
one sec
<heller>
I have a description, which is probably outdated by now, somewhere
<ASamir>
I have sent you an email with my draft proposal. Could you please give me your comments?
<jbjnr_>
ASamir: this might help a little - it does not contain anything specific about the PP, but it does contain some background info you might find useful ftp://ftp.cscs.ch/out/biddisco/hpx/Applied-Computing-HPX-ZeroCopy.pdf
<jbjnr_>
where possible, we'd like to use the RDMA features in any new work
anushi has quit [Ping timeout: 245 seconds]
<heller>
ASamir: note that the gist is 4 years old!
<jbjnr_>
what we need is a communicator object that uses AGAS for registration so that each node can find the right comm on other nodes; the communicator will then make calls directly into the LF PP, where the new stuff will be added
<jbjnr_>
That's how I imagine it being constructed
<jbjnr_>
(heller might disagree)
<ASamir>
Great. I will consider this.
anushi has joined #ste||ar
katywilliams has joined #ste||ar
<jbjnr_>
NB. the way connections are handled in the LF PP is a bit simpler than in the other parcelports. You can assume that we already have a connection to every node, and you just need to worry about how the messages are coordinated between nodes to make a collective operation 'happen'. The communicator on each node will have to be the target for the messages. Currently, when node A sends something to node B, node B receives some handle/id to a context on node A, and from there it can exchange the RMA/messages/etc. When we do a collective operation, we may need to add some extra info so that messages go to the right handler/etc.
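One way to read the "extra info" jbjnr_ mentions is a header on every collective message that names the target communicator, so the receiving node can dispatch it to the right handler. The sketch below is purely illustrative: it uses no libfabric or AGAS APIs, and `collective_header`, `message` and `collective_dispatcher` are hypothetical names. How communicator ids are agreed across nodes (for example via AGAS registration, as suggested above) is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical header: each collective message carries the id of the
// communicator it targets, so the receiver can route it to the right handler.
struct collective_header {
    std::uint64_t communicator_id;  // assumed to be agreed on via some registry
    std::uint64_t sequence;         // distinguishes successive collectives on one communicator
    int source_rank;                // rank of the sending node
};

struct message {
    collective_header header;
    std::vector<char> payload;
};

// Toy per-node dispatcher: maps communicator ids to the handler that owns them.
class collective_dispatcher {
public:
    using handler = std::function<void(message const&)>;

    void register_communicator(std::uint64_t id, handler h) {
        handlers_[id] = std::move(h);
    }

    // Returns false if no communicator with that id is registered on this node.
    bool deliver(message const& m) const {
        auto it = handlers_.find(m.header.communicator_id);
        if (it == handlers_.end())
            return false;
        it->second(m);
        return true;
    }

private:
    std::unordered_map<std::uint64_t, handler> handlers_;
};
```

Keying the dispatch on an id carried in the message keeps the connection layer unchanged, which matches the assumption that a connection to every node already exists.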
katywilliams has quit [Ping timeout: 240 seconds]
Viraj has joined #ste||ar
<github>
[hpx] xaguilar opened pull request #3251: Adds new feature: changing interval used in interval_timer (issue 3244) (master...issue-3244/interval_timer-update) https://git.io/vx85z
<jbjnr_>
simbergm: is this one of yours? thread_suspension_executor_test_exe
<jbjnr_>
cos that's the one that doesn't compile with my new stuff
Anushi1998 has quit [Read error: Connection reset by peer]
anushi has joined #ste||ar
Anushi1998 has joined #ste||ar
EverYoung has joined #ste||ar
Viraj has quit [Ping timeout: 260 seconds]
EverYoung has quit [Ping timeout: 276 seconds]
hkaiser has joined #ste||ar
<zao>
Booo... `-bash: /bin/grep: Argument list too long`
nanashi55 has quit [Ping timeout: 240 seconds]
<simbergm>
jbjnr_: nope, all mine are in tests/unit/resource
nanashi55 has joined #ste||ar
<zao>
187 failures of `tests.unit.lcos.local_dataflow_executor` over 412526 runs :D
<zao>
Let's try that other fancy branch.
<hkaiser>
not even 1% pah!
anushi has quit [Ping timeout: 240 seconds]
Anushi1998 has quit [Ping timeout: 276 seconds]
anushi has joined #ste||ar
<jbjnr_>
simbergm: I think it's an old one of hkaiser's
<zao>
I can unfortunately not test on the Skylake desktop, as I've stolen all the memory from it to be able to build HPX on the Ryzen :P
<zao>
Gonna slip over to work, but may have some runs on the add-remove-new-whatnot branch later.
K-ballo has joined #ste||ar
Anushi1998 has joined #ste||ar
<jbjnr_>
so... hkaiser: if I create an executor, launch a task using it and return a future, then the executor I used to launch the task goes out of scope and I get a crash later. Should I a) make sure the executor is always copied internally so that its state isn't lost, or b) use some intrusive ptr to refcount the internals?
<hkaiser>
jbjnr_: I think you should make sure that the executor stays alive long enough
<hkaiser>
that should be done by the application using the executor; I'm not sure it's the executor's responsibility to keep itself alive
Anushi1998 has quit [Read error: Connection reset by peer]
Anushi1998 has joined #ste||ar
nanashi55 has quit [Ping timeout: 240 seconds]
nanashi55 has joined #ste||ar
Anushi1998 has quit [Remote host closed the connection]
Anushi1998 has joined #ste||ar
Smasher has joined #ste||ar
<K-ballo>
I'd say it's up to the one that launched the task on that executor
<hkaiser>
K-ballo: nod, I agree
<zao>
simbergm: On that branch, I'm looking at a (thus far) 378/634 ratio of failed/total.
<simbergm>
hkaiser: 2 questions: steps 27 and 28 in the release procedure refer to the release notes; before 1.0 they had a special page, for 1.0 they link to what's new
<zao>
No idea about the other tests, just running this one atm.
<simbergm>
I guess we can link to what's new again? (in which case I'll update the instructions)
<simbergm>
zao: yeah, sounds about the same I've seen
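For scale, the two failure counts zao reported for `tests.unit.lcos.local_dataflow_executor` work out as follows, assuming the first figure (reported before "Let's try that other fancy branch") is the baseline build and the second is the remove_wait_or_add branch:

$$\frac{187}{412\,526} \approx 4.53 \times 10^{-4} \approx 0.045\,\% \qquad\text{vs.}\qquad \frac{378}{634} \approx 0.596 \approx 59.6\,\%$$

That is a failure rate roughly 1300 times higher on the branch, which makes the failure far easier to reproduce there.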
<simbergm>
hkaiser, second question: how do you generate the list of issues and prs?
<simbergm>
in what's new
<simbergm>
it fails with the parallel executor, is the async_pack_traversal somehow broken? I don't know enough about that so just guessing
<jbjnr_>
K-ballo: hkaiser ok thanks. I was hoping you'd say that. My guided executor has internal state and when it goes out of scope, bad stuff happens. So I will make sure it's always kept alive
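The conclusion reached here (the code that launches work on an executor is responsible for keeping it alive) can be illustrated with a minimal standard-C++ sketch. It does not use HPX: `stateful_executor`, `async_execute` and `run_with_caller_owned_executor` are hypothetical names, and `std::async` stands in for HPX task launch.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <utility>

// Hypothetical executor with internal state (here just a count of tasks run).
// Tasks reference that state, so the executor must outlive every future it returned.
class stateful_executor {
public:
    template <typename F>
    auto async_execute(F f) -> std::future<decltype(f())> {
        // The task captures `this`: if the executor were destroyed before the
        // task runs, the task would touch freed memory (the crash described above).
        return std::async(std::launch::async, [this, f = std::move(f)]() mutable {
            ++tasks_run_;
            return f();
        });
    }

    int tasks_run() const { return tasks_run_.load(); }

private:
    std::atomic<int> tasks_run_{0};
};

// Caller-owned lifetime: the code that launches work keeps the executor alive
// until it has consumed every future it obtained from it.
int run_with_caller_owned_executor() {
    stateful_executor exec;  // lives for the whole scope
    auto f1 = exec.async_execute([] { return 20; });
    auto f2 = exec.async_execute([] { return 22; });
    int result = f1.get() + f2.get();  // futures consumed while exec is still alive
    assert(exec.tasks_run() == 2);
    return result;
}
```

A `std::future` from `std::async` blocks in its destructor, so in this sketch the hazard only shows up if a future escapes the scope that owns the executor. Options a) and b) from the question would replace this caller-side rule by copying or reference-counting the executor's state.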
Guest17828 has quit [Ping timeout: 260 seconds]
<hkaiser>
simbergm: yes, let's link to what's new - this is what we had in the beginning - two separate pages
<hkaiser>
one what's new and one release notes, later we got lazy ;)
<hkaiser>
simbergm: list of issues: generated manually by copy pasting things from the web
<hkaiser>
simbergm: there might be a better way, but I never was in the mood to try some scripting
parsa has joined #ste||ar
Anushi1998 has quit [Quit: Leaving]
<simbergm>
hkaiser: thanks and thanks! maybe I'll be in the mood to do some scripting...
<hkaiser>
simbergm: we do not consistently mark tickets with the milestone, which makes it (close to) impossible to get a list of the tickets that have been closed using a script
<simbergm>
maybe by date? that would be a good reason to always add a milestone, but I've been bad at doing that
<simbergm>
I prefer to have no milestone if no one has planned to work on an issue soon
<hkaiser>
right, that's why I always tried to do that, but I'm probably not consistent with it either
<simbergm>
I'll make a mental note to try to be more consistent, it's possible to filter issues and PRs by no milestone which helps in assigning milestones after the fact
<hkaiser>
simbergm: GitHub allows moving all open tickets to the next milestone at the end, so assigning one is not an issue, I think
parsa has quit [Quit: Zzzzzzzzzzzz]
<simbergm>
yeah, I just don't like putting all the issues in the next milestone :/ but I understand now why you did that, let's try the next release without that, but we can go back if it doesn't work out
<hkaiser>
k
<nikunj_>
@hkaiser: have a minute?
<hkaiser>
nikunj_: yah, a short one ;)
<nikunj_>
I have a question about init implementation with __argc, __argv
<hkaiser>
k
<nikunj_>
How would you like me to implement it?
<nikunj_>
Right now I can't copy-paste that initialization code since it would cause multiple-definition errors
<hkaiser>
I would have expected that the __argc/__argv initialization is independent of anything else
<hkaiser>
so copying it over shouldn't be problematic
<hkaiser>
what duplicate symbol errors do you see?
<parsa[w]>
do i have to do map(lambda(x, x), range(1, 3, 5)) to return x itself?
<hkaiser>
yes
<diehlpk_work>
hkaiser, For vectors in BLAZE we are competitive with openmp
<hkaiser>
diehlpk_work: marvelous! did you change a lot?
<diehlpk_work>
No, I just use 8 times the number of OS threads instead of 4 times
<hkaiser>
ok, that 4 was a guess anyways
<diehlpk_work>
All vectors with fewer than 1000 elements are always executed in single mode
<diehlpk_work>
with static(1)
<diehlpk_work>
Above that, all other lengths are executed with dynamic(10)
<hkaiser>
ok, so we should create a special chunker
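The vector thresholds diehlpk_work describes could be expressed as a small dispatch helper. This is a minimal standard-C++ sketch, not the Blaze or HPX code: it assumes "single mode" means running the whole loop sequentially on the calling thread, and it models "dynamic(10)" as dynamic scheduling with a chunk size of 10 elements pulled from a shared counter. `threshold_for_each` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical dispatch: inputs below `sequential_threshold` run on the calling
// thread; larger inputs are split into chunks of `chunk_size` that threads claim
// dynamically from a shared atomic counter.
template <typename F>
void threshold_for_each(std::size_t n, F f,
                        std::size_t sequential_threshold = 1000,
                        std::size_t chunk_size = 10,
                        unsigned num_threads = std::max(1u, std::thread::hardware_concurrency()))
{
    if (n < sequential_threshold) {
        for (std::size_t i = 0; i != n; ++i) f(i);  // small input: no parallel overhead
        return;
    }
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (;;) {
            std::size_t begin = next.fetch_add(chunk_size);  // claim the next chunk
            if (begin >= n) return;
            std::size_t end = std::min(begin + chunk_size, n);
            for (std::size_t i = begin; i != end; ++i) f(i);
        }
    };
    std::vector<std::thread> threads;
    for (unsigned t = 0; t + 1 < num_threads; ++t) threads.emplace_back(worker);
    worker();  // the calling thread participates as well
    for (auto& th : threads) th.join();
}
```

Keeping the threshold and chunk size as parameters would let separate vector and matrix policies share one implementation with different tuned values.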
<diehlpk_work>
Yes, once I have the same values for matrix operations
<hkaiser>
isn't that in a different function anyways?
<diehlpk_work>
Yes, you are right, we could have a Blaze vector and a Blaze matrix policy
katywilliams has quit [Ping timeout: 240 seconds]
<diehlpk_work>
I suggest that once I collected all thresholds for matrix and vector operations, we could implement the two policies
<hkaiser>
k, sounds good
<diehlpk_work>
It is only the easiest method to obtain these thresholds. I believe that we can optimize things further
<diehlpk_work>
But first I want to get everything to be close to openmp
<hkaiser>
good thinking
<diehlpk_work>
The plan is to get close to OpenMP, use the latest stable version of HPX, and push it to the Blaze repo
<diehlpk_work>
So in the next blaze version we would have a working hpx threads version
<hkaiser>
right
<diehlpk_work>
For the improvement we have to find these values for all benchmarks
<diehlpk_work>
Right now we have something that results in acceptable values for all vector benchmarks
<diehlpk_work>
A more generic approach
diehlpk has joined #ste||ar
Anushi1998 has quit [Ping timeout: 240 seconds]
Viraj has left #ste||ar [#ste||ar]
apsknight has joined #ste||ar
kisaacs has quit [Ping timeout: 260 seconds]
apsknight has left #ste||ar [#ste||ar]
kisaacs has joined #ste||ar
aserio has joined #ste||ar
nikunj has quit [Ping timeout: 260 seconds]
nikunj has joined #ste||ar
diehlpk has quit [Remote host closed the connection]
diehlpk_work has quit [Read error: Connection reset by peer]
diehlpk has joined #ste||ar
mcopik has joined #ste||ar
<aserio>
hkaiser: will you be starting the meeting today?
<hkaiser>
I can do that
katywilliams has joined #ste||ar
diehlpk has quit [Ping timeout: 263 seconds]
diehlpk_work has joined #ste||ar
nanashi64 has joined #ste||ar
nanashi55 has quit [Ping timeout: 276 seconds]
nanashi64 is now known as nanashi55
nikunj has quit [Quit: Page closed]
<diehlpk_work>
To all our GSoC students: you should start registering for GSoC and upload your proposal as a draft, so we can see who intends to submit a proposal
<zao>
Those proposals are only readable by the HPX crew involved, right?
<zao>
So no need to hold back like they've done on the list w.r.t. competitors?
<diehlpk_work>
zao, Yes
diehlpk has joined #ste||ar
katywilliams has quit [Ping timeout: 246 seconds]
diehlpk has quit [Ping timeout: 276 seconds]
<aserio>
hkaiser: how did you install Jupyter?
<zao>
On the branch, `tests.unit.lcos.local_dataflow` fails occasionally btw.
<heller>
diehlpk_work: also, remind them of the proof of enrollment ;)
<zao>
(on same thing tested)
<heller>
I don't like that
<heller>
FWIW, my large scale tests are good
<heller>
except for my application not scaling on initialization (as always!) ;)
<heller>
4532 cores without a problem so far
<aserio>
heller: calling is_ready on a future should not change the shared state right? (asking for Gregor)
<heller>
nope
<heller>
is_ready is just observing
<aserio>
thanks!
<heller>
which doesn't mean that the future can't have become ready right after is_ready returned ;)
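heller's point that `is_ready` only observes the shared state, and that its answer can be stale as soon as it returns, has a direct standard-library analogue. This sketch uses `std::future::wait_for` with a zero timeout because `std::future` has no `is_ready` member; the free function `is_ready` here is a hypothetical helper, not HPX's API.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Non-blocking readiness query: wait_for with a zero timeout only inspects the
// shared state; it neither blocks nor consumes the stored value.
template <typename T>
bool is_ready(std::future<T> const& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

A `false` result only describes the moment of the call: another thread may fulfil the promise immediately afterwards, so code should not treat it as "still not ready".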
<aserio>
that's not the problem he is seeing
<aserio>
but good point!
tianyi has quit [Remote host closed the connection]
<hkaiser>
aserio: not at all (yet) ;-)
<aserio>
hkaiser: I just "installed" it with VS
<aserio>
but now I am trying to figure out how to run it :p
<hkaiser>
heh
<zao>
I'm going to have to punch you people who print excessive information in tests.
<zao>
protect_with_nullary_pfo_test - 2000 lines with a count on each.
<hkaiser>
lol
<zao>
Also not overly keen on tests that run for >10s.
david_pfander has quit [Ping timeout: 260 seconds]
<zao>
30s for `tests.unit.resource.suspend_pool`, 32s for `tests.unit.parallel.container_algorithms.sort_range`
<zao>
Really pours gravel into the works if you try to run the test suite routinely.
<zao>
(more offenders too)
katywilliams has joined #ste||ar
parsa[w] has quit [Read error: Connection reset by peer]
diehlpk has joined #ste||ar
katywilliams has quit [Ping timeout: 264 seconds]
katywilliams has joined #ste||ar
katywilliams has quit [Ping timeout: 276 seconds]
kisaacs has quit [Ping timeout: 240 seconds]
<zao>
Not saying that someone should do something about them, but two minutes for three tests is rather meh.
<hkaiser>
zao: right - we have not paid attention to this at all
<zao>
It's of lesser impact if they only run once in some CI, or if someone for some craaaazy reason runs tests on their local checkout ;)
<zao>
My use is quite weird :)
EverYoung has joined #ste||ar
EverYoun_ has quit [Ping timeout: 240 seconds]
<jbjnr_>
zao: some of the sort tests take quite a long time, and that was intentional on my part. They need to actually sort large arrays and exercise the algorithms. I do find them annoying however.
<jbjnr_>
:(
kisaacs has joined #ste||ar
aserio has quit [Quit: aserio]
kisaacs has quit [Ping timeout: 240 seconds]
diehlpk has quit [Remote host closed the connection]
diehlpk has joined #ste||ar
diehlpk has quit [Ping timeout: 260 seconds]
diehlpk_work has quit [Quit: Leaving]
parsa[w] has joined #ste||ar
eschnett has quit [Quit: eschnett]
EverYoung has quit [Remote host closed the connection]
EverYoung has joined #ste||ar
kisaacs has joined #ste||ar
kisaacs has quit [Ping timeout: 240 seconds]
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 276 seconds]
EverYoun_ has quit [Ping timeout: 276 seconds]
katywilliams has joined #ste||ar
EverYoung has joined #ste||ar
katywilliams has quit [Ping timeout: 248 seconds]
EverYoun_ has joined #ste||ar
EverYoung has quit [Ping timeout: 240 seconds]
mcopik has quit [Ping timeout: 276 seconds]
kisaacs has joined #ste||ar
EverYoung has joined #ste||ar
EverYoun_ has quit [Ping timeout: 276 seconds]
EverYoung has quit [Ping timeout: 264 seconds]
EverYoung has joined #ste||ar
EverYoung has quit [Remote host closed the connection]