aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
taeguk has quit [Quit: Page closed]
parsa has quit [Quit: Zzzzzzzzzzzz]
EverYoung has quit [Ping timeout: 246 seconds]
taeguk has joined #ste||ar
zbyerly_ has joined #ste||ar
<taeguk>
Excuse me, I have a question about scan_partitioner.
<taeguk>
I think 'next' in f3 may be unnecessary.
<hkaiser>
does it hurt in any way?
<taeguk>
I'm improving scan_partitioner, and 'next' in f3 is an obstacle to my work.
vamatya has quit [Ping timeout: 240 seconds]
<hkaiser>
taeguk: give me a sec
<taeguk>
In my other version of scan_partitioner, which will be used by parallel unique, remove, and remove_if, I have to create a vector storing the results of f1 just to pass 'next' to f3. This overhead can be removed if we eliminate 'next' from f3.
<hkaiser>
taeguk: well, not sure how it can be in the way, but feel free to remove it if you think it's not needed
<taeguk>
Okay
<hkaiser>
taeguk: can this new scan_partitioner be used for the other algorithms as well?
<taeguk>
Thank you for answering.
<taeguk>
yes.
<taeguk>
What I'm doing is creating a scan partitioner tag.
<hkaiser>
do you think this new algorithm will have better performance?
<hkaiser>
taeguk: ^^
<taeguk>
no, the new scan_partitioner performs f3 sequentially.
<hkaiser>
why is that needed?
<taeguk>
For implementing unique, remove, and remove_if, f3 must be performed sequentially.
<taeguk>
Yes, because those are in-place algorithms, I can't implement them without a sequentially performed f3.
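A minimal sketch of why an in-place compaction wants its f3 steps run in partition order. This is not HPX's scan_partitioner; `chunked_remove_if` and the `keep` flag array are hypothetical names used only for illustration. The kept elements of each chunk are moved left to a destination at or before the chunk's own start, so a later chunk running first could overwrite elements of an earlier chunk that have not been moved yet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (assumption, not an HPX API): removes elements
// matching `pred` in place, chunk by chunk, and returns the new size.
template <typename Pred>
std::size_t chunked_remove_if(std::vector<int>& v, std::size_t chunk, Pred pred)
{
    assert(chunk > 0);
    std::size_t const n = v.size();

    // "f1"-like step: flag the elements to keep. It only reads `v`,
    // so the chunks of this step could safely run concurrently.
    std::vector<char> keep(n);
    for (std::size_t i = 0; i != n; ++i)
        keep[i] = pred(v[i]) ? 0 : 1;

    // "f3"-like step: compact each chunk to the left. Every write goes to
    // dest <= i, possibly inside an earlier chunk, so the chunks are
    // processed strictly in order (first chunk, then second, ...).
    std::size_t dest = 0;
    for (std::size_t b = 0; b < n; b += chunk)
    {
        std::size_t const e = std::min(n, b + chunk);
        for (std::size_t i = b; i != e; ++i)
        {
            if (keep[i])
            {
                if (dest != i)
                    v[dest] = std::move(v[i]);
                ++dest;
            }
        }
    }
    return dest;
}
```

The caller would typically shrink the vector to the returned size afterwards, as with `std::remove_if` followed by `erase`.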
<hkaiser>
taeguk: isn't f3 executed sequentially in the current scan_partitioner as well?
<taeguk>
If I could implement those fully in parallel, that would be the best case. But I can't find a way.
<taeguk>
hkaiser: yes you're right.
<hkaiser>
I thought the data dependencies enforce that
<hkaiser>
I also think that next is passed to f3 to enforce that data dependency
<hkaiser>
but I'd need to go through my notes again
<taeguk>
Well, I think the data dependency is kept even without 'next' in f3.
<taeguk>
And currently, because of some bugs, scan_partitioner temporarily uses launch::sync instead of policy.executor().
<taeguk>
So, scan_partitioner executes f3 sequentially for now.
<taeguk>
But, this will be fixed in the future.
<hkaiser>
I don't think that the sync was enforcing the sequentiality of f3
<hkaiser>
'sync' just means that dataflow will not launch a new thread to invoke its function, but the function will be directly executed by the thread which makes the last future ready
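The behavior described here can be illustrated with a small standard-library sketch. This is not HPX's dataflow implementation; `inline_dataflow` and `continuation_runs_on_last_completer` are hypothetical names. The idea is only that the continuation is invoked inline by whichever thread signals the last outstanding dependency, instead of being scheduled on a new thread.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical mini "dataflow" (assumption, not HPX code): runs `then`
// on the thread that reports the last outstanding dependency as ready.
class inline_dataflow
{
public:
    inline_dataflow(int deps, std::function<void()> then)
      : pending_(deps), then_(std::move(then))
    {
        assert(deps > 0);
    }

    // Called once per dependency when that dependency becomes ready.
    void dependency_ready()
    {
        // fetch_sub returns the previous value, so 1 means "I was the last".
        if (pending_.fetch_sub(1) == 1)
            then_();    // executed directly by the calling thread
    }

private:
    std::atomic<int> pending_;
    std::function<void()> then_;
};

// Returns true if the continuation ran on the worker thread that
// completed the last dependency, and not on the calling thread.
bool continuation_runs_on_last_completer()
{
    std::thread::id ran_on;
    inline_dataflow df(2, [&ran_on] { ran_on = std::this_thread::get_id(); });

    df.dependency_ready();    // first dependency: nothing runs yet

    std::thread::id worker_id;
    std::thread worker([&] {
        worker_id = std::this_thread::get_id();
        df.dependency_ready();    // last dependency: continuation runs here
    });
    worker.join();

    return ran_on == worker_id && ran_on != std::this_thread::get_id();
}
```

Note that running inline says nothing about ordering between different continuations: two such continuations can still run concurrently if their last dependencies are completed by different threads.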
<taeguk>
yes, without 'sync', many f3 will be executed concurrently.
<taeguk>
I used the word 'sequentially' to mean non-concurrently.
<taeguk>
I mean that the first f3 is performed, then the second f3, then the third f3, and so on.
<taeguk>
That is what I meant by 'sequentially'.
<hkaiser>
next becomes ready whenever the corresponding invocation of f1 has finished running
<hkaiser>
and f3 depends on f1
<hkaiser>
so it is needed as a dependency
<hkaiser>
you need that sequentiality to ensure you don't overwrite data, right?
<taeguk>
I think 'curr' in the above code is unnecessary because it will be used as the next 'prev'.
<hkaiser>
I don't have my notes available right now
<taeguk>
okay
<hkaiser>
taeguk: for instance copy_if: you are not allowed to start copying a segment (f3) before you have scanned it once to find out which elements to copy (f1) - so f3 depends on f1
<hkaiser>
taeguk: you're saying that f3 indirectly depends on f1 anyways as f2 depends on prev and current f1 and f3 depends on f2
<hkaiser>
but f3 depends on f2 from the previous step
<hkaiser>
so f3 depends indirectly through f2 on f1 'prevprev' and f1 'prev'
<hkaiser>
but it additionally needs to depend on f1 executed on its own segment
<hkaiser>
am I missing something?
<taeguk>
yes you're right. But I can't agree with "but it additionally needs to depend on f1 executed on its own segment"
<hkaiser>
sure it needs to
<hkaiser>
as I showed with the copy_if example above
<taeguk>
because copy_if is an out-of-place algorithm, there is no need.
<hkaiser>
you can't start copying before you know which to copy
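The dependency structure discussed above can be sketched with plain `std::async` and `std::shared_future`. This is an illustration under assumptions, not the HPX scan_partitioner: `chunked_copy_if`, `flags`, and `offset` are hypothetical names. Here f1 flags and counts the elements of one partition, f2 turns the counts into output offsets, and f3 copies one partition. Each f3 waits both on the offset produced by f2 for the preceding partitions and on f1 of its own partition (the role of 'next'), because it reads the flags that f1 wrote.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper (assumption, not an HPX API).
template <typename Pred>
std::vector<int> chunked_copy_if(
    std::vector<int> const& in, std::size_t chunk, Pred pred)
{
    assert(chunk > 0);
    std::size_t const n = in.size();
    std::size_t const parts = (n + chunk - 1) / chunk;

    std::vector<char> flags(n, 0);    // written by f1, read by f3
    std::vector<int> out(n);          // worst case: everything is kept

    // f1: scan each partition once, record which elements to copy.
    std::vector<std::shared_future<std::size_t>> f1(parts);
    for (std::size_t p = 0; p != parts; ++p)
    {
        std::size_t const b = p * chunk, e = std::min(n, b + chunk);
        f1[p] = std::async(std::launch::async, [&in, &flags, b, e, pred] {
            std::size_t kept = 0;
            for (std::size_t i = b; i != e; ++i)
            {
                flags[i] = pred(in[i]) ? 1 : 0;
                kept += flags[i];
            }
            return kept;
        }).share();
    }

    // f2: offset[p + 1] = offset[p] + f1[p], the output start of partition p + 1.
    std::vector<std::shared_future<std::size_t>> offset(parts + 1);
    std::promise<std::size_t> zero;
    zero.set_value(0);
    offset[0] = zero.get_future().share();
    for (std::size_t p = 0; p != parts; ++p)
    {
        offset[p + 1] = std::async(std::launch::async,
            [prev = offset[p], curr = f1[p]] {
                return prev.get() + curr.get();
            }).share();
    }

    // f3: copy partition p once its offset AND its own f1 are ready.
    std::vector<std::future<void>> f3;
    for (std::size_t p = 0; p != parts; ++p)
    {
        std::size_t const b = p * chunk, e = std::min(n, b + chunk);
        f3.push_back(std::async(std::launch::async,
            [&in, &flags, &out, b, e, dest = offset[p], next = f1[p]] {
                next.wait();    // without this, flags[b..e) may be unwritten
                std::size_t o = dest.get();
                for (std::size_t i = b; i != e; ++i)
                    if (flags[i])
                        out[o++] = in[i];
            }));
    }
    for (auto& f : f3)
        f.get();

    out.resize(offset[parts].get());
    return out;
}
```

In this sketch the offset for partition p only depends on f1 of partitions before p, so it says nothing about whether f1 of partition p itself has finished. That is the gap the explicit wait on `next` closes. The sketch spawns one thread per task, which is only reasonable for small inputs.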
<taeguk>
ah
<K-ballo>
hkaiser: I was considering waiting for the resource manager to merge before opening the atomics PR
<hkaiser>
K-ballo: ok, that will happen soon
<hkaiser>
K-ballo: also, there is an atomics PR, isn't there?
<taeguk>
hkaiser: you're right. 'next' is needed..
<hkaiser>
ok
<taeguk>
sorry for my misunderstanding
<hkaiser>
no worries - no need to apologize
<K-ballo>
uhm, there is? there is.. must have forgotten
<K-ballo>
oh right, that's where you left the comment
<hkaiser>
#2782, yah
<K-ballo>
and I never did find the right spot for -latomic, did I?
<hkaiser>
K-ballo: not yet, I think
<K-ballo>
ok, good to know... ignore the PR for a little longer
<hkaiser>
K-ballo: ok, let me know
<K-ballo>
I thought I was just waiting for resource manager
<K-ballo>
odd.. I remember finding a "standard library" flags section or something like it