hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: Bye!]
diehlpk_work_ has quit [Remote host closed the connection]
<ms[m]>
circleci is back (it was already back yesterday)
<ms[m]>
apparently some automated system "erroneously flagged" us...
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
<K-ballo>
hkaiser: have you been able to successfully use VerySleepy on windows?
<K-ballo>
it stopped working for me when I switched to win7
<hkaiser>
K-ballo: I have not tried
<hkaiser>
I'm usually using vtune for this
<K-ballo>
lately i've been using the performance tools that come with VS
<hkaiser>
ok
<hkaiser>
do they work well? I never even tried
<K-ballo>
they're nowhere near vtune, but most of the time they're good enough
<K-ballo>
sometimes though they magically stop capturing events, and it takes a rebuild and/or restart to get it going again
<hkaiser>
ok
<jedi18[m]>
I can't use vtune on my AMD laptop, right? Any idea how the alternative, AMD μProf, compares to vtune?
<hkaiser>
jedi18[m]: vtune should work on AMD machines, although it might have reduced functionality
<hkaiser>
I have never tried using AMD's tools, however
<jedi18[m]>
Oh ok cool, I'll try using it on the ranges stuff
<zao>
VTune has greatly reduced profiling functionality on non-Intel chips, I think it even rules out any form of hardware-assisted instrumentation. Nowhere in the development specs for the software did the phrase "maybe look at what analogous functionality AMD offers for their CPUs" exist :D
<zao>
Not saying it's intentionally worse than it should be, but it's intentionally worse than it should be.
<hkaiser>
zao: +1
<hkaiser>
zao: Intel is known for crippling their tools on AMD's architectures
<jedi18[m]>
That's rude of them xD, but I guess it's expected since AMD is their primary competitor
<jedi18[m]>
Let's hope AMD μProf is good then
<hkaiser>
jedi18[m]: for a first assessment, software-based analysis techniques are usually good enough - the hardware-assisted stuff is needed only if you already know where your bottlenecks are
<jedi18[m]>
Oh ok yeah I don't have any immediate use for it, just wanted to try it out
<hkaiser>
sure
<gonidelis[m]>
jedi18: been struggling with that quite a lot. vtune hates my ryzen. I was able to do minimal analysis like cpu usage and stuff, though
<gonidelis[m]>
AMD's uprof is the worst program in the world. not just compared to other profilers, but in general
<gonidelis[m]>
bottom line, I would opt for vtune even if they treat us like second-class citizens. it's that good.
<jedi18[m]>
Oh ok thanks, won't bother trying out uprof then, vtune it is
<gonidelis[m]>
jedi18: please don't
<gonidelis[m]>
K-ballo: how come this works? even more, how come it does not work when I uncomment line #16? even worse, I would expect `views::for_each(vv, lambda)` to work. https://wandbox.org/permlink/GBcpLZxsWolqz2cZ
<K-ballo>
can't match what you are saying to the wandbox snippet... tell me with words, why wouldn't that work?
<gonidelis[m]>
that's the least important of my three questions, because I can understand that it is just a view closure copy assignment, though I don't see its use
<K-ballo>
when you ask why something works or doesn't work you need to say what your expectation actually is
<gonidelis[m]>
i expect the second snippet to work
<gonidelis[m]>
i expect it to lazily multiply each element of vv by 2
<K-ballo>
are you describing the transform view?
<gonidelis[m]>
...
<gonidelis[m]>
the for_each vs transform beef dictates that their main difference is that one is done in place
<gonidelis[m]>
they sound similar
<K-ballo>
no, they are views
<K-ballo>
I see neither docs nor tests for the for_each view, but from the implementation it seems to be a join over a transform
<gonidelis[m]>
aha
<gonidelis[m]>
"Lazily applies an unary function to each element in the source range that returns another range (possibly empty), flattening the result. "
<gonidelis[m]>
"Given a source range and a unary function, return a new range where each result element is the result of applying the unary function to a source element."
<gonidelis[m]>
excluding the "flattening results part", what's the difference?
<gonidelis[m]>
K-ballo: so it means they do approximately the same thing. almost.
<K-ballo>
except for the part they are different, they do the same.. is that what you are saying? you are entirely correct
<gonidelis[m]>
so the only difference is the flattening result thingy?
<gonidelis[m]>
what does flattening the result even mean?
<K-ballo>
flattening means going from range of range of T to range of U
<K-ballo>
flattening without transformation would go from range of range of T to range of T
<gonidelis[m]>
ahh thanks for that. wow
<gonidelis[m]>
i see!
<K-ballo>
flattened {"a", "bc", "d"} is "abcd"
<gonidelis[m]>
with all these things we are saying, it sounds like for_each(vv, lambda) should work. aha. got it. that's nice actually
<gonidelis[m]>
K-ballo: it's the functor!
<K-ballo>
function object
<gonidelis[m]>
no, it's views::for_each's accepted functor
<gonidelis[m]>
from SO: "You misunderstand what view::for_each() is, it's totally different from std::for_each", oh really? 😅
<K-ballo>
views::for_each actually takes a callable
<gonidelis[m]>
which is then specifically cast to an rvalue ref
<hkaiser>
yah, they circumvent using forward to save compile time
<hkaiser>
that cast is doing the same as std::forward
<gonidelis[m]>
huh.... ok that's nice
<hkaiser>
gnikunj[m]: yt?
jehelset has joined #ste||ar
<gnikunj[m]>
hkaiser: forgot to set an alarm :/
<gnikunj[m]>
Ofc it had to happen again
<hkaiser>
never happened before - and yet again ;-)
<srinivasyadav227>
Or gnikunj c
<gnikunj[m]>
When am I getting that alarm clock you talked about?
<gnikunj[m]>
I think I'm in desperate need of one ;)
<hkaiser>
gnikunj[m]: what if you used your cell phone, it can wake you up as well
<gonidelis[m]>
gnikunj: uiuc should be giving them out for free, given how much they exhaust you over there ;p
<gnikunj[m]>
Hahahaha true
<hkaiser>
gonidelis[m]: nah, he's paying for being punished
<gnikunj[m]>
I'll pester the CS dept here for one
<gonidelis[m]>
hahaha
<gnikunj[m]>
hkaiser: good thing the pay is small ;)
tufei has quit [Remote host closed the connection]
tufei has joined #ste||ar
<hkaiser>
gnikunj[m]: most likely mdspan will go into C++23, so no excuses for not looking into striding for vectorization anymore
<gnikunj[m]>
I did go through the implementation
<hkaiser>
any insights?
<gnikunj[m]>
Striding is important!
<gnikunj[m]>
I don't think any other runtime supports striding. So I want us to be first!
<gonidelis[m]>
what's striding?
<gnikunj[m]>
A stride of n is considering elements in order i, i+n,..., i+k*n,...
<gonidelis[m]>
what's the proposal then?
<gnikunj[m]>
We're trying to get vector pack of strides
<gnikunj[m]>
Vector pack in general is applied to contiguous data elements
<gnikunj[m]>
So if the user wants stride, the user needs to change the data structure used to have a behavior similar to stride
<gonidelis[m]>
wow!
<gonidelis[m]>
talkin about convenience
<gnikunj[m]>
Yes, having stride makes our vector implementation very general
<hkaiser>
gnikunj[m]: but possibly non-efficient, so let's try it out!
<gnikunj[m]>
Yes, it will be non-efficient until we figure out the data locality party
<gnikunj[m]>
s/party/part/
<hkaiser>
freudian typo ;-)
<gonidelis[m]>
hahahahahahahhhahahahaha
<gnikunj[m]>
Shhhh no one saw that 🤫
<pedro_barbosa[m]>
Hey, I was doing an example with HPXCL and CUDA. At some point I wanted to replace some values in an array I pass as an argument to the kernel with the values of a smaller array, but I keep getting an error. If I replace the argument array with fixed numbers it works without a problem, but when I try to replace it with a value from another array I get the following error:
<pedro_barbosa[m]>
```
<pedro_barbosa[m]>
what(): CudaError: an illegal memory access was encountered at buffer::~buffer Error during synchronization of stream
<pedro_barbosa[m]>
```
<hkaiser>
pedro_barbosa[m]: well, it's difficult to know what's wrong without seeing the code
<pedro_barbosa[m]>
line 27 and 45 on the cpp file has the declaration of both the host array and then the buffer I use to copy to the device
<hkaiser>
not sure, I see cudaMallocHost calls only
<pedro_barbosa[m]>
line 45
<hkaiser>
sorry, I don't understand the code
<hkaiser>
I still don't see what's wrong - where is newPos?
<pedro_barbosa[m]>
on line 45 I declare the buffer that I'm going to use to copy newPos to the device; on line 46 I do the copy; on line 124 I add it to the argument list; and then on line 140 I run the kernel with that argument list
<pedro_barbosa[m]>
then if you go to the kernel file, on line 72 you can see the function that's being executed, with newPos being the 1st argument
<hkaiser>
ok
diehlpk has joined #ste||ar
diehlpk has quit [Quit: Leaving.]
diehlpk has joined #ste||ar
RostamLog has joined #ste||ar
<gonidelis[m]>
hkaiser: pm
diehlpk has quit [Quit: Leaving.]
diehlpk has joined #ste||ar
akheir has quit [Ping timeout: 264 seconds]
jehelset has quit [Remote host closed the connection]