hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
nikunj has joined #ste||ar
<nikunj>
hkaiser, yt?
<hkaiser>
nikunj: here
<nikunj>
hkaiser, I was thinking to implement the vote function that you were talking of
<nikunj>
would you want me to continue with that or should I run some benchmarks?
<nikunj>
coz I think we have expected graphs as of now
<nikunj>
so we should move forward with implementing the vote function
<hkaiser>
nikunj: that would be a second set of replicate APIs, right?
<nikunj>
so you want me to make 2 sets of replicate API?
<nikunj>
one that returns the first ans and one that votes?
<hkaiser>
well, one as it is and one with an arbiter function
<hkaiser>
both with async and dataflow variations
<nikunj>
we can do that
<hkaiser>
nikunj: if you do that you will have to run yet another set of benchmarks, though ;-)
<nikunj>
we will then have 2 sets of replicates with 2 variations each (normal and validate)
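[Editor's sketch] The two API sets discussed above (return the first usable answer, or hand all answers to an arbiter/vote function) could look roughly like the following. This is a minimal sketch in plain Python threads, not the HPX implementation: the names `replicate`, `replicate_vote`, `majority_vote` and the `validate` parameter are hypothetical, and only an async-style form is shown, not the dataflow variation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def replicate(n, task, *args, validate=lambda result: True):
    """Run `task` n times and return the first replica (in submission
    order) that neither raised nor was rejected by `validate`."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(task, *args) for _ in range(n)]
        for fut in futures:
            try:
                result = fut.result()
            except Exception:
                continue  # this replica threw, look at the next one
            if validate(result):
                return result
    raise RuntimeError("all replicas failed or were rejected")


def majority_vote(results):
    """Example arbiter (an assumption): pick the most common result."""
    return Counter(results).most_common(1)[0][0]


def replicate_vote(n, vote, task, *args, validate=lambda result: True):
    """Run `task` n times and let `vote` choose among the results
    that neither raised nor were rejected by `validate`."""
    candidates = []
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(task, *args) for _ in range(n)]
        for fut in futures:
            try:
                result = fut.result()
            except Exception:
                continue  # drop replicas that threw
            if validate(result):
                candidates.append(result)
    if not candidates:
        raise RuntimeError("all replicas failed or were rejected")
    return vote(candidates)
```

With this shape, `replicate(3, f, x)` covers the "as it is" behaviour, while `replicate_vote(3, majority_vote, f, x)` only differs in that every surviving result is collected before one is chosen, which is why it would need its own set of benchmarks.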
<nikunj>
hkaiser, I'm used to running benchmarks now ;-)
<hkaiser>
ok
<nikunj>
but I think the voting function is important
<hkaiser>
feel free to do that, then
<nikunj>
also, in the meantime should I run the current benchmarks that I have in the PR?
<hkaiser>
let's discuss the API before you start implementing things
<nikunj>
ok, then I'll wait till next thursday
<hkaiser>
I'd suggest you write the test first demonstrating how things should look and only then start implementing things
<hkaiser>
we can discuss things over email or here before that
<nikunj>
btw did you understand the bumps we were seeing for replicate?
<nikunj>
I was expecting almost similar times for all the error rates
<nikunj>
with a bit of overhead with ones throwing exceptions
<hkaiser>
nikunj: how many runs have you averaged there?
<nikunj>
it's just one
<nikunj>
and that too when I was about to leave
<hkaiser>
nod, that explains it
<nikunj>
I just wanted to get an idea of how things looked after I changed the code
<hkaiser>
but I would expect constant execution times, indeed
<nikunj>
yes, that is what I expected too
<nikunj>
I guess we'll have to play with execution times to get a better picture
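[Editor's sketch] The point about a single run explaining the bumps can be illustrated by timing a workload several times and reporting the mean together with the spread. This is a minimal sketch, not the benchmark from the PR: `time_once`, `summarize` and `benchmark` are hypothetical helper names, and the default of 20 repeats is taken from the repeat count mentioned later in this log.

```python
import statistics
import time


def time_once(fn):
    """Wall-clock time of a single call to fn, in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def summarize(samples):
    """Mean and sample standard deviation of a list of timings."""
    return statistics.mean(samples), statistics.stdev(samples)


def benchmark(fn, repeats=20):
    """Time fn `repeats` times and summarize the samples."""
    return summarize([time_once(fn) for _ in range(repeats)])


if __name__ == "__main__":
    mean, stdev = benchmark(lambda: sum(range(100_000)))
    print(f"mean {mean:.6f}s, stdev {stdev:.6f}s over 20 runs")
```

A large standard deviation relative to the mean is a hint that a bump in a single-run graph may be noise rather than a real effect.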
<nikunj>
besides I was reading about the exponential distribution and it works on a probability distribution
<nikunj>
P(x|y) = y*exp(-x*y)
<nikunj>
so what we do there is set y for the probability distribution, where y can be a set time after which a certain event is meant to occur
<nikunj>
and then we get a function where we can identify the probability of occurrence of x given the value of y
<nikunj>
so basically if we plug in y = 3 and x = 3 (that's what my benchmark is doing), the probability of occurrence of a number greater than 3 in a run with about 10000 threads is about 4 only
<nikunj>
and the reason why we get the exponential curve for the graph is because there is an exponential relation between x and the probability function, so if we decide the error generation as 2 and put x = 2 we will have exponentially more
<nikunj>
since it will then be 2*exp(-4) compared to 3*exp(-9)
<nikunj>
that's why we see an exponential time difference in executions
<hkaiser>
nod
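[Editor's sketch] For reference, y*exp(-x*y) is the density of the exponential distribution, while the probability of drawing a number greater than x is the tail exp(-x*y). The sketch below evaluates both for the two cases mentioned above (x = y = 3 and x = y = 2); the function names are hypothetical, and the 10000 draws are only the illustrative thread count quoted in the discussion.

```python
import math


def density(x, rate):
    """Exponential density f(x) = rate * exp(-rate * x)."""
    return rate * math.exp(-rate * x)


def tail_probability(x, rate):
    """P(X > x) = exp(-rate * x) for an exponentially distributed X."""
    return math.exp(-rate * x)


if __name__ == "__main__":
    draws = 10000  # illustrative count from the discussion
    for rate in (2, 3):
        x = rate  # the threshold equals the rate, as in the benchmark described
        print(
            f"rate={rate}: density={density(x, rate):.3e}, "
            f"P(X>x)={tail_probability(x, rate):.3e}, "
            f"expected draws above x={draws * tail_probability(x, rate):.2f}"
        )
```

Multiplying the density at x = y = 3 by 10000 gives roughly 3.7, which matches the "about 4" figure, whereas the tail probability exp(-9) gives roughly 1.2 expected draws above 3. Either way the count grows exponentially as the rate drops from 3 to 2, which is consistent with the shape of the graphs.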
<nikunj>
now what we could do is that we could allow the user to add an average rate of occurrence of some event
<nikunj>
and then control the x ourselves
<nikunj>
or keep the x same as the one that user enters i.e. the average rate of occurrence of some event
<nikunj>
if we control x ourselves then we can decide on the number of failing threads, else it will be governed by error_rate*exp(-error_rate^2)
<nikunj>
we could do it either way you want
<nikunj>
the math behind it is pretty simple and I ran an alternative program to test if my thinking was correct
<nikunj>
so what should I go ahead with?
<hkaiser>
we need to control the average time between errors
<nikunj>
then what we could do is to keep x = 1 so that the user will know that the probability of failure will be equal to exp(-error_rate)
<nikunj>
this way for an error_rate specified as y, the probability of a number generated being greater than 1 will be exp(-y)
<hkaiser>
right
<nikunj>
so we will have a very clear function translation
<nikunj>
and the user should not have any issues identifying with exponential probabilities
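[Editor's sketch] The mapping proposed above (fix x = 1, so a draw counts as an error when it exceeds 1, giving a failure probability of exp(-error_rate)) can be checked empirically. This is a minimal sketch using Python's `random.expovariate`, not the generator used in the actual benchmark; `failure_fraction` and its parameters are hypothetical names.

```python
import math
import random


def failure_fraction(error_rate, trials, threshold=1.0, seed=0):
    """Fraction of exponential draws (with the given rate) above `threshold`."""
    rng = random.Random(seed)  # seeded so repeated runs give the same numbers
    failures = sum(
        rng.expovariate(error_rate) > threshold for _ in range(trials)
    )
    return failures / trials


if __name__ == "__main__":
    for error_rate in (0.5, 1.0, 2.0, 3.0):
        observed = failure_fraction(error_rate, trials=100_000)
        expected = math.exp(-error_rate)
        print(f"error_rate={error_rate}: observed={observed:.4f}, "
              f"exp(-error_rate)={expected:.4f}")
```

The observed fraction should track exp(-error_rate) closely, so a user can translate an error_rate directly into an expected share of failing tasks.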
<nikunj>
alright then, I will make this change
<nikunj>
would you want me to benchmark once I've changed that?
<nikunj>
or is there something else you should look into first?
<hkaiser>
well, we talked about the parameters to change for the benchmarks
<hkaiser>
let's see what we get
<nikunj>
alright, I'll run for the parameters we discussed and generate some graphs
<nikunj>
I'll mail you those graphs I generate and we can pick from there
rishabh_bansal11 has quit [Quit: Connection closed for inactivity]
<nikunj>
hkaiser, would you want execution times in microseconds?
<nikunj>
or should I keep them in milliseconds?
<hkaiser>
you mean the artificial delay?
<nikunj>
yes the artificial time for any thread execution
<hkaiser>
I think we should run with 5us, 50us, 500us, and 5ms
<nikunj>
alright, I'll run it with those, I'll add another 2ms in between just for a better comparison
<hkaiser>
sure, feel free to add that
<nikunj>
done
<nikunj>
I'll start the benchmarks now
<nikunj>
one final thing, would you want me to keep the n-value high so that all tests pass? or keep it low and let some tests fail
<nikunj>
coz we will see tests failing if n is low and error-rate is set pretty low (which translates to a high number of errors)
<nikunj>
on the downside, keeping n-value high will make replicate run for longer too
<nikunj>
but I encountered multiple failing instances when I kept n-value low for replicate
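[Editor's sketch] The trade-off around the n-value can be estimated under two assumptions that are not confirmed in this log: each replica fails independently with probability exp(-error_rate) (the x = 1 mapping agreed on earlier), and a replicated task only fails when all n of its replicas fail. The function names and the task count of 10000 are illustrative.

```python
import math


def all_replicas_fail(error_rate, n):
    """Probability that all n independent replicas fail."""
    p_single = math.exp(-error_rate)  # assumes the threshold x = 1
    return p_single ** n


def expected_failed_tasks(error_rate, n, tasks):
    """Expected number of replicated tasks with no surviving replica."""
    return tasks * all_replicas_fail(error_rate, n)


if __name__ == "__main__":
    tasks = 10000  # illustrative task count
    for error_rate in (0.5, 1.0, 3.0):
        for n in (2, 3, 5, 10):
            failed = expected_failed_tasks(error_rate, n, tasks)
            print(f"error_rate={error_rate}, n={n}: "
                  f"~{failed:.2f} failing tasks out of {tasks}")
```

Under these assumptions the failure probability is exp(-error_rate * n), so a low error_rate combined with a low n leaves a noticeable number of failing tasks, while raising n suppresses failures at the cost of more replicas to run.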
<nikunj>
hkaiser, I've started a background script to run with a variety of parameters, repeating 20 times for every parameter combination. So we should now have a comprehensive view. Also, marvin will most likely be blocked for this weekend coz there are a total of 20k+ runs ;)
<nikunj>
till it completes running I'll quickly go through the reports you sent over, and start going through phylanx seminars
<nikunj>
total runs equals 61440
<nikunj>
that will easily take this weekend
nikunj has quit [Ping timeout: 246 seconds]
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: bye]
jaafar has joined #ste||ar
nikunj97 has joined #ste||ar
nikunj97 has quit [Remote host closed the connection]
nikunj97 has joined #ste||ar
RostamLog has quit [Ping timeout: 258 seconds]
simbergm has quit [Remote host closed the connection]