aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
Smasher has quit [Quit: 2ProShells BNC Powering down]
Smasher has joined #ste||ar
<jbjnr>
ouch - during preprocess chunking is disabled,
<jbjnr>
so the size is computed for an unoptimized serialization
<hkaiser>
jbjnr: there you go
<jbjnr>
but why?
<jbjnr>
can there really be so many bugs?
<jbjnr>
Anton said the serialization was worse ...
<hkaiser>
worse in what sense?
<hkaiser>
jbjnr: so far we didn't care what we calculated for the required size of the serialization buffer, as long as it was large enough
<hkaiser>
so it's not really a bug, I guess
<jbjnr>
hkaiser: worse in the sense that tests he ran in the past performed better than recent ones
<hkaiser>
jbjnr: I fixed that
<jbjnr>
I am dumping out stuff and the flags are set correctly, so there's a bug somewhere in the checking ....
<jbjnr>
I'd better sleep and fix this tomorrow. thanks for the help.
denis_blank has quit [Quit: denis_blank]
<diehlpk>
hkaiser, Is there any attempt to write a book about hpx?
<diehlpk>
I was thinking about the handbook of hpx, where different people contribute with chapters.
<hkaiser>
diehlpk: heh
<hkaiser>
diehlpk: heller_ talked about this - half jokingly
<hkaiser>
other than that, no attempts have been made
<diehlpk>
Ok, if we are interested we could start to collect ideas for chapters.
<hkaiser>
absolutely!
<diehlpk>
should I send a mail to our mailing list?
<hkaiser>
:D
<hkaiser>
stellar-internal?
<hkaiser>
might be even better to make it a personal email to people you think would be interested
<diehlpk>
Ok, so I think you, heller, adrian?
<hkaiser>
John? Bryce? Pat? Kevin?
<diehlpk>
Ok, I will ask them
<hkaiser>
ok
<hkaiser>
thanks!
<diehlpk>
You are welcome.
<diehlpk>
Mail was sent just now
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: bye]
ajaivgeorge has quit [Ping timeout: 255 seconds]
diehlpk has quit [Ping timeout: 240 seconds]
pree_ has joined #ste||ar
pree_ has quit [Remote host closed the connection]
<taeguk>
I don't know the vectorpack execution policy exactly, but I have found the above.
<taeguk>
Is there anyone who can explain this to me?
jaafar has quit [Ping timeout: 245 seconds]
Smasher has quit [*.net *.split]
wash[m] has quit [*.net *.split]
Smasher has joined #ste||ar
wash[m] has joined #ste||ar
shoshijak has joined #ste||ar
bikineev has joined #ste||ar
shoshijak has quit [Ping timeout: 255 seconds]
david_pf_ has joined #ste||ar
shoshijak has joined #ste||ar
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
shoshijak has quit [Ping timeout: 255 seconds]
bikineev has joined #ste||ar
<jbjnr>
taeguk: I think you'll find that support for vectorized types is incomplete, so if you find support in one place but not in another, it's just because it hasn't been implemented or tested yet
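For reference, a minimal sketch of what the vectorpack ("datapar") policy looks like on an algorithm that does already support it - assuming HPX is built with HPX_WITH_DATAPAR (Vc backend); the header and policy names here follow the HPX 1.0-era API and may be spelled differently in other versions:

    // Sketch only: datapar support is incomplete, so not every algorithm
    // accepts this policy.
    #include <hpx/hpx_main.hpp>
    #include <hpx/include/datapar.hpp>
    #include <hpx/include/parallel_for_each.hpp>

    #include <vector>

    int main()
    {
        std::vector<float> v(1000, 1.0f);

        // With the datapar policy the generic lambda is instantiated with a
        // vector-pack type (e.g. a Vc SIMD vector) instead of plain float.
        hpx::parallel::for_each(
            hpx::parallel::execution::datapar, v.begin(), v.end(),
            [](auto& x) { x = x * 2.0f + 1.0f; });

        return 0;
    }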
<jbjnr>
bikineev: yt?
shoshijak has joined #ste||ar
bikineev has quit [Ping timeout: 268 seconds]
<heller_>
jbjnr: hmm, the chunks should get forwarded properly
<heller_>
The flags are taken from the parcelport
<heller_>
And 'utterly broken' is a slight overstatement...
<jbjnr>
(and saves a lot of unnecessary allocation when doing osu latency with large sizes)
<heller_>
The first line should check if we have chunking enabled
<jbjnr>
etc etc
<heller_>
Sure
<heller_>
It used to work :/
<jbjnr>
I'm going over a lot of the code now anyway
<jbjnr>
I have had a disaster
<heller_>
How so?
<jbjnr>
My thesis extension request was denied by the faculty, so I'm now 3 months past my deadline and have to submit immediately - I need to get this code written up and submit a paper on Friday
<jbjnr>
The dept approved it in January, but the faculty have only now rejected it - 3 months after the deadline passed
<heller_>
[10:53:46] <jbjnr> during preprocess, the chunker is null and so all the optimizations are disabled
<heller_>
[10:54:12] <jbjnr> leaving that check out doesn't seem to change anything, but now the size is correct
<hkaiser>
why is that not correct?
<heller_>
gtg, will be back in about an hour
<josef__k>
OK, my test program compiles, runs, computes the correct result, but is not parallelized. :\ I will pastebin some code shortly, but when adapting from std::inner_product to hpx::parallel::transform_reduce, do I need to return futures from my binary operators?
<hkaiser>
heller_: shrug, do we have a test enforcing this behaviour you claim I have broken?
<hkaiser>
josef__k: have you passed any command line options?
<josef__k>
hkaiser: No.
<hkaiser>
nod
<heller_>
hkaiser: it's not my claim
<hkaiser>
josef__k: HPX runs on one core if not instructed otherwise
<josef__k>
hkaiser: Ohhh. :)
<hkaiser>
heller_: shrug
<hkaiser>
josef__k: use the command line option --hpx:threads=N (-tN)
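To make the inner_product -> transform_reduce adaptation concrete, a minimal sketch (the header name and the C++17-style argument order init/reduce/transform are assumptions - early HPX followed the Parallelism TS, which put the transform op before init, so check your version). Note the operators return plain values, not futures; only a task policy makes the algorithm itself return a future:

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/parallel_transform_reduce.hpp>

    #include <vector>

    int main()
    {
        std::vector<double> v(1000000, 1.0);

        // Sum of squares; runs on the worker threads given via
        // --hpx:threads=N (one core if not instructed otherwise).
        double sum = hpx::parallel::transform_reduce(
            hpx::parallel::execution::par, v.begin(), v.end(),
            0.0,                                          // initial value
            [](double a, double b) { return a + b; },     // reduction op
            [](double x) { return x * x; });              // transform op

        return sum == 1000000.0 ? 0 : 1;
    }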
shoshijak has quit [Ping timeout: 255 seconds]
<heller_>
hkaiser: let's just create a test case, tbh, I have no idea what's broken
<hkaiser>
me neither
<K-ballo>
heller_: I talked to EricWF and he confirmed recentish versions of libc++ support exception_ptr on windows too
<K-ballo>
we could switch to <exception> directly, and only introduce a compat layer for it if we hit a platform without the required compiler support/library integration
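For context, the facility in question is just standard C++11 <exception>; the compat layer only mattered on platforms whose standard library lacked it (e.g. older libc++ on Windows):

    #include <exception>
    #include <stdexcept>

    int main()
    {
        std::exception_ptr ep;
        try { throw std::runtime_error("boom"); }
        catch (...) { ep = std::current_exception(); }   // capture

        // The captured exception can be transported (e.g. into a future)
        // and rethrown later, possibly on another thread:
        try { if (ep) std::rethrow_exception(ep); }
        catch (std::exception const&) { return 0; }      // what() == "boom"
        return 1;
    }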
<josef__k>
Ah, the mysteries of parallelization; it's slower when threads=#CPUs vs single-threaded.
<hkaiser>
josef__k: lol
<hkaiser>
josef__k: happens to all of us
<hkaiser>
try using a smaller number of cores
<josef__k>
Granted, the benchmark library may be interfering.
<hkaiser>
also, recompile HPX using cmake -DHPX_WITH_THREAD_IDLE_RATES=On ...
<hkaiser>
josef__k: that will enable a performance counter which is very useful in analysing parallelization
<josef__k>
It's only somewhat slower, about 20%. When I tried OpenMP on similar code, it was 100% slower.
<hkaiser>
josef__k: but it shouldn't be _slower_
<josef__k>
Mmmm, yes.
<josef__k>
The code repository is here if you're interested: https://github.com/jeremy-murphy/programming in the statistics.hpp file. It's just a whole lot of variations on the same algorithm.
<hkaiser>
josef__k: try recompiling hpx with that flag ^^
<josef__k>
hkaiser: Is that recompiling the HPX library or just my code?
<hkaiser>
that will enable a performance counter we call idle-rate - a very nice overarching measure of how well the application is parallelized
<hkaiser>
hpx library
<hkaiser>
this counter adds some overhead so we disable it by default
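Once HPX is rebuilt with -DHPX_WITH_THREAD_IDLE_RATES=On, the counter can be printed at shutdown via --hpx:print-counter=/threads{locality#0/total}/idle-rate (counter path as in the HPX counter documentation of that era); the value is reported in hundredths of a percent, so e.g. 2500 means the worker threads sat idle 25% of the time.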
<josef__k>
OK.
<josef__k>
I will add that flag to the HPX library another time, just not right now.
<hkaiser>
as you wish
<josef__k>
It's bed time here. :)
<josef__k>
Thanks for your help again though.
shoshijak has joined #ste||ar
ajaivgeorge has quit [Ping timeout: 246 seconds]
bikineev has quit [Read error: No route to host]
bikineev has joined #ste||ar
<jbjnr>
hkaiser: (+heller fyi) it's not so much that the serialization is broken, but during the preprocess pass, the size is calculated - with the chunker set to null, the optimizations are discarded, so the preprocess pass returns a size that assumes all vector/rma chunks are added to the buffer directly and not chunked, so the size is larger than it needs to be. for the vector version, it's not a...
<jbjnr>
...big deal (a slight waste of a malloc), but for the rma version the memory must be taken from the pinned pool or registered on the fly, and that's where it costs.
<jbjnr>
I've made a better fix anyway, which I'll do a PR for on its own.
<jbjnr>
sorry - two fixes: one is just commenting out the chunker = nullptr check, the other is the size calculation of the buffer, which I've improved
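A hypothetical illustration of the accounting problem described above (all names invented for illustration; this is not HPX's serialization code): with chunking enabled, a large array contributes only a small descriptor to the main buffer, so a preprocess pass that runs with the chunker nulled out over-counts the required size:

    // Hypothetical sketch only -- not HPX's actual serialization code.
    #include <cstddef>

    struct chunk_descriptor { void const* data; std::size_t size; };

    // Size the preprocess pass would report for one array of `bytes` bytes.
    std::size_t preprocessed_size(std::size_t bytes, bool chunking_enabled)
    {
        std::size_t const zero_copy_threshold = 128;  // illustrative cutoff

        if (chunking_enabled && bytes >= zero_copy_threshold)
            // Only a small descriptor goes into the main buffer; the
            // payload travels out-of-band as a zero-copy (or RMA) chunk.
            return sizeof(chunk_descriptor);

        // With the chunker set to nullptr the optimization is discarded and
        // the whole payload is counted inline -- the mismatch seen above.
        return bytes;
    }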
ajaivgeorge has joined #ste||ar
<heller_>
jbjnr: a test for that would be great
<jbjnr>
heller_: I'll see what I can do
<hkaiser>
the chunker == nullptr check makes Anton's serialization tests run properly, otherwise they just segfault
<jbjnr>
ooh
<jbjnr>
that's bad
<jbjnr>
was it added recently?
<jbjnr>
hkaiser: ^
josef__k has quit [Ping timeout: 240 seconds]
<jbjnr>
I'll see if I can find a less destructive fix anyway. I was going to test Anton's serialization stuff in any case
<hkaiser>
yes, I added that change to fix Anton's tests
<hkaiser>
we need to agree on how we want the archive to be used and adapt all the code using it appropriately
<jbjnr>
yes, I suspect Anton's test is doing something slightly unexpected ...
<jbjnr>
one nice thing would be to add an assert that the container size matches the archive size - but during preprocess a dummy container is used, so it is tricky
<jbjnr>
I'm just trying to track down an unrelated rma bug, then I'll come back to this.
diehlpk_work has joined #ste||ar
<hkaiser>
jbjnr: ok, sounds good - let's talk this out though before you change things
<heller_>
during preprocessing, the chunking flag should have been set explicitly
<heller_>
if enabled in the PP that is
<jbjnr>
the flags are set correctly, but the null chunker check causes the optimizations to be disabled
<jbjnr>
hence the mismatch
<heller_>
yeah
<jbjnr>
I only noticed it because my rma PP logging flagged allocations that were 'unexpected'
<heller_>
good catch
<jbjnr>
(and because my new rma serialization test was slower than expected)
<hkaiser>
jbjnr: doesn't it make sense?
<hkaiser>
if there is no chunker, then the optimizations have to be disabled
eschnett has quit [Quit: eschnett]
<hkaiser>
instead of allowing for the parameters to be mismatched I'd rather we supplied a dummy chunker
<heller_>
or chunk already during preprocessing
<heller_>
IIRC, the preprocessing already calculates the number of chunks needed
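One way to read the dummy-chunker suggestion (a hypothetical null-object sketch; HPX's real chunker interface differs): instead of a nullptr check that silently disables the optimizations, the preprocess pass gets a chunker that counts chunks exactly like the real one but stores nothing, so both passes agree on the size:

    // Hypothetical null-object sketch -- interface names invented for
    // illustration.
    #include <cstddef>

    struct chunker
    {
        virtual ~chunker() = default;
        // Record a zero-copy chunk; returns bytes added to the main buffer.
        virtual std::size_t add_chunk(void const* data, std::size_t size) = 0;
    };

    // Used during the preprocess (sizing) pass: takes the same code path as
    // the real chunker, but stores nothing.
    struct counting_chunker : chunker
    {
        std::size_t num_chunks = 0;
        std::size_t add_chunk(void const*, std::size_t) override
        {
            ++num_chunks;
            return sizeof(void*) + sizeof(std::size_t);  // descriptor only
        }
    };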
<hkaiser>
sorry gtg now
hkaiser has quit [Quit: bye]
aserio has joined #ste||ar
<github>
[hpx] sithhell pushed 1 new commit to lf_multiple_parcels: https://git.io/vHVYx
<github>
hpx/lf_multiple_parcels 5ac0d87 Thomas Heller: Merge branch 'master' into lf_multiple_parcels...
<heller_>
jbjnr: ^^ this should be ready now
ajaivgeorge has quit [Read error: Connection reset by peer]
<aserio>
heller_: are you interested in doing the Journal paper?
<aserio>
heller_: also, how is writing going?
<heller_>
aserio: is the journal publishing my thesis?
<heller_>
aserio: I have an EU review on Wednesday, no time to write till then :(
<aserio>
heller_: that is for you to determine, I am simply trying to put a meeting together :)
ajaivgeorge has joined #ste||ar
<heller_>
aserio: tell me more
<aserio>
I don't know what an EU review is, but it doesn't sound good
<aserio>
heller_: This is the journal paper that comes from Patrick's publication in ParNum
<heller_>
ahh, that one
<aserio>
heller_: if this is in German I will come through your machine to pop you
<heller_>
aserio: the European Commission is doing reviews of the projects they fund after 18 months to see if their money is well spent
<heller_>
aserio: learn another language you ignorant fool ;)
<aserio>
ah, is it just a report or do you have a physical reviewer
<aserio>
:p make me
<heller_>
physical reviews
<heller_>
we have a whole-day meeting on Wednesday where we present the project's progress
<aserio>
ewww
<heller_>
in brussels
<heller_>
exactly
<aserio>
I liked the city well enough though
<heller_>
i like the city as well
<heller_>
and the review makes sense
<heller_>
it's just a lot of effort
<github>
[hpx] hkaiser pushed 1 new commit to serialization_access_data: https://git.io/vHVGA
<github>
hpx/compat-exception b574c11 Agustin K-ballo Berge: Remove compatibility layer for std::exception_ptr, mark support as required
<K-ballo>
heller_: I'll leave the compat layer as part of the history ^ so that it's there if we ever happen to need it
<heller_>
K-ballo: thanks
bikineev has quit [Remote host closed the connection]
mcopik has joined #ste||ar
pree has quit [Quit: AaBbCc]
aserio has quit [Ping timeout: 255 seconds]
bikineev has joined #ste||ar
mcopik has quit [Ping timeout: 255 seconds]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
denis_blank has quit [Quit: denis_blank]
mcopik has joined #ste||ar
hkaiser has joined #ste||ar
<aserio>
hkaiser: yt?
<hkaiser>
aserio: here
<aserio>
hkaiser: So Dominic and I talked and it looks like he has a race condition
<hkaiser>
did he say whether my fixes solved his problems?
<aserio>
The future<future<void>> work is on a branch, right?
<aserio>
It looks like the changes reduce the frequency of the issue
<hkaiser>
yes
<aserio>
I told him that we should brainstorm some ways of searching for the error over the next few days
<hkaiser>
ok
aserio has quit [Ping timeout: 246 seconds]
eschnett has quit [Quit: eschnett]
aserio has joined #ste||ar
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
eschnett has joined #ste||ar
aserio has quit [Quit: aserio]
<jbjnr>
hkaiser: yt?
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<hkaiser>
jbjnr: here
<jbjnr>
hkaiser: when we serialize our args into a parcel, the rma_vector can create a chunk of type 2 (an rma chunk) and store the pinned memory info, which is picked up by the parcelport and used for rma operations. all is well, but ...
<jbjnr>
when the data is received into a new rma chunk, and handed over to the archive to be deserialized - I don't have access to the chunk info directly
<jbjnr>
how can I access the chunk structure when I am reading my rma object out of the archive?
<hkaiser>
shouldn't the chunker do that?
<jbjnr>
if I can do that, then I can access the memory region handle I put in there
<jbjnr>
how can I get the chunker then?
<hkaiser>
let me look
<hkaiser>
how do you create the special chunk during serialization?
<jbjnr>
I only see functions like "void load_binary_chunk(void* address, std::size_t count) // override" in input_container - but no chunk info is there
<jbjnr>
when storing, save_binary_chunk actually stores the chunk directly, but not when loading
<hkaiser>
could you point me to the code you're looking at, pls?
<jbjnr>
so I create a new kind of chunk with the rma info in it, and the parcelport bypasses memory registration. works great
<jbjnr>
but in the inverse operation I do not see the chunks
<hkaiser>
ok, so you changed the output archive to call a new function which is creating the rma chunk
<jbjnr>
yes
<jbjnr>
for rma types
<jbjnr>
it's a new specialization/overload set
<hkaiser>
nod
<hkaiser>
for special types you call save_rma_chunk instead of save_binary_chunk
<jbjnr>
yes
<hkaiser>
that means you should do the same on receive
<hkaiser>
for the same special types you call load_rma_chunk instead of load_binary_chunk
<hkaiser>
shouldn't that do the trick?
<jbjnr>
yes, but I only get the pointer and size, and not the extra stuff I stored in the chunk
<jbjnr>
the rma handles etc
<jbjnr>
I need these for memory management
<hkaiser>
why do you see only the pointer and the size?
<hkaiser>
shouldn't the sending/receiving of the chunks pass your additional information along?
<jbjnr>
the chunker has the received rma data (and a receive handle, which is not the same as the one sent from the other end), but I don't quite know how to get the chunk handle stuff from inside load_rma_chunk
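A hypothetical shape for what is being asked for here (modelled on the save_binary_chunk/load_binary_chunk pair named above; the rma variant and the chunk fields are sketched, not HPX API): let the input container hand back the whole chunk record rather than just pointer and size, so the RMA handle survives deserialization:

    // Hypothetical sketch -- field and function names invented to mirror
    // the discussion, not HPX's actual input_container interface.
    #include <cstddef>
    #include <cstdint>

    // chunk record extended with the RMA metadata stored on the send side
    struct rma_serialization_chunk
    {
        void*         address;   // where the received data lives
        std::size_t   size;      // payload size
        std::uint64_t rma_key;   // remote-access key / region handle
        void*         region;    // pinned-memory region, for deallocation
    };

    struct input_container
    {
        // existing interface: only pointer + size reach the caller
        void load_binary_chunk(void* address, std::size_t count);

        // sketched extension: expose the whole chunk so load_rma_chunk can
        // recover the region handle instead of copying out of pinned memory
        rma_serialization_chunk const& load_rma_chunk(std::size_t count);
    };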