aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
Smasher has quit [Quit: 2ProShells BNC Powering down]
Smasher has joined #ste||ar
<jbjnr> ouch - during preprocess chunking is disabled,
<jbjnr> so the size is computed for an unoptimized serialization
<hkaiser> jbjnr: there you go
<jbjnr> but why?
<jbjnr> can there really be so many bugs?
<jbjnr> anton said the serialization was worse ....
<hkaiser> worse in what sense?
<hkaiser> jbjnr: so far we didn't care what we calculate for the required size of the serialization buffer as long as it was large enough
<hkaiser> so it's not really a bug, I guess
<jbjnr> hkaiser: worse in the sense that tests he ran in the past were better than ones recently
<hkaiser> jbjnr: I fixed that
<jbjnr> I am dumping out stuff and the flags are set correctly, so there's a bug somewhere in the checking ....
<jbjnr> I'd better sleep and fix this tomorrow. thanks for the help.
denis_blank has quit [Quit: denis_blank]
<diehlpk> hkaiser, Is there any attempt to write a book about hpx?
<diehlpk> I was thinking about the handbook of hpx, where different people contribute with chapters.
<hkaiser> diehlpk: heh
<hkaiser> diehlpk: heller_ talked about this - half jokingly
<hkaiser> other than that, no attempts being made
<diehlpk> Ok, if we are interested we could start to collect ideas for chapters.
<hkaiser> absolutely!
<diehlpk> should I send a mail at our mailinglist?
<hkaiser> :D
<hkaiser> stellar-internal?
<hkaiser> might be even better to make it a personal email to people you think would be interested
<diehlpk> Ok, so I think you, heller, adrian?
<hkaiser> John? Bryce? Pat? Kevin?
<diehlpk> Ok, I will ask them
<hkaiser> ok
<hkaiser> thanks!
<diehlpk> You are welcome.
<diehlpk> Mail was sent just now
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: bye]
ajaivgeorge has quit [Ping timeout: 255 seconds]
diehlpk has quit [Ping timeout: 240 seconds]
pree_ has joined #ste||ar
pree_ has quit [Remote host closed the connection]
hkaiser has joined #ste||ar
hkaiser has quit [Client Quit]
pree has joined #ste||ar
pree has quit [Quit: AaBbCc]
<taeguk> Excuse me, I have a question.
<taeguk> It seems that checking if an execution policy is vectorpack execution policy is omitted now in https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/parallel/util/loop.hpp#L495-L496.
<taeguk> I don't know vectorpack execution policy exactly. But I have found the above.
<taeguk> Can anyone explain this to me?
jaafar has quit [Ping timeout: 245 seconds]
Smasher has quit [*.net *.split]
wash[m] has quit [*.net *.split]
Smasher has joined #ste||ar
wash[m] has joined #ste||ar
shoshijak has joined #ste||ar
bikineev has joined #ste||ar
shoshijak has quit [Ping timeout: 255 seconds]
david_pf_ has joined #ste||ar
shoshijak has joined #ste||ar
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
shoshijak has quit [Ping timeout: 255 seconds]
bikineev has joined #ste||ar
<jbjnr> taeguk I think you'll find that support for vectorized types is incomplete, so if you find support in one place, but not in another, it is just because it hasn't been implemented or tested yet
<jbjnr> bikineev: yt?
shoshijak has joined #ste||ar
bikineev has quit [Ping timeout: 268 seconds]
<heller_> jbjnr: hmm, the chunks should get forwarded properly
<heller_> The flags are taken from the parcelport
<heller_> And utterly broken is a small overstatement...
<jbjnr> during preprocess, the chunker is null and so all the optimizations are disabled
<jbjnr> leaving that check out doesn't seem to change anything, but now the size is correct
<jbjnr> but I am troubled still by https://github.com/STEllAR-GROUP/hpx/blob/c24d02101fe6c9a58a5d46d1781b1c481c4e0d48/hpx/runtime/serialization/output_archive.hpp#L360 I think it should only add count bytes if optimizations are off
<heller_> jbjnr: ok, I'll have a look
<heller_> jbjnr: Hartmut changed that
<jbjnr> and for my rma stuff it works fine.
<heller_> Sounds good
<jbjnr> (and saves a lot of unnecessary allocation when doing osu latency with large sizes)
<heller_> The first line should check if we have chunking enabled
<jbjnr> etc etc
<heller_> Sure
<heller_> It used to work :/
<jbjnr> I'm going over a lot of the code now anyway
<jbjnr> I have had a disaster
<heller_> How so?
<jbjnr> My thesis extension request was denied by the faculty, so I'm now 3 months past my deadline and I have to submit immediately, so I need to get this code written up and submit a paper on friday
<jbjnr> The dept approved it in jan, but the faculty have only now rejected it - 3 months after the deadline passed
<heller_> Yay...
<jbjnr> no.
<jbjnr> deadline end of week. only place I can find with a fast turnaround
<jbjnr> so I'll have to write a shit paper for that and we can do a better one for IPDPS or something later
<heller_> Here is the problem... I'm not helpful this week...
<jbjnr> It can't be a good conf because they only have microsoft word templates for submission :(
<heller_> EU Review on Wednesday
<jbjnr> heller_: do not worry. I can write it all on behalf of us 4
<heller_> Lol
<heller_> Ok
<jbjnr> I only need some feedback from anton about the serialization benchmarks, they don't compile properly for me
<jbjnr> once i fix that, I can run graphs etc myself
<heller_> Great
<jbjnr> (The only thing I do not fully understand is how action types are serialized)
<heller_> I'll be at the airport at around 4
<jbjnr> but I don't really want to write about that anyway
<heller_> And hope to finish my slides during the flight
<jbjnr> just a mention of it is enough really
<heller_> So we can run through stuff at night
<jbjnr> have a safe journey
<heller_> If that's ok
<heller_> Thanks
Matombo has joined #ste||ar
bikineev has joined #ste||ar
david_pf_ has quit [Ping timeout: 240 seconds]
bikineev has quit [Ping timeout: 260 seconds]
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
bikineev has joined #ste||ar
denis_blank has joined #ste||ar
shoshijak has quit [Ping timeout: 255 seconds]
josef__k has joined #ste||ar
shoshijak has joined #ste||ar
josef__k has quit []
<heller_> hkaiser: the merging of future awaiting and size calculation is an optimization. saves a serialization pass
josef__k has joined #ste||ar
<hkaiser> heller_: sure
<hkaiser> first correctness, then performance, though
ajaivgeorge has joined #ste||ar
<hkaiser> jbjnr: I should be able to help this week
<heller_> hkaiser: the chunking not working properly anymore is on you though ;)
<hkaiser> why is chunking not working properly?
<heller_> [10:53:29] <jbjnr> heller_: I fixed the size calculation error by commenting out this https://github.com/STEllAR-GROUP/hpx/blob/c24d02101fe6c9a58a5d46d1781b1c481c4e0d48/hpx/runtime/serialization/output_archive.hpp#L106
<heller_> [10:53:46] <jbjnr> during preprocess, the chunker is null and so all the optimizations are disabled
<heller_> [10:54:12] <jbjnr> leaving that check out doesn't seem to change anything, but now the size is correct
<hkaiser> why is that not correct?
<heller_> gtg, will be back in about an hour
<josef__k> OK, my test program compiles, runs, computes the correct result, but is not parallelized. :\ I will pastebin some code shortly, but when adapting from std::inner_product to hpx::parallel::transform_reduce, do I need to return futures from my binary operators?
<hkaiser> heller_: shrug, do we have a test enforcing this behaviour you claim I have broken?
<hkaiser> josef__k: have you passed any command line options?
<josef__k> hkaiser: No.
<hkaiser> nod
<heller_> hkaiser: it's not my claim
<hkaiser> josef__k: HPX runs on one core if not instructed otherwise
<josef__k> hkaiser: Ohhh. :)
<hkaiser> heller_: shrug
<hkaiser> josef__k: use the command line option --hpx:threads=N (-tN)
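On the question of whether the binary operators need to return futures: a minimal sketch, using the C++17 standard library `std::transform_reduce` as a stand-in for `hpx::parallel::transform_reduce` (the HPX overload and its argument order are assumed to be similar, not confirmed by this log). In the standard algorithm both operators return plain values, and the function names below are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Sequential reference: dot product via std::inner_product.
double dot_inner_product(
    std::vector<double> const& a, std::vector<double> const& b)
{
    assert(a.size() == b.size());
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0,
        std::plus<>(), std::multiplies<>());
}

// Same computation via C++17 std::transform_reduce.
// Note the operator order: reduction first, then the element-wise transform.
double dot_transform_reduce(
    std::vector<double> const& a, std::vector<double> const& b)
{
    assert(a.size() == b.size());
    return std::transform_reduce(a.begin(), a.end(), b.begin(), 0.0,
        std::plus<>(),          // reduction: returns a plain double
        std::multiplies<>());   // transform: returns a plain double
}
```

With an HPX build, the parallel version would additionally take an execution policy, and the program would be started with `--hpx:threads=N` as noted above to use more than one core.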
shoshijak has quit [Ping timeout: 255 seconds]
<heller_> hkaiser: let's just create a test case, tbh, I have no idea what's broken
<hkaiser> me neither
<K-ballo> heller_: I talked to EricWF and he confirmed recentish versions of libc++ support exception_ptr on windows too
<K-ballo> we could switch to <exception> directly, and only introduce a compat layer for it if we hit a platform without the required compiler support/library integration
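For reference, a small sketch of what using `<exception>` directly looks like: capturing an exception into a `std::exception_ptr` and rethrowing it later. It uses only the standard library; the helper names `capture` and `message_of` are hypothetical and are not taken from the HPX code base.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Run f and capture whatever it throws; returns a null pointer if nothing
// was thrown.
template <typename F>
std::exception_ptr capture(F&& f)
{
    try
    {
        std::forward<F>(f)();
    }
    catch (...)
    {
        return std::current_exception();
    }
    return std::exception_ptr();
}

// Rethrow a captured exception and extract its message.
std::string message_of(std::exception_ptr ep)
{
    assert(ep != nullptr);
    try
    {
        std::rethrow_exception(ep);
    }
    catch (std::exception const& e)
    {
        return e.what();
    }
    catch (...)
    {
        return "unknown exception";
    }
}
```

A captured pointer can be stored or moved across threads before being rethrown, which is the property a future implementation relies on.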
<josef__k> Ah, the mysteries of parallelization; it's slower when threads=#CPUs vs single-threaded.
<hkaiser> josef__k: lol
<hkaiser> josef__k: happens to all of us
<hkaiser> try using a smaller number of cores
<josef__k> Granted, the benchmark library may be interfering.
<hkaiser> also, recompile HPX using cmake -DHPX_WITH_THREAD_IDLE_RATES=On ...
<hkaiser> josef__k: that will enable a performance counter which is very useful in analysing parallelization
<josef__k> It's only somewhat slower, about 20%. For similar code when I tried OpenMP, it became 100% slower.
<hkaiser> josef__k: but it shouldn't be _slower_
<josef__k> Mmmm, yes.
<josef__k> The code repository is here if you're interested: https://github.com/jeremy-murphy/programming in the statistics.hpp file. It's just a whole lot of variations on the same algorithm.
<hkaiser> josef__k: try recompiling hpx with that flag ^^
<josef__k> hkaiser: Is that recompiling the HPX library or just my code?
<hkaiser> that will enable a performance counter we call idle-rate - a very nice overarching measure of how well the application is parallelized
<hkaiser> hpx library
<hkaiser> this counter adds some overhead so we disable it by default
<josef__k> OK.
<josef__k> I will add that flag to the HPX library another time, just not right now.
<hkaiser> as you wish
<josef__k> It's bed time here. :)
<josef__k> Thanks for your help again though.
shoshijak has joined #ste||ar
ajaivgeorge has quit [Ping timeout: 246 seconds]
bikineev has quit [Read error: No route to host]
bikineev has joined #ste||ar
<jbjnr> hkaiser: (+heller fyi) it's not so much that the serialization is broken, but during the preprocess pass, the size is calculated - with the chunker set to null, the optimizations are discarded, so the preprocess pass returns a size that assumes all vector/rma chunks are added to the buffer directly and not chunked, so the size is larger than it needs to be. for the vector version, it's not a...
<jbjnr> ...big deal (slight waste of a malloc), but for the rma version, the memory must be taken from the pinned pool, or registered on the fly and it costs then.
<jbjnr> I've made a better fix anyway, that I'll do a PR for on its own.
<jbjnr> sorry. two fixes, one is just commenting out the chunker == nullptr check, the other is the size calculation of the buffer - which I've improved
ajaivgeorge has joined #ste||ar
<heller_> jbjnr: a test for that would be great
<jbjnr> heller_: I'll see what I can do
<hkaiser> the chunker == nullptr check makes Anton's serialization tests run properly, otherwise they just segfault
<jbjnr> ooh
<jbjnr> that's bad
<jbjnr> was it added recently?
<jbjnr> hkaiser: ^
josef__k has quit [Ping timeout: 240 seconds]
<jbjnr> I'll see if I can find a less destructive fix anyway. I was going to test anton's serialization stuff in any case
<hkaiser> yes, I added that change to fix Anton's tests
<hkaiser> we need to agree on how we want the archive to be used and adapt all the code using it appropriately
<jbjnr> yes, I suspect anton's test is doing something slightly unexpected ...
<jbjnr> one nice thing would be to add an assert that the container size matches the archive size - but during preprocess a dummy container is used, so it is tricky
<jbjnr> I'm just trying to track down an unrelated rma bug, then I'll come back to this.
diehlpk_work has joined #ste||ar
<hkaiser> jbjnr: ok, sounds good - let's talk this out though before you change things
<heller_> during preprocessing, the chunking flag should have been set explicitly
<heller_> if enabled in the PP that is
<jbjnr> the flags are set correctly, but the null chunker check causes the optimizations to be disabled
<jbjnr> hence the mismatch
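The mismatch described here can be illustrated with a toy model. This is a sketch under stated assumptions, not the HPX `output_archive`: `size_counter`, `required_buffer_size`, and the `threshold` parameter are hypothetical names. The idea is that a size-only pass which treats zero-copy as disabled counts every large buffer as if it were copied into the main serialization buffer, and so reports a larger size than the real pass needs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model of a size-counting (preprocess) pass.
struct size_counter
{
    bool zero_copy;                  // models "chunker present and flags set"
    std::size_t threshold;           // buffers this large become separate chunks
    std::size_t buffer_bytes = 0;    // bytes that must live in the main buffer
    std::size_t chunk_count = 0;     // buffers referenced by pointer only

    size_counter(bool zc, std::size_t t) : zero_copy(zc), threshold(t) {}

    void save_binary(std::size_t count)
    {
        if (zero_copy && count >= threshold)
            ++chunk_count;           // only a pointer/size record is kept
        else
            buffer_bytes += count;   // data is copied into the main buffer
    }
};

// Size of the main buffer needed for a parcel with the given payload sizes.
std::size_t required_buffer_size(std::vector<std::size_t> const& payload_sizes,
    bool zero_copy, std::size_t threshold)
{
    assert(threshold > 0);
    size_counter counter(zero_copy, threshold);
    for (std::size_t n : payload_sizes)
        counter.save_binary(n);
    return counter.buffer_bytes;
}
```

For a 16-byte header plus a 1 MiB vector and a 512-byte threshold, the pass with zero-copy enabled asks for 16 bytes while the pass with it disabled asks for 1048592 bytes, which in the rma case would have to come from pinned memory.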
<heller_> yeah
<jbjnr> I only noticed it because my rma PP logging flagged allocations that were 'unexpected'
<heller_> good catch
<jbjnr> (and because my new rma serialization test was slower than expected)
<hkaiser> jbjnr: doesn't it make sense?
<hkaiser> if there is no chunker, then the optimizations have to be disabled
eschnett has quit [Quit: eschnett]
<hkaiser> instead of allowing for the parameters to be mismatched I'd rather we supplied a dummy chunker
<heller_> or chunk already during preprocessing
<heller_> IIRC, the preprocessing already calculates the number of chunks needed
<hkaiser> sorry gtg now
hkaiser has quit [Quit: bye]
aserio has joined #ste||ar
<github> [hpx] sithhell pushed 1 new commit to lf_multiple_parcels: https://git.io/vHVYx
<github> hpx/lf_multiple_parcels 5ac0d87 Thomas Heller: Merge branch 'master' into lf_multiple_parcels...
<heller_> jbjnr: ^^ this should be ready now
ajaivgeorge has quit [Read error: Connection reset by peer]
<aserio> heller_: are you interested in doing the Journal paper?
<aserio> heller_: also, how is writing going?
<heller_> aserio: is the journal publishing my thesis?
<heller_> aserio: I have a EU review on wednesday, no time to write till then :(
<aserio> heller_: that is for you to determine, I am simply trying to put a meeting together :)
ajaivgeorge has joined #ste||ar
<heller_> aserio: tell me more
<aserio> I don't know what an EU review is, but it doesn't sound good
<aserio> heller_: This is the journal paper that comes from Patrick's publication in ParNum
<heller_> ahh, that one
<aserio> heller_: if this is in German I will come through your machine to pop you
<heller_> aserio: the European Commission is doing reviews of the projects they fund after 18 months to see if their money is well spent
<heller_> aserio: learn another language you ignorant fool ;)
<aserio> ah, is it just a report or do you have a physical reviewer
<aserio> :p make me
<heller_> physical reviews
<heller_> we have a whole-day meeting where we present the project's progress on wednesday
<aserio> ewww
<heller_> in brussels
<heller_> exactly
<aserio> I liked the city well enough though
<heller_> i like the city as well
<heller_> and the review makes sense
<heller_> it's just a lot of effort
<github> [hpx] hkaiser pushed 1 new commit to serialization_access_data: https://git.io/vHVGA
<github> hpx/serialization_access_data 953e158 Hartmut Kaiser: Remove superfluous 'return'
hkaiser has joined #ste||ar
bikineev has quit [Ping timeout: 255 seconds]
eschnett has joined #ste||ar
mcopik has quit [Ping timeout: 255 seconds]
<heller_> hkaiser: #2619 should be ready now, btw
<hkaiser> heller_: ok, thanks
Matombo has quit [Remote host closed the connection]
jaafar has joined #ste||ar
EverYoung has joined #ste||ar
hkaiser has quit [Quit: bye]
bikineev has joined #ste||ar
david_pf_ has joined #ste||ar
hkaiser has joined #ste||ar
mcopik has joined #ste||ar
<github> [hpx] hkaiser force-pushed fixing_2667 from 887cc42 to 4a3a5c1: https://git.io/vHueR
<github> hpx/fixing_2667 4a3a5c1 Hartmut Kaiser: Inhibit direct conversion from future<future<T>> --> future<void>...
<github> [hpx] hkaiser pushed 1 new commit to master: https://git.io/vHVaT
<github> hpx/master 67870fc Hartmut Kaiser: Merge pull request #2672 from STEllAR-GROUP/invoke...
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 240 seconds]
aserio1 is now known as aserio
EverYoung has quit [Ping timeout: 246 seconds]
EverYoung has joined #ste||ar
Matombo has joined #ste||ar
<diehlpk_work> heller_, hkaiser What about the skype meeting for the HPXCL paper?
ajaivgeorge has quit [Quit: ajaivgeorge]
mcopik has quit [Ping timeout: 255 seconds]
pree has joined #ste||ar
bikineev has quit [Remote host closed the connection]
david_pf_ has quit [Quit: david_pf_]
david_pf_ has joined #ste||ar
david_pf_ has quit [Client Quit]
bikineev has joined #ste||ar
bikineev has quit [Remote host closed the connection]
bikineev has joined #ste||ar
<github> [hpx] K-ballo force-pushed compat-exception from d778392 to b574c11: https://git.io/vH8FM
<github> hpx/compat-exception 3aade6e Agustin K-ballo Berge: Add compatibility layer for std::exception_ptr
<github> hpx/compat-exception a3b0486 Agustin K-ballo Berge: Add inspect checks for deprecated boost::exception_ptr
<github> hpx/compat-exception b574c11 Agustin K-ballo Berge: Remove compatibility layer for std::exception_ptr, mark support as required
<K-ballo> heller_: I'll leave the compat layer as part of the history ^ so that it's there if we ever happen to need it
<heller_> K-ballo: thanks
bikineev has quit [Remote host closed the connection]
mcopik has joined #ste||ar
pree has quit [Quit: AaBbCc]
aserio has quit [Ping timeout: 255 seconds]
bikineev has joined #ste||ar
mcopik has quit [Ping timeout: 255 seconds]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
denis_blank has quit [Quit: denis_blank]
mcopik has joined #ste||ar
hkaiser has joined #ste||ar
<aserio> hkaiser: yt?
<hkaiser> aserio: here
<aserio> hkaiser: So Dominic and I talked and it looks like he has a race condition
<hkaiser> did he say whether my fixes solved his problems?
<aserio> The future of future, future void is on a branch right?
<aserio> It looks like the changes reduce the frequency of the issue
<hkaiser> yes
<aserio> I told him that we should brainstorm some ways of searching for the error over the next few days
<hkaiser> ok
aserio has quit [Ping timeout: 246 seconds]
eschnett has quit [Quit: eschnett]
aserio has joined #ste||ar
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
eschnett has joined #ste||ar
aserio has quit [Quit: aserio]
<jbjnr> hkaiser: yt?
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
<hkaiser> jbjnr: here
<jbjnr> hkaiser: when we serialize our args into a parcel, the rma_vector can create a chunk of type 2 (an rma chunk) and store the pinned memory info, which is picked up by the parcelport and used for rma operations. all is well, but ...
<jbjnr> when the data is received into a new rma chunk, and handed over to the archive to be deserialized - I don't have access to the chunk info directly
<jbjnr> how can I access the chunk structure when I am reading my rma object out of the archive?
<hkaiser> shouldn't the chunker do that?
<jbjnr> if I can do that, then I can access the memory region handle I put in there
<jbjnr> how can I get the chunker then?
<hkaiser> let me look
<hkaiser> how do you create the special chunk during serialization?
<jbjnr> I only see functions like " void load_binary_chunk(void* address, std::size_t count) // override" in input_container - but no chunk info is there
<jbjnr> when storing, save_binary_chunk actually stores the chunk directly, but not when loading
<hkaiser> could you point me to the code you're looking at, pls?
<jbjnr> so I create a new kind of chunk with the rma info in it, and the parcelport bypasses memory registration. works great
<jbjnr> but the inverse operation, I do not see the chunks
<hkaiser> ok, so you changed the output archive to call a new function which is creating the rma chunk
<jbjnr> yes
<jbjnr> for rma types
<jbjnr> it's a new specialization/overload set
<hkaiser> nod
<hkaiser> for special types you call save_rma_chunk instead of save_binary_chunk
<jbjnr> yes
<hkaiser> that means you should do the same on receive
<hkaiser> for the same special types you call load_rma_chunk instead of load_binary_chunk
<hkaiser> shouldn't that do the trick?
<jbjnr> yes, but I only get the pointer and size, and not the extra stuff I stored in the chunk
<jbjnr> the rma handles etc
<jbjnr> I need these for memory management
<hkaiser> why do you see only the pointer and the size?
<hkaiser> shouldn't the sending/receival of the chunks pass your additional information along?
<jbjnr> the chunker has the received rma data (and receive handle, which is not the same as the sent one on the other end), but I don't quite know how to get the chunk handle stuff from inside load_rma_chunk
<github> [hpx] K-ballo force-pushed compat-exception from b574c11 to 60ac848: https://git.io/vH8FM
<github> hpx/compat-exception f65b0be Agustin K-ballo Berge: Add inspect checks for deprecated boost::exception_ptr
<github> hpx/compat-exception 60ac848 Agustin K-ballo Berge: Remove compatibility layer for std::exception_ptr, mark support as required
<hkaiser> you somehow identify the chunk as being an rma chunk, yes?
<hkaiser> or is it just the serialized (special) type which knows that?
<jbjnr> yes, they are received as type 2 rma chunks, but I didn't modify the decode parcels to handle them.
<jbjnr> thanks
<jbjnr> that should be my next place to fix it
<hkaiser> encode_chunks needs this as well, probably
<hkaiser> but you have stuff here already
<jbjnr> nothing special needed to be done on send, apart from the new chunk type - the serialization part allows me to create the chunks directly
<hkaiser> right
<jbjnr> it's the decoding I need to look at
<hkaiser> you need to do the proper memory handling on the receive end, though
<jbjnr> I will poke around there. once I find the right place, we have it fixed.
<hkaiser> cool
<jbjnr> it's all working, but I have leaks due to the handle mismatches
<jbjnr> I have to write this paper by friday :(
<hkaiser> jbjnr: the allocation of the chunks is probably done in the parcelports, not in decode_chunks
<hkaiser> jbjnr: let me know if I can help in any way
<jbjnr> yes. The chunks are created correctly by the libfabric receiver, but then they are passed into decode_parcels
<jbjnr> and then I lost them
<jbjnr> so I must make changes in there to get them passed into load_rma_chunk for the rma type deserialization
<hkaiser> well, the chunks you see in the decode_chunks function are the ones allocated in the pp
<jbjnr> yes
<jbjnr> I think I know what to do now
<jbjnr> ta
<hkaiser> load_rma_chunk is called from the de-serialization of your rma data type
<jbjnr> yes - but getting the chunk itself is the part I miss
<hkaiser> you just 'assume' that the chunk you're about to deserialize from was correctly allocated
<jbjnr> I need the actual chunk though
<jbjnr> it has the handles I need
<jbjnr> and load_rma_chunk only sees the archive/container
<jbjnr> not the chunker
<hkaiser> I don't think you need to do anything special in addition to just accessing its pointer
<jbjnr> shit. I forgot I did that code.
<jbjnr> yes, the chunk is present there.
<jbjnr> excellent.
<hkaiser> you just have not accessed the pointer yet
<jbjnr> thanks. I will fix the last bit now
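The resolution reached here, that the receive side can hand back the whole chunk rather than only a pointer and a size, can be sketched as follows. This is a toy model under stated assumptions: `toy_chunk`, `toy_input_container`, and the `rma_handle` field are hypothetical, and only the "type 2 is an rma chunk" numbering comes from the discussion above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical chunk kinds; rma = 2 follows the numbering mentioned above.
enum class chunk_type : std::uint8_t { index = 0, pointer = 1, rma = 2 };

// Hypothetical chunk record: besides pointer and size it carries the handle
// of the registered memory region on the receiving side.
struct toy_chunk
{
    chunk_type type;
    void* data;
    std::size_t size;
    std::uint64_t rma_handle;
};

// Hypothetical input container that walks the chunks filled in by the
// parcelport.
class toy_input_container
{
public:
    explicit toy_input_container(std::vector<toy_chunk> const& chunks)
      : chunks_(&chunks)
    {
    }

    // Returns the next chunk, handle included, so the deserialized object
    // can take over ownership of the registered region.
    toy_chunk const& load_rma_chunk()
    {
        assert(current_ < chunks_->size());
        toy_chunk const& c = (*chunks_)[current_++];
        assert(c.type == chunk_type::rma);
        return c;
    }

private:
    std::vector<toy_chunk> const* chunks_;
    std::size_t current_ = 0;
};
```

Returning the chunk by reference keeps the receive handle attached to the data, which is what avoids the leaks caused by mismatched handles.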
shoshijak has quit [Ping timeout: 255 seconds]
Matombo has quit [Ping timeout: 260 seconds]
Matombo has joined #ste||ar
Matombo has quit [Remote host closed the connection]
EverYoun_ has joined #ste||ar
EverYou__ has joined #ste||ar
EverYoung has quit [Ping timeout: 255 seconds]
EverYou__ has quit [Remote host closed the connection]
EverYoun_ has quit [Ping timeout: 246 seconds]
EverYoung has joined #ste||ar
diehlpk has joined #ste||ar
bikineev has quit [Remote host closed the connection]