hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
nikunj97 has quit [Read error: Connection reset by peer]
weilewei has quit [Remote host closed the connection]
bita__ has joined #ste||ar
bita_ has quit [Ping timeout: 260 seconds]
weilewei has joined #ste||ar
hkaiser has quit [Quit: bye]
bita__ has quit [Ping timeout: 260 seconds]
weilewei has quit [Remote host closed the connection]
<zao> I've got some std-using code that has a mandatory heavy initialization step that I'm off-threading when constructing a thing, so that it might be ready when code needs it.
<zao> Right now I've got a mutex that the worker acquires when initializing the data and which any later callers need to acquire to use the data, also passing in a void future as a barrier to ensure that the worker has acquired the mutex.
<zao> Would it be more efficient if I instead use a future<void> to "guard" access to the data?
<zao> So the init code would fill the future when the data is filled, and consumers would just have to get() it before using the data.
<zao> Said initialization is prepopulating a hash map that is immutable after this initialization step, so consumers don't need mutual exclusion with each other.
<zao> I don't really know what the costs involved with a future are.
karame_ has quit [Remote host closed the connection]
<zao> (I should of course use HPX, but I'm not at that spot quite yet)
<heller1> zao: you can think of a future as something like this: condition variable + mutex with dynamic memory allocation and atomic reference counting
<heller1> FWIW, you would need a shared_future<void> to guard your initialization, which allows you to call `get` multiple times
<zao> Ah.
<heller1> the recurring calls to get have an overhead of one indirection + mutex lock/unlock
<heller1> in that ballpark, roughly
<zao> Maybe I could get the fastpath better by polling an atomic before going into the mutex path or something?
<zao> Again, no clue about relative costs here.
<heller1> yes, that could be done
<heller1> of course also depends on the implementation of the future ... in HPX, the fastpath of get is just an atomic read
<heller1> but absolutely depends on the usage pattern of your object(s)
<zao> The concrete application here is a virtual filesystem, in which I need to traverse the whole thing up-front to generate a complete lookup-table from child object to parent object. This takes something like 16 seconds, so I off-thread it.
<zao> It's used whenever I need to obtain the full path to an object, so not always used and typically for display purposes currently.
<zao> It's worked quite nicely in the Rust implementation of this codebase, but I'm porting it to C++ for experience.
<heller1> ;)
<heller1> how did you solve it in rust?
<heller1> if it is just display purposes, I guess the atomic + shared_future<void> thing is a nice way to go
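A minimal sketch of the atomic + shared_future<void> guard discussed above, assuming the map is immutable once built; the names (PathIndex, parents, parent_of_) are illustrative and not taken from zao's actual codebase:

    #include <atomic>
    #include <future>
    #include <string>
    #include <unordered_map>

    class PathIndex
    {
    public:
        PathIndex()
          : ready_(std::async(std::launch::async, [this] {
                build_index();                                   // the heavy (~16 s) traversal
                fast_ready_.store(true, std::memory_order_release);
            }).share())
        {}

        // Consumers call this before reading; no mutual exclusion is needed afterwards
        // because the map never changes once built.
        const std::unordered_map<std::string, std::string>& parents() const
        {
            if (!fast_ready_.load(std::memory_order_acquire))    // fast path: one atomic read
                ready_.get();                                    // slow path: block until initialized
            return parent_of_;
        }

    private:
        void build_index()
        {
            parent_of_.emplace("child", "parent");               // stands in for the full traversal
        }

        std::unordered_map<std::string, std::string> parent_of_;
        std::atomic<bool> fast_ready_{false};
        std::shared_future<void> ready_;
    };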
<mdiers[m]> I need a connection to python in an hpx application, especially tensorflow. It is currently for a research project. Should I implement the interface with pybind11 or should I use Phylanx right away? Is Phylanx ready for production environments? I have seen that the first release is out now.
<zao> In Rust I use a sharded reader-writer lock, which is biased toward faster reads: https://docs.rs/crossbeam/0.7.3/crossbeam/sync/struct.ShardedLock.html
<heller1> mdiers_: doesn't tensorflow have C++ bindings as well?
<heller1> mdiers_: phylanx doesn't give you the connection from C++ to python. It gives you a python library which is using HPX. For a connection from C++ to Python, I think pybind is the way to go
<heller1> pybind11
<heller1> (phylanx uses it too)
<heller1> on that note, I don't know about the production readiness of phylanx
<heller1> zao: that ShardedLock is pretty neat. How do you deal with the situation where you see the hash map not being initialized?
<zao> I block enough during construction so that the writer lock is always held before any readers get to see the object.
<heller1> icky
<zao> Yeah, a bit hacky :)
<mdiers[m]> <sithhell[m] "mdiers_: doesn't tensorflow have"> yes, but the current tensorflow part of the project is implemented in python and it will stay that way until it is finished. after that there will be a port.
<heller1> that's the kind of code which I have to debug nowadays which runs into all kinds of races and deadlocks because it was written 10 years ago with exactly those implicit assumptions
<heller1> mdiers_: in that case, I would probably write a quick binding in pybind
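A minimal sketch of such a quick binding, embedding the Python interpreter from C++ with pybind11; the module and function names (my_model, predict) are placeholders rather than anything from the actual project:

    #include <pybind11/embed.h>
    namespace py = pybind11;

    int call_into_python()
    {
        py::scoped_interpreter guard{};                     // start an embedded Python interpreter

        py::module model = py::module::import("my_model"); // e.g. a script wrapping tensorflow
        py::object result = model.attr("predict")(42);     // call a Python function
        return result.cast<int>();
    }

One caveat when mixing this with multi-threaded C++ code: the Python GIL has to be held around every call into the interpreter.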
<zao> Don't tell anyone, but I actually had a bug where I didn't do that synchronization up-front and readers could sneak in if the init thread was delayed somehow :)
<heller1> :P
<mdiers[m]> <sithhell[m] "mdiers_: in that case, I would p"> thanks for the quick help
<heller1> mdiers_: unless you want to give phylanx a try though
<heller1> there was a hpx backend to tensorflow a while back: http://stellar.cct.lsu.edu/pubs/lukas_troska_hpx_tensorflow_04.05.17.pdf (it probably doesn't work anymore)
<heller1> phylanx is more or less pure python and I am not sure how well it integrates with 3rd party software
<mdiers[m]> <sithhell[m] "there was a hpx backend to tenso"> Yes, I saw it yesterday.
<heller1> in any case, I am sure the phylanx team would be eager to support you with the features you would need
<mdiers[m]> I can well imagine, the features needed are almost the same as here :-)
<mdiers[m]> i have another problem: a crash during the destruction of a static internal hpx object:
<mdiers[m]> `std::_Rb_tree<std::string, std::pair<std::string const, hpx::util::basic_any<void, void, void, std::integral_constant<bool, true>>>, std::_Select1st<std::pair<std::string const, hpx::util::basic_any<void, void, void, std::integral_constant<bool, true>>>>, std::less<std::string>, std::allocator<std::pair<std::string const, hpx::util::basic_any<void, void, void, std::integral_constant<bool, true>>>>>::_M_erase`
<mdiers[m]> I haven't had time to create a minimal example of this yet. maybe you have an idea?
<zao> Ho ho... tried to vcpkg install HPX... Additional packages (*) will be modified to complete this operation. Starting package 1/102: boost-vcpkg-helpers:x64-windows
<heller1> zao: good luck
<heller1> mdiers_: interesting. Doesn't ring a bell
<mdiers[m]> <sithhell[m] "mdiers_: interesting. Doesn't ri"> ok, thanks. is also an untypical context. hpx-application integrated via a shared library, loaded at runtime, only one function is called without hpx, and then again an unload.
<heller1> oh, interesting usecase...
<heller1> if you have a stacktrace, I can have a look...
<heller1> but it looks like this is related to our own plugin loading mechanism
<heller1> mdiers_: does this also happen when you remove all files from lib/hpx/ in your build/install directory?
<zao> Bleh, can't say -DHPX_WITH_CXX20=ON yet on MSVC it seems. Not quite sure what explodes yet, seems to be Boost.
<heller1> hmm
<zao> truncated log after the first bunch of failed projects, the whole one is a bit huge.
Guest8932 has quit [*.net *.split]
gdaiss[m] has quit [*.net *.split]
diehlpk_mobile[m has quit [*.net *.split]
Guest8932 has joined #ste||ar
diehlpk_mobile[m has joined #ste||ar
gdaiss[m] has joined #ste||ar
mdiers_ has quit [Quit: mdiers_]
mcopik has joined #ste||ar
mcopik has quit [Client Quit]
diehlpk_mobile[m has quit [Quit: killed]
freifrau_von_ble has quit [Quit: killed]
kordejong has quit [Quit: killed]
mdiers[m] has quit [Quit: killed]
jbjnr has quit [Quit: killed]
heller1 has quit [Quit: killed]
tiagofg[m] has quit [Quit: killed]
rori has quit [Quit: killed]
ms[m] has quit [Quit: killed]
pfluegdk[m] has quit [Quit: killed]
gdaiss[m] has quit [Quit: killed]
Guest8932 has quit [Quit: killed]
kordejong has joined #ste||ar
parsa[m] has joined #ste||ar
parsa[m] is now known as Guest52957
tiagofg[m] has joined #ste||ar
pfluegdk[m] has joined #ste||ar
mdiers[m] has joined #ste||ar
jbjnr has joined #ste||ar
diehlpk_mobile[m has joined #ste||ar
ms[m] has joined #ste||ar
rori has joined #ste||ar
gdaiss[m] has joined #ste||ar
freifrau_von_ble has joined #ste||ar
heller1 has joined #ste||ar
gonidelis has joined #ste||ar
nikunj97 has joined #ste||ar
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 260 seconds]
hkaiser has joined #ste||ar
<hkaiser> ms[m]: yt?
<ms[m]> hkaiser: here
<hkaiser> hey g'morning
<hkaiser> ms[m]: I wanted to talk about sequencing the merges
<hkaiser> what are you planning?
<ms[m]> morning
<ms[m]> no plan, but agree that planning it would be a good idea
<ms[m]> if nothing else oldest first...
<hkaiser> should we go ahead with the cmake formatting? if yes we should either wait until all planned modules are in or do it asap as each module creates conflicts there
<ms[m]> I was thinking that it might be easier to wait with that one until it's quieter since it's easy enough to reapply the cmake formatting, but if we merge it right away it'll be quite painless as well
<hkaiser> either way is fine for me, just would like to avoid having to resolve conflicts each time something is merged
<ms[m]> right now there aren't any massive cmake changes in other prs
<ms[m]> yeah, understand completely
<hkaiser> ok, I'll wait then - pls let me know when you think it's a good time
<ms[m]> let's go and merge the cmake formatting then because that is always going to have conflicts
<ms[m]> :P
<hkaiser> ok, I need to resolve conflicts first, then ;-)
<ms[m]> I don't mind, let's do it now if it's conflict free
<ms[m]> ok, I won't merge anything before that one is in then
<hkaiser> ok, thanks - will work on it today
<hkaiser> other thing
<hkaiser> I noticed you have manually edited libs/CMakeLists.txt
<hkaiser> isn't that one generated by the module creation script?
<ms[m]> mmh, true
<ms[m]> yes
<ms[m]> I figured we're at a point where we don't necessarily need to generate that, but then I should delete the script
<hkaiser> hmm
<ms[m]> all it does now is add a module in the correct place in a list
<hkaiser> and it generates all the boilerplate files
<ms[m]> and usually we forget to edit the script if we edit the actual cmakelists.txt
<ms[m]> ah, I exaggerated, delete the part that generates libs/CMakeLists.txt
<ms[m]> the rest is very useful
<hkaiser> right
<ms[m]> it does mean one has to remember to add the module, but that should be caught pretty easily
<ms[m]> or we add a check for that as well
<hkaiser> mdiers[m]: we could externalize the list of modules from the main CMakeLists file
<ms[m]> hmm?
<hkaiser> or we mark the modules that are meant to be distributed explicitly so we can collect that information
<hkaiser> we will end up with several different configurations anyways - I think automating that might be a good idea
<hkaiser> sorry mdiers[m], wrong auto-completion
<ms[m]> true, I can add the checks inside the modules instead
<ms[m]> we have the logic in place to exclude modules already
<hkaiser> right
<ms[m]> good idea, I'll do that
<hkaiser> ms[m]: ok - but after the cmake formatting was merged ;-)
<ms[m]> for the different configurations we'll need something more, but let's decide on that when it's relevant...
<ms[m]> yes ;)
<hkaiser> ms[m]: ok
<hkaiser> thanks
<ms[m]> thank you!
Yorlik has quit [Read error: Connection reset by peer]
Yorlik has joined #ste||ar
<hkaiser> ms[m]: I have pushed the cmake-format with conflicts resolved
<ms[m]> hkaiser: thanks! and no worries about conflicting with the other pr based on it, I was kind of expecting it ;)
<hkaiser> ms[m]: thanks
akheir has joined #ste||ar
bita__ has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]
kordejong has left #ste||ar ["Kicked by @appservice-irc:matrix.org : Idle for 30+ days"]
nikunj has joined #ste||ar
nikunj97 has joined #ste||ar
Guest52957 has left #ste||ar ["Kicked by @appservice-irc:matrix.org : Idle for 30+ days"]
Nikunj__ has quit [Ping timeout: 246 seconds]
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 256 seconds]
karame_ has joined #ste||ar
<gonidelis> Why should there be a my_hpx_build directory under the main hpx directory? Could someone help me clarify the differences/usage?
<ms[m]> btw, merge whenever it's clean, I suspect it's going to be later tonight...
weilewei has joined #ste||ar
<ms[m]> gonidelis: there's no need for it to be under your main hpx directory, it's just a convention
<ms[m]> the only requirement is that the build directory isn't the same as your source directory
<ms[m]> makes it easier to wipe a build without wiping all the source files as well
<gonidelis> Alright, and what is the difference in terms of purpose? I mean, are the changes made in the build or the src directory?
<hkaiser> ms[m]: grrr
<ms[m]> gonidelis: you make changes to the source directory
<ms[m]> builds are derived from the source
<ms[m]> I feel like we had this discussion once earlier... :) maybe it was with someone else
nikunj97 has joined #ste||ar
Nikunj__ has quit [Ping timeout: 240 seconds]
rtohid has joined #ste||ar
pfluegdk[m] has left #ste||ar ["Kicked by @appservice-irc:matrix.org : Idle for 30+ days"]
gonidelis has quit [Ping timeout: 245 seconds]
gonidelis has joined #ste||ar
<gonidelis> ms[m] thnx! ahh now I get it, so the main purpose of build_dir is to be able to make a fresh install over and over again without having to download the source all the time?
<hkaiser> gonidelis: right
<gonidelis> perfect... thanks a lot!
<hkaiser> gonidelis: only the generated files (binaries) end up in the build dir, the sources (cpp/hpp) stay in the original directory
<gonidelis> yeah yeah... crystal clear
<weilewei> Does hpx have functionality similar to MPI_Pack()? I am looking for a way to pack multiple arrays (from multiple threads) into a single buffer and send it to the next rank
<hkaiser> weilewei: use HPX serialization ;-)
<weilewei> hkaiser hpx component?
<hkaiser> bita__: I might be a couple minutes late today (again)
<bita__> no worries
<hkaiser> weilewei: was merely kidding ;-)
<weilewei> hkaiser ok...
<hkaiser> weilewei: how large are the arrays you're trying to combine?
<hkaiser> MPI_pack and friends will copy the data, I don't think that's what you want
<weilewei> each array might be 30-100 MB at this point and each rank might have 7 of them
<weilewei> I am not sure if MPI_Pack is thread-aware? For example, if inside each thread they all call MPI_Pack
<hkaiser> weilewei: I'd rather send those large arrays separately, the data copying involved in combining them would kill you
<weilewei> hkaiser ok... I have been thinking about it as well; the DCA++ mathematician is suggesting "It may depend on the quality of implementation whether the MPI library is internally just copying or packing, or can actually use the network hardware to transfer non-contiguous memory regions." If the situation is the latter one, it might be an ideal case
<hkaiser> weilewei: mpi_pack will copy things, I'm almost certain - but who knows
<hkaiser> look at the mpi_pack api, there is no way it can't get away without copying
<weilewei> hkaiser ok, then that's very bad
<weilewei> hkaiser well, it does ask for outbuff pointer
<hkaiser> I meant there is no way it _can_ get away with not copying
<weilewei> hkaiser see PM, please
<hkaiser> bita__: would 1.15 would be still ok for you?
<bita__> sure
<hkaiser> thanks
<hkaiser> \o/
Nikunj__ has joined #ste||ar
nikunj97 has quit [Ping timeout: 246 seconds]
gonidelis has quit [Ping timeout: 245 seconds]
nikunj97 has joined #ste||ar
Nikunj__ has quit [Ping timeout: 240 seconds]
Nikunj__ has joined #ste||ar
shahrzad has joined #ste||ar
nikunj97 has quit [Ping timeout: 256 seconds]
shahrzad has quit [Ping timeout: 252 seconds]
nikunj97 has joined #ste||ar
Nikunj__ has quit [Ping timeout: 244 seconds]
<jbjnr> weilewei: I'd second hkaiser - do not use mpi_pack for large data. if the number of arrays is known, then just send them one at a time. Ideally, an RMA copy API like the one I'm working on for another project would be best, but MPI doesn't make that easy.
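A minimal sketch of sending the arrays one at a time with nonblocking MPI calls instead of MPI_Pack, assuming the receiver knows the number and sizes of the arrays; the function name and the use of the loop index as tag are illustrative:

    #include <mpi.h>
    #include <cstddef>
    #include <vector>

    void send_arrays(const std::vector<std::vector<double>>& arrays, int dest, MPI_Comm comm)
    {
        std::vector<MPI_Request> reqs(arrays.size());
        for (std::size_t i = 0; i < arrays.size(); ++i)
        {
            // one nonblocking send per array, distinguished by its tag; no packing copy
            MPI_Isend(arrays[i].data(), static_cast<int>(arrays[i].size()), MPI_DOUBLE,
                      dest, static_cast<int>(i), comm, &reqs[i]);
        }
        MPI_Waitall(static_cast<int>(reqs.size()), reqs.data(), MPI_STATUSES_IGNORE);
    }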
<weilewei> jbjnr got it, it seems we might end up allocating a large array, G2_mem of size N*N*num_G2 where N is col/row of G2, and then send G2_mem out
<ms[m]> hkaiser: woop
<ms[m]> merge it! before ci finds other problems ;)
<hkaiser> ms[m]: done ;-)
<ms[m]> thanks!
<hkaiser> ms[m]: how do I link a shared library with hpx_init/hpx_wrap nowadays?
<ms[m]> hkaiser: bleh, you don't... I might need to rethink this
<ms[m]> do you have main or hpx_main in the shared library?
akheir has quit [Remote host closed the connection]
<hkaiser> nope
<hkaiser> I'm calling hpx::start
<ms[m]> then just link to HPX::hpx? does it not work?
<hkaiser> ms[m]: let me check, I might not need to link with hpx_init
<ms[m]> if it doesn't open an issue and I'll have a look tomorrow
<hkaiser> thanks!
<ms[m]> I'm not 100% happy with the targets yet...
<hkaiser> it complains about hpx::detail::init_winsocket being undefined
<hkaiser> I guess we could move that into core HPX
<ms[m]> hmm...
<ms[m]> maybe we can actually link hpx_init to everything
<ms[m]> it's probably what we did before
<ms[m]> it's just hpx_wrap that's special
<hkaiser> before we only linked executables, I think
<ms[m]> I thought so too... but then you at least had the option of manually linking to hpx_init
<ms[m]> (just thinking if this is a regression or not)
<hkaiser> ms[m]: I had cases in the past where I had main() in a shared library
<ms[m]> in principle you still have the option, I just hid it in a namespace because I thought one wouldn't need to link it to shared libraries
<hkaiser> could be different now
<hkaiser> well, let's cross that bridge when we're there
<ms[m]> do you think that might've been with `hpx_main.hpp` (i.e. the macro trickery)?
<hkaiser> we would have to move the winsocket initialization into the core library, though
<ms[m]> you seem to be crossing the bridge now ;)
<hkaiser> ms[m]: yah, could have been hpx_main
<hkaiser> Phylanx doesn't really have main() in a shared library
<hkaiser> Phylanx loads a Python extension module that initializes HPX
<hkaiser> main() is in the Python interpreter
<hkaiser> so it might not need hpx_init after all
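A minimal sketch of the hpx::start/hpx::stop pattern for code that does not own main() (e.g. a Python extension module); the exact headers and the hpx::start overload taking a null entry point are assumptions that may differ between HPX versions:

    #include <hpx/hpx.hpp>
    #include <hpx/hpx_start.hpp>

    // called once by the host application, e.g. when the extension module is loaded
    void library_init(int argc, char** argv)
    {
        hpx::start(nullptr, argc, argv);         // start the runtime without an hpx_main entry point
    }

    // called before the library is unloaded
    void library_shutdown()
    {
        hpx::apply([]() { hpx::finalize(); });   // schedule runtime shutdown from an HPX thread
        hpx::stop();                             // wait for the runtime to exit
    }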
<ms[m]> I mean you can try linking the shared library to HPXInternal::hpx_init (I think that's what it's called) just to see if that actually works
<ms[m]> hmm, right
<ms[m]> something expects init_winsocket to be there though...
<hkaiser> hpx::start
<hkaiser> which is in core anyways
<hkaiser> hpx_init has only the various main() overloads
<ms[m]> yeah
<ms[m]> and you're supplying the entrypoint manually?
<hkaiser> ms[m]: I think we can safely move that into core (and I can certainly do that)
<hkaiser> it's a windows hack after all anyways
<ms[m]> ok, let's start with that then
<ms[m]> thanks :)
<hkaiser> thank you!
<ms[m]> right, I'm off to bed... let me know tomorrow if it actually worked :)
<hkaiser> ok, thanks
<weilewei> shall I write my own vector_matrix class? In DCA, it has its own reshapable_matrix and vector (technically an array), but now I am looking for a container that can hold a series of reshapable matrices
rtohid has left #ste||ar [#ste||ar]
<hkaiser> weilewei: std::vector<reshapable_matrix>?
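A trivial sketch of that suggestion; the reshapable_matrix struct below is only a stand-in for DCA's actual matrix type, and its (rows, cols) constructor is an assumption:

    #include <cstddef>
    #include <vector>

    // placeholder for DCA's reshapable matrix type (assumed interface)
    struct reshapable_matrix
    {
        reshapable_matrix(std::size_t rows, std::size_t cols) : data_(rows * cols) {}
        std::vector<double> data_;
    };

    int main()
    {
        std::size_t const N = 64, num_G2 = 7;          // illustrative sizes
        std::vector<reshapable_matrix> g2_series;      // a series of reshapable matrices
        g2_series.reserve(num_G2);
        for (std::size_t i = 0; i < num_G2; ++i)
            g2_series.emplace_back(N, N);
    }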
nikunj97 has quit [Read error: Connection reset by peer]