hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
tufei__ has joined #ste||ar
tufei_ has quit [Ping timeout: 240 seconds]
K-ballo has quit [Quit: K-ballo]
hkaiser_ has quit [Quit: Bye!]
tufei_ has joined #ste||ar
tufei__ has quit [Ping timeout: 240 seconds]
hkaiser has joined #ste||ar
hkaiser has quit [Ping timeout: 246 seconds]
K-ballo has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Quit: Bye!]
<zao> Once again, I'm feeling experimental enough to consider using HPX for a problem that TBB doesn't quite cut it for.
hkaiser has joined #ste||ar
<zao> Essay of questions incoming...
<zao> I've got some somewhat parallel data processing that I'd like to run a bit faster than the naive approach gets me.
<zao> At its core it's reading in around 30 gigs of compressed bundles across 40k files on slow HDDs. Each of these "loose files" is hashed and ingested into an LMDB datastore on disk keyed on the content hash for deduplication.
<zao> These serve as the source data for a decompression and splitting step that produces around 50 gigs of data across 800k virtual files - these are also deduplicated into another datastore. The final output of one cycle of the program is a pair of manifest files that map from original "loose" and "bundled" paths to their content hashes for future serving to clients through another application.
<zao> I'm having some trouble figuring out how I would rate-limit the different processing steps to avoid exceeding system memory, since my working set is way larger than the node's RAM (80 GiB vs. 16 GiB on the dev box, less in production). With TBB and its pipeline system I could use tokens to avoid having too much data in flight.
<zao> I need some OS thread affinity for the datastore since it relies on regular OS-thread TLS, which I believe I can achieve by making a separate small thread pool for specific tasks in a partitioner callback? And how do I deal with costly I/O that has some, but low, parallelism: another dedicated thread pool?
<zao> I'm looking at HPX dataflow for the overall program flow but can't quite figure out how I'd limit the amount of loose file data that's in flight, and similarly keep intermediary read/write steps from bunching up too much on the slowest part of the system. Can I model this kind of credit/limit system in HPX somehow?
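For context, a minimal sketch of the dataflow shape being described here, with made-up stage names (read_bundle, split_bundle, store_loose are placeholders); the rate-limiting part is the open question answered further down:

    #include <hpx/hpx_main.hpp>
    #include <hpx/hpx.hpp>

    #include <string>
    #include <vector>

    // Trivial placeholder stages for a single bundle (names made up)
    std::vector<char> read_bundle(std::string const&) { return {}; }
    std::vector<std::vector<char>> split_bundle(std::vector<char> const&) { return {}; }
    void store_loose(std::vector<std::vector<char>> const&) {}

    int main()
    {
        // One chain per bundle: read -> decompress/split -> hash/store.
        // Each dataflow stage only runs once its input future is ready.
        hpx::future<std::vector<char>> raw =
            hpx::async(read_bundle, std::string("bundle-000.bin"));

        hpx::future<std::vector<std::vector<char>>> split = hpx::dataflow(
            [](hpx::future<std::vector<char>> r) { return split_bundle(r.get()); },
            std::move(raw));

        hpx::future<void> stored = hpx::dataflow(
            [](hpx::future<std::vector<std::vector<char>>> s) { store_loose(s.get()); },
            std::move(split));

        stored.get();
        return 0;
    }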
<hkaiser> zao: do you rely on external libraries to do the threading?
<zao> I have no such thing at the moment, only reached for TBB to see how well their things would work.
<zao> Library-wise I've got LMDB for data access, a task-safe custom decompression library and a task-safe library of mine for interpreting the index. Standard I/O-streams for the initial file reading, no memory mapping or other amortization there as I want the data resident rather than spreading the I/O cost everywhere.
<zao> LMDB explicitly warns about green threading and needs real threads in serial, at least for write transactions.
hkaiser_ has joined #ste||ar
<hkaiser_> zao: just asking
hkaiser has quit [Ping timeout: 252 seconds]
<zao> At some point I'd also like to gather up multiple DB writes into larger write transactions, but that's more wishful thinking for now.
<hkaiser_> so you need to define affinities for your threads, mainly
<hkaiser_> is that what you're looking for?
<zao> I think the affinity stuff is probably somewhat straightforward.
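A rough sketch of the dedicated-pool idea for the LMDB and I/O work, assuming the resource partitioner callback plus an executor bound to a named pool. The pool names ("lmdb", "io"), the PU assignments, and the exact executor constructor are guesses, and the spelling of these APIs has shifted a bit between HPX releases, so treat this as a starting point rather than a working recipe:

    #include <hpx/hpx_init.hpp>
    #include <hpx/hpx.hpp>
    #include <hpx/include/resource_partitioner.hpp>

    int hpx_main()
    {
        // An executor bound to the dedicated pool: work scheduled through it
        // only ever runs on that pool's OS threads, so thread-local state
        // (like LMDB's per-thread transaction bookkeeping) stays put.
        hpx::execution::parallel_executor lmdb_exec(
            &hpx::resource::get_thread_pool("lmdb"));

        hpx::future<void> write = hpx::async(lmdb_exec, [] {
            // open an LMDB write transaction here, always on the same
            // small set of real OS threads
        });
        write.get();

        return hpx::finalize();
    }

    int main(int argc, char* argv[])
    {
        hpx::init_params params;
        params.rp_callback = [](hpx::resource::partitioner& rp, auto const&) {
            // one core for the datastore, one for blocking file I/O;
            // everything else stays in the default worker pool
            rp.create_thread_pool("lmdb");
            rp.add_resource(rp.numa_domains()[0].cores()[0].pus(), "lmdb");
            rp.create_thread_pool("io");
            rp.add_resource(rp.numa_domains()[0].cores()[1].pus(), "io");
        };
        return hpx::init(argc, argv, params);
    }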
<hkaiser_> zao: you can use our counting semaphore to limit the number of tasks scheduled; there's also a restricted executor that does something similar
<zao> Ooh, that's the kind of construct I had forgotten existed :D
<zao> This might be something to start prototyping on, thanks!
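For the record, a minimal sketch of the token idea with the counting semaphore, assuming hpx::counting_semaphore_var<> from <hpx/semaphore.hpp> (older releases spell it hpx::lcos::local::counting_semaphore, and the exact name and header may differ per version); the in-flight limit of 8 and the task body are placeholders:

    #include <hpx/hpx_main.hpp>
    #include <hpx/hpx.hpp>
    #include <hpx/semaphore.hpp>

    #include <cstddef>
    #include <vector>

    int main()
    {
        constexpr std::size_t max_in_flight = 8;          // the "token" count
        hpx::counting_semaphore_var<> limit(max_in_flight);

        std::vector<hpx::future<void>> tasks;
        for (std::size_t i = 0; i != 1000; ++i)
        {
            // suspends this HPX thread once max_in_flight tasks are live, so
            // no more bundle data is pulled into memory until a slot frees up
            limit.wait();
            tasks.push_back(hpx::async([&limit] {
                // read + decompress + hash one bundle here ...
                limit.signal();    // hand the token back when done
            }));
        }
        hpx::wait_all(tasks);
        return 0;
    }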
ms[m] has quit [Quit: You have been kicked for being idle]
tufei_ has quit [Remote host closed the connection]
tufei_ has joined #ste||ar
hkaiser_ has quit [Quit: Bye!]
hkaiser has joined #ste||ar
<zao> Bah, left my HPX 1.8.1 compile running on the VM all day and it failed because the fix from PR#6166 wasn't in the source tree. vcpkg doesn't package 1.9.0 yet for some reason.
<gonidelis[m]> auto&& and decltype(auto): are they equivalent? What's a case where they give a different deduction result?
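A quick standalone sketch of where they diverge: for an lvalue initializer both deduce the same reference type, but for a prvalue auto&& deduces an rvalue reference (bound to a lifetime-extended temporary) while decltype(auto) deduces a plain value, i.e. exactly what decltype of the initializer expression would give:

    #include <type_traits>

    int global = 0;
    int& lval() { return global; }
    int  prval() { return 42; }

    auto&&         a = lval();   // int&  - both agree for lvalues
    decltype(auto) b = lval();   // int&

    auto&&         c = prval();  // int&& - reference bound to a temporary
    decltype(auto) d = prval();  // int   - a plain value, no reference

    static_assert(std::is_same_v<decltype(a), int&>);
    static_assert(std::is_same_v<decltype(b), int&>);
    static_assert(std::is_same_v<decltype(c), int&&>);
    static_assert(std::is_same_v<decltype(d), int>);

    int main() {}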
<gonidelis[m]> Also hkaiser, who's doing the vcpkg packaging for our releases? 1.9.0 isn't there yet and we're already planning to release 1.9.1.
HHN93 has joined #ste||ar
<HHN93> Is there any reason tag_fallback_invoke is generally marked as a friend function?
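The usual answer is the hidden-friend idiom. A toy illustration (hypothetical names, not HPX's actual machinery): defining tag_fallback_invoke as a friend inside the class keeps it invisible to ordinary lookup, so only ADL on that specific type can find the overload, which keeps overload sets small, helps compile times, and avoids accidental matches for unrelated types:

    #include <utility>

    namespace mini {
        struct format_t
        {
            template <typename T>
            decltype(auto) operator()(T&& t) const
            {
                // found only via ADL on the type of t
                return tag_fallback_invoke(*this, std::forward<T>(t));
            }
        };
        inline constexpr format_t format{};
    }

    struct widget
    {
        // Hidden friend: not visible in the enclosing namespace, only
        // reachable when a widget argument brings it in through ADL.
        friend int tag_fallback_invoke(mini::format_t, widget const&)
        {
            return 42;
        }
    };

    int main()
    {
        return mini::format(widget{}) == 42 ? 0 : 1;
    }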
HHN93 has quit [Quit: Client closed]
hkaiser has quit [Ping timeout: 246 seconds]
hkaiser has joined #ste||ar
hkaiser has quit [Ping timeout: 264 seconds]