hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
jaafar has quit [Ping timeout: 240 seconds]
K-ballo has quit [Quit: K-ballo]
jaafar has joined #ste||ar
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
jaafar has quit [Ping timeout: 255 seconds]
aserio has quit [Quit: aserio]
nikunj has quit [Ping timeout: 245 seconds]
nikunj has joined #ste||ar
nikunj has quit [Read error: Connection reset by peer]
nikunj97 has joined #ste||ar
david_pfander has joined #ste||ar
<Yorlik> What happens if you make Singletons Components and try to move them to a place where there already is a Highlander ?
<Yorlik> Just curious - not that I have it planned. Though there might be an application if you want to migrate an entire locality to shut it down / reboot.
<Yorlik> Could you even create a singleton hpx component at all with the private constructor? Probably that answers the question.
<Yorlik> And what about statics? are they automagically cluster-wide and synchronized?
<zao> Regular language level statics?
<Yorlik> class statics
<Yorlik> And then you move
<Yorlik> err migrate
<Yorlik> Or change at locality A and query at locality B
<Yorlik> And yes - regular language statics too ofc
jaafar has joined #ste||ar
<zao> HPX cannot influence how language features work, unfortunately.
<Yorlik> So - you could have 2 classes with different statics and migrate?
<Yorlik> My guess is that after migration the recreation of the object would just adopt the local class statics
<Yorlik> But - that's an un-educated guess ... ;)
jaafar has quit [Quit: Konversation terminated!]
jaafar has joined #ste||ar
nikunj97 has quit [Quit: Leaving]
jbjnr__ has joined #ste||ar
jbjnr_ has quit [Ping timeout: 268 seconds]
K-ballo has joined #ste||ar
<heller_> Yorlik: it all depends on how you write your serialization functions and move/copy constructors
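A minimal sketch of what heller_ means by serialization functions on a component, assuming HPX's component_base and the intrusive serialize() member pattern; the component name and header path are illustrative and may differ by HPX version:

    #include <hpx/include/components.hpp>

    // Illustrative component: the state that survives serialization (and hence
    // migration) is exactly what serialize() writes and reads back.
    struct game_cell : hpx::components::component_base<game_cell>
    {
        int value = 0;

        template <typename Archive>
        void serialize(Archive& ar, unsigned /*version*/)
        {
            ar & value;   // anything not serialized here does not travel along
        }
        // For actual migration the component would additionally derive from the
        // hpx::components::migration_support mixin (omitted in this sketch).
    };

    HPX_REGISTER_COMPONENT(hpx::components::component<game_cell>, game_cell);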
<zao> heller_: What on earth should I do with these migrate test failures? Wait for master to be compilable and see if it still manifests? Try some older builds and see if it's been around forever?
<heller_> I personally would refrain from making classic singletons. In a way, components are global objects already. This becomes evident when using symbolic names
<heller_> zao: it's been around for quite a while
<zao> Got a metric duckton of ctest logs from failed runs, should look into those some day and categorize the faults.
<zao> heller_: Yeah, I had a vague feeling it's not new.
<heller_> Ok, I'll be on a computer tonight
<heller_> What you could do, however, is to try my sanitizers branch
<zao> I'm in no hurry, just wanted to check in a bit.
<zao> Ooh.
<heller_> And see if that changes anything
<heller_> I'll prepare PRs for the different commits tonight
<heller_> And fix the mpi tester...
<zao> Speaking of MPI, we found a memory leak in UCX, some underlying component of OpenMPI.
<zao> Lots of researchers having code that previously ran fine be OOM-killed on compute nodes as OpenMPI leaked typecache :D
<heller_> Hihi
<heller_> Isn't UCX this libfabric-style messaging middleware?
<heller_> Just depending on the fabric in use
<zao> Heaven knows, but considering the reporter is @mellanox, sounds right.
<zao> *committer
<heller_> Yup
<heller_> That's the one
david_pfander has quit [Remote host closed the connection]
hkaiser has joined #ste||ar
Yorlik has quit [Read error: Connection reset by peer]
aserio has joined #ste||ar
bibek has joined #ste||ar
hkaiser has quit [Quit: bye]
eschnett has joined #ste||ar
david_pfander has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
bita has joined #ste||ar
eschnett has quit [Quit: eschnett]
aserio has joined #ste||ar
eschnett has joined #ste||ar
eschnett has quit [Client Quit]
eschnett has joined #ste||ar
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
eschnett has quit [Quit: eschnett]
hkaiser has joined #ste||ar
eschnett has joined #ste||ar
jaafar has quit [Quit: Konversation terminated!]
aserio has quit [Ping timeout: 252 seconds]
jaafar has joined #ste||ar
jaafar_ has joined #ste||ar
jaafar has quit [Ping timeout: 252 seconds]
eschnett has quit [Quit: eschnett]
aserio has joined #ste||ar
eschnett has joined #ste||ar
Yorlik has joined #ste||ar
diehlpk_work has joined #ste||ar
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
aserio has quit [Client Quit]
jaafar_ has quit [Quit: Konversation terminated!]
hkaiser has quit [Quit: bye]
jaafar has joined #ste||ar
bibek has quit [Quit: Konversation terminated!]
hkaiser has joined #ste||ar
<Yorlik> If I have an ordered set (std::set<uint64_t>) (*myset.end ( ) - *myset.begin ( )) should give me the difference between the largest and the smallest element, or am I wrong here?
<hkaiser> Yorlik: you should never dereference the iterator returned by end()
<Yorlik> Argh -- lol
<Yorlik> WTF
* Yorlik bangs head on table
<Yorlik> Oh man - this is funny
<Yorlik> I knew it but didn't see it ...
bibek has joined #ste||ar
<hkaiser> Yorlik: btw, if myset.empty(), then you shouldn't dereference the iterator returned from begin() either
<Yorlik> Sure
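A minimal sketch of the corrected version, assuming the set is non-empty (the function name is made up):

    #include <cassert>
    #include <cstdint>
    #include <set>

    std::uint64_t spread(std::set<std::uint64_t> const& s)
    {
        assert(!s.empty());               // begin()/rbegin() are only dereferenceable then
        return *s.rbegin() - *s.begin();  // largest minus smallest - never *s.end()
    }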
<Yorlik> I am just checking out ordered sets as a container for free slots in a pool
<Yorlik> Because I want to reuse the bottom most elements
<Yorlik> To keep my loop tight
<Yorlik> Playing with ways to create my custom allocator
<Yorlik> set looks like a nice way to keep the free list
<Yorlik> Even with 100k entries access and writes are below 1us
<hkaiser> Yorlik: a heap might be more appropriate, I'd suggest to do measurements
<Yorlik> The main problem is, I want to keep my (typed) elements in a straight line
<hkaiser> sure
<Yorlik> So - when looping I might have to occasionally skip
<hkaiser> Yorlik: I was referring to make_heap and friends (https://en.cppreference.com/w/cpp/algorithm/make_heap)
<Yorlik> and on a long running process I might move elements from time to time
<Yorlik> I really want to be able to loop over an array-like arrangement of my elements
<Yorlik> But I might just ditch that for a while and do it as a later optimization
<Yorlik> That idea has more issues and is much more problematic than I thought
<Yorlik> But the ideal would be to have a self sorting pool
<Yorlik> But then I'm running into issues with rearranging elements
<hkaiser> Yorlik: it's called a 'heap' ;-)
<Yorlik> how would you keep elements packed in a row?
<hkaiser> the idea to have a separate datastructure holding the (sorted) list of indices of free elements is a good one
<hkaiser> use the heap instead of your set to maintain that list
<Yorlik> my issue is to have the update loop cache friendly
<Yorlik> IC
<hkaiser> a set causes you to do pointer chasing to find the next entry, a heap is built in consecutive memory
<Yorlik> That makes sense
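A rough sketch of the heap-based free list hkaiser is suggesting - std::greater<> turns the usual max-heap into a min-heap, so the lowest free index always comes out first (container choice and names are illustrative):

    #include <algorithm>
    #include <cstdint>
    #include <functional>
    #include <vector>

    std::vector<std::uint64_t> free_slots;   // contiguous storage, no pointer chasing

    void release(std::uint64_t slot)
    {
        free_slots.push_back(slot);
        std::push_heap(free_slots.begin(), free_slots.end(), std::greater<>{});
    }

    std::uint64_t acquire()                  // precondition: !free_slots.empty()
    {
        std::pop_heap(free_slots.begin(), free_slots.end(), std::greater<>{});
        std::uint64_t slot = free_slots.back();
        free_slots.pop_back();
        return slot;                         // always the lowest free index
    }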
<Yorlik> Oh - I would not use the set to loop
<Yorlik> My idea is darker
<Yorlik> keep an index to the highest element in the pool
<Yorlik> and iterate over the pool from 0 to maxelement
<Yorlik> the empty slots would get skipped
<Yorlik> There is an issue with having lots of deletions and ending up with a lot of element skipping
<Yorlik> The set would be just used for the allocations and free operations
<Yorlik> it would always give back the lowest element
<Yorlik> since it holds the address of the lowest free slot
<hkaiser> Yorlik: sure
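A sketch of the loop Yorlik is describing - a contiguous slot array plus an occupancy flag, iterating from 0 up to the highest used index and skipping the holes (all names hypothetical, single-threaded for clarity):

    #include <cstddef>
    #include <vector>

    struct entity { /* mostly POD state */ };

    struct pool
    {
        std::vector<entity> slots;        // element storage, kept contiguous
        std::vector<char>   used;         // same size; marks occupied slots
        std::size_t         max_used = 0; // one past the highest occupied index

        template <typename F>
        void for_each(F&& f)
        {
            for (std::size_t i = 0; i != max_used; ++i)
                if (used[i])              // empty slots simply get skipped
                    f(slots[i]);
        }
    };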
<Yorlik> the problem could come up if I have an allocation burst and then a lot of deletions, with elements hanging at the top of the array forcing the loop to run all the way up there
<Yorlik> But the real world behavior would have to be measured ofc
<Yorlik> But the object count fluctuates
<Yorlik> sometimes a lot
<Yorlik> Reordering could be a later optimization
<Yorlik> HPX could actually help me here by using migration
<Yorlik> it would be horribly slow and only be used in desperate situations
<Yorlik> I'm just thinking ahead here to avoid certain nasty surprises I can anticipate now.
<heller_> i still think iterating over anything id_type in your tight loops will hurt you badly
<Yorlik> That's not the plan
<Yorlik> The plan is to store everything in an array
<Yorlik> That's why I'm so interested in a custom allocator for HPX Components
<Yorlik> I want to be able to use placement new semantics for them
<Yorlik> All that fuss is about packing and direct access by having typed, array-like collections of objects
<Yorlik> For me the learning task now is to make a thread-safe allocator which gives me that. It's a specialized application and I didn't find anything like that on the net.
<Yorlik> If the objects are components I no longer need to do the bookkeeping for their IDs, since I can use get_ptr to find them
<Yorlik> but for looping I'd just zip over the storage directly
<Yorlik> My entities are mostly pods and will rarely use pointers to other objects
<Yorlik> containers are a special problem for example
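A bare-bones illustration of the placement-new idea: raw, suitably aligned storage reserved up front, with objects constructed and destroyed in place. This is a single-threaded sketch with made-up names, not the thread-safe allocator itself:

    #include <cstddef>
    #include <new>

    template <typename T, std::size_t N>
    struct fixed_storage
    {
        alignas(T) unsigned char raw[N * sizeof(T)];

        template <typename... Args>
        T* construct_at(std::size_t slot, Args&&... args)
        {
            // placement new: build the object inside pre-reserved storage
            return ::new (raw + slot * sizeof(T)) T(static_cast<Args&&>(args)...);
        }

        void destroy_at(std::size_t slot)
        {
            std::launder(reinterpret_cast<T*>(raw + slot * sizeof(T)))->~T();
        }
    };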
<heller_> ok, as said, good luck with non-intrusive migration then
<heller_> still have to review that PR
<Yorlik> That would be great - but it has time - I'm still on a huge learning task
<heller_> ok
<heller_> sorry about the delay
<Yorlik> Memory management how-tos, concurrency, ... hardcore stuff for me.
<heller_> ;)
<Yorlik> It's fun actually - especially the atomic specials, acquire-release semantics.
<Yorlik> Saw 2 great talks by Herb Sutter explaining it very well.
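The acquire/release pairing from those talks in its smallest form - a release store publishes earlier plain writes, and the matching acquire load makes them visible (a sketch, spinning only for illustration):

    #include <atomic>

    int payload = 0;
    std::atomic<bool> ready{false};

    void producer()
    {
        payload = 42;                                   // plain write
        ready.store(true, std::memory_order_release);   // publishes the write above
    }

    void consumer()
    {
        while (!ready.load(std::memory_order_acquire))  // pairs with the release store
            ;                                           // spin - illustration only
        int value = payload;                            // guaranteed to observe 42
        (void) value;
    }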
<heller_> look for talks from tony van eerd on that topic
<Yorlik> I'll do
<Yorlik> Thanks!
K-ballo has quit [Ping timeout: 240 seconds]
K-ballo1 has joined #ste||ar
<heller_> the problem is: you think you got it now, then try to code such a data structure and fail miserably
K-ballo1 is now known as K-ballo
<Yorlik> I think it comes down to that proverb: "Sharing is the source of all contention"
<Yorlik> These lock free techniques are optimizations - but you can't get rid of the fundamental problem if you don't design for it
<heller_> oh, performance is only secondary
<Yorlik> Reducing shared stuff as much as possible is the first thing to do, I believe.
<heller_> actually making them work correctly is the hardest part
<Yorlik> memory explosions? :)
<Yorlik> UB all over the place?
<heller_> yes, UB in the form of data races
<Yorlik> I guess I'm going to see my share of pink elephants ...
<heller_> you will
<heller_> best is to try to avoid writing your own concurrent data structures for now
<heller_> you have to unlearn what you learned in kindergarten: caring is not sharing
<Yorlik> I'll probably just cobble existing stuff together, indeed.
<Yorlik> I'll need a concurrent set, concurrent vector and concurrent map - however, it's not enough to just use that stuff - integrating thread-safe pieces into my use case can still give me races all over the place.
<Yorlik> I really have to do a ton of learning anyways
<heller_> just something upfront: lockfree/waitfree is not about performance!
<heller_> you can often get away with something using plain old mutexes
<Yorlik> I mean correctness comes first, right?
<Yorlik> And then ... ?
<heller_> well, correctness is a precondition
<Yorlik> What is it about then from your view?
<heller_> execution constraints. They mostly come out of real time systems, where you need to have an upper bound
<Yorlik> Raw throughput is not all, yes.
<heller_> depending on the actual use case, the performance can be way better than the traditional mutex approach, of course
<heller_> so, a wait free algorithm is essential in safety critical real time systems. Most get away with lock free
<Yorlik> My plan is to get together a first implementation which has internal interfaces in a way, that I can swap out elements easily later if I need optimizations
<heller_> sure
<Yorlik> We won't be able to have wait free
<Yorlik> Too much contention sooner or later
<Yorlik> contention in an MMO is really changing a lot
<heller_> fun fact: if you read papers on lock free and wait free algorithms, they are mostly using plain math to prove the properties
<heller_> you have a soft real time system
<Yorlik> I'm not sure you can make statements about wait free without contention metrics
<heller_> but I'd guess that you read concurrently way more often than you write
<Yorlik> yep
<heller_> so the first step here is a reader/writer (shared) mutex
<heller_> with which you can get very far, I think
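The reader/writer mutex heller_ mentions maps onto std::shared_mutex: many concurrent readers, one exclusive writer at a time (a minimal sketch with made-up names):

    #include <cstdint>
    #include <map>
    #include <shared_mutex>

    std::shared_mutex guard;
    std::map<std::uint64_t, int> state;

    int read_value(std::uint64_t key)            // readers share the lock
    {
        std::shared_lock lock(guard);
        auto it = state.find(key);
        return it != state.end() ? it->second : 0;
    }

    void write_value(std::uint64_t key, int v)   // writers take it exclusively
    {
        std::unique_lock lock(guard);
        state[key] = v;
    }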
<Yorlik> Would that work with non-concurrent queues?