hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
Yorlik_ has joined #ste||ar
Yorlik__ has quit [Ping timeout: 240 seconds]
HHN93 has joined #ste||ar
hkaiser has quit [Quit: Bye!]
HHN93 has quit [Quit: Client closed]
HumanGeek has quit [*.net *.split]
HumanGeek has joined #ste||ar
HumanGeek has quit [Max SendQ exceeded]
HumanGeek has joined #ste||ar
tufei__ has joined #ste||ar
tufei_ has quit [Remote host closed the connection]
gonidelis[m] has quit [Server closed connection]
gonidelis[m] has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Quit: Bye!]
hkaiser has joined #ste||ar
tufei_ has joined #ste||ar
tufei__ has quit [Ping timeout: 240 seconds]
tufei_ has quit [Remote host closed the connection]
tufei_ has joined #ste||ar
hkaiser has quit [Quit: Bye!]
hkaiser has joined #ste||ar
diehlpk_work has joined #ste||ar
<gonidelis[m]> what is sizeof(std::mt19937) supposed to mean/return?
<gnikunj[m]> size of the class std::mersenne_twister_engine<std::uint_fast32_t, 32, 624, 397, 31,... (full message at <https://libera.ems.host/_matrix/media/v3/download/libera.chat/afc6b6607ae9195a554263a2cffbe759e59a4b7f>)
<K-ballo> what an odd question
<K-ballo> why std::mt19937 specifically?
<gonidelis[m]> It’s just a case in point
<gonidelis[m]> I am trying to figure out the state size of the std random engines
<gonidelis[m]> I don't care about the state contents per se
<K-ballo> so the question is what does sizeof(T) mean?
<gonidelis[m]> Well, for both the llvm and gcc implementations, that is
<gonidelis[m]> <gnikunj[m]> "size of the class std::mersenne_..." <- > <@gnikunj:matrix.org> size of the class std::mersenne_twister_engine<std::uint_fast32_t, 32, 624, 397, 31,... (full message at <https://libera.ems.host/_matrix/media/v3/download/libera.chat/ead7561f399f8dedfbf1e632901df68e19e0d8e3>)
<K-ballo> are you asking what the size of the common mersenne twister is, rather than what sizeof means?
<gonidelis[m]> Final goal is to figure out whether C's rand_r is faster/better
<gonidelis[m]> K-ballo: The former
<K-ballo> uh
<gonidelis[m]> WHY
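As an aside to the state-size question above, the engine sizes can simply be inspected with sizeof; a minimal sketch (not part of the discussion, and the exact numbers depend on the standard library and platform):

    #include <iostream>
    #include <random>

    int main()
    {
        // the state size in 32-bit words is exposed by the engine itself (624 for mt19937)
        std::cout << "mt19937 state_size (words): " << std::mt19937::state_size << '\n';

        // sizeof reports the in-memory footprint of the engine object: typically a few
        // kilobytes for mt19937 vs. a handful of bytes for minstd_rand or the single
        // unsigned int that rand_r uses as its state
        std::cout << "sizeof(std::mt19937): " << sizeof(std::mt19937) << '\n';
        std::cout << "sizeof(std::minstd_rand): " << sizeof(std::minstd_rand) << '\n';
        std::cout << "sizeof(unsigned int): " << sizeof(unsigned int) << '\n';
    }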
<gnikunj[m]> That's a skewed benchmark. See: https://quick-bench.com/q/1or7LYQtbBIwnJWFPMN8dginzbA
<gonidelis[m]> it was the move, right?
<gonidelis[m]> it was copying the whole generator
<gonidelis[m]> still, rand_r's better :E
<gnikunj[m]> Yeah, std::move is essentially a cast to T&&. If the function parameter isn't of type T&&, a new object is created from the T&& (via the move constructor). That, and you're running srand(42) multiple times when you only need it once per execution.
<gnikunj[m]> rand_r seems to be non-standard when I google it
<gonidelis[m]> right, I didn't know whether the seeding would be visible if srand was called outside of rand()'s scope
<gonidelis[m]> yeah figures
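A minimal standalone sketch of the two fixes described above (seed once per execution, and pass the generator by reference instead of copying or move-constructing it per call); this is not the actual quick-bench code from the links, and rand_r is assumed to be available as a POSIX extension:

    #include <cstdlib>    // std::srand, std::rand
    #include <random>
    #include <stdlib.h>   // rand_r (POSIX, not ISO C)

    // take the engine by reference so nothing is copied or move-constructed per call
    static std::mt19937::result_type draw_mt(std::mt19937& gen) { return gen(); }

    // rand_r keeps its whole state in the caller-provided unsigned int
    static int draw_rand_r(unsigned int* state) { return rand_r(state); }

    int main()
    {
        std::srand(42);              // seed the C generator once per execution
        std::mt19937 gen(42);        // construct and seed the C++ engine once as well
        unsigned int state = 42;     // seed/state for rand_r

        unsigned long long sum = 0;
        for (int i = 0; i != 1000000; ++i)
            sum += draw_mt(gen) + draw_rand_r(&state) + std::rand();

        return sum == 0;             // use the results so the loop isn't optimized away entirely
    }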
<gnikunj[m]> I didn't know rand() was this slow!
<gnikunj[m]> I would stop using it lol
<gonidelis[m]> rand and rand_r are POSIX
<gonidelis[m]> anyways
<gnikunj[m]> yup, looked up the Linux manual and can see them all there (rand, rand_r, and srand)
<gnikunj[m]> The compiler does optimize things away by inlining. Yet rand is actually slower. Wow.
<gnikunj[m]> Further inlining does help std::mt19937 but doesn't change anything for the other two: https://quick-bench.com/q/_wR1VLIu_FoiiohnqXKjLAouHB8
<gnikunj[m]> hkaiser: I remember us changing from std::mt19937 to rand 4-5 years ago. Why was that?
<gonidelis[m]> what 0.0
<gnikunj[m]> Ok no, it was the other way round. Alas: https://github.com/STEllAR-GROUP/hpx/pull/3204
<hkaiser> no we changed from rand to mt19937
<hkaiser> rand is not thread-safe
<gnikunj[m]> rand_r is both thread safe and faster
<gnikunj[m]> surprising
<hkaiser> but not portable
<gnikunj[m]> Any clue why rand_r is performing so much better here?
<hkaiser> no idea
<hkaiser> different randomness implementation, perhaps
<gnikunj[m]> I see. Checking the assembly output, everything is inlined and as fast as it can (potentially) get w.r.t. the benchmark we wrote.
<gnikunj[m]> Looks like the underlying implementations should be similar (if not the same). The difference is the seed being set by srand vs. the seed being passed in as a pointer.
<gonidelis[m]> presumably it has a smaller state vector...
<gonidelis[m]> i mean compared to mt
<gnikunj[m]> state vector?
<gonidelis[m]> the vector that PRNGs use to keep their state and produce the next random number
<gnikunj[m]> hkaiser: I found this really cool memory pool (singleton pool) implementation in Boost. Is there any literature I can read up on how someone came up with that implementation?
<gonidelis[m]> but regarding rand and rand_r, as I told you, rand_r is thread-safe while rand isn't
<gnikunj[m]> gonidelis: I haven't read any of the implementations of random number generators so I can't really comment on this.
<gonidelis[m]> it's math land
<gnikunj[m]> gonidelis[m]: thread safety generally comes at an additional cost. It is surprising to see rand_r performing this much better. But again, I don't know anything about their implementation so I can't really comment either.
<hkaiser> gnikunj[m]: thread-safety can be achieved by moving the state into a thread-local variable, which is a very cheap thing to do (performance wise)
<hkaiser> that could even reduce the contention on the generator state significantly
<gnikunj[m]> Aah
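A minimal sketch of the thread-local idea hkaiser describes; the helper name is hypothetical and this is not HPX code:

    #include <random>

    inline std::mt19937::result_type thread_safe_rand()
    {
        // one engine per thread: no shared state, so no locking and no contention
        thread_local std::mt19937 gen{std::random_device{}()};
        return gen();
    }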
<hkaiser> gnikunj[m]: the other thing is that the compiler might not generate rand as an intrinsic, but does it for rand_r - but that's just speculation
<gnikunj[m]> hkaiser: makes sense. If there's a direct intrinsic for rand_r then it will be faster. Also, do you know of any literature on the Boost memory pool implementation (as in, how they came up with it)?
<hkaiser> gnikunj[m]: it's an old idea
<gnikunj[m]> It's an interesting one nonetheless
<hkaiser> allocate a memory block and divide it into equally sized pieces, each large enough to hold a single object instance
<hkaiser> then link those blocks into a chain for fast access to the next free block
<gnikunj[m]> Yeah, but then there's the automatic resizing
<hkaiser> you resize by adding more blocks of memory
<gnikunj[m]> You're making it uninteresting by your description :P
<hkaiser> there isn't more to it, really
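A toy sketch of the pool scheme described above (fixed-size slots carved out of larger blocks, chained into a free list, grown block by block); this is not Boost.Pool's actual implementation and it omits thread safety:

    #include <cstddef>
    #include <new>
    #include <vector>

    class fixed_pool
    {
        union slot { slot* next; };               // a free slot stores a pointer to the next free slot

        std::size_t slot_size_;
        std::size_t slots_per_block_;
        slot* free_list_ = nullptr;
        std::vector<void*> blocks_;               // owned blocks, released in the destructor

        void add_block()
        {
            char* block = static_cast<char*>(::operator new(slot_size_ * slots_per_block_));
            blocks_.push_back(block);
            for (std::size_t i = 0; i != slots_per_block_; ++i)
            {
                slot* s = reinterpret_cast<slot*>(block + i * slot_size_);
                s->next = free_list_;             // push each new slot onto the free list
                free_list_ = s;
            }
        }

    public:
        explicit fixed_pool(std::size_t object_size, std::size_t slots_per_block = 64)
          : slots_per_block_(slots_per_block)
        {
            // each slot must be big enough for the object and the free-list pointer,
            // and a multiple of the pointer alignment so every slot stays aligned
            std::size_t n = object_size < sizeof(slot) ? sizeof(slot) : object_size;
            slot_size_ = (n + alignof(slot) - 1) / alignof(slot) * alignof(slot);
        }

        ~fixed_pool()
        {
            for (void* b : blocks_)
                ::operator delete(b);
        }

        void* allocate()
        {
            if (!free_list_)
                add_block();                      // grow by adding one more block
            slot* s = free_list_;
            free_list_ = s->next;
            return s;
        }

        void deallocate(void* p)
        {
            slot* s = static_cast<slot*>(p);
            s->next = free_list_;                 // return the slot to the front of the free list
            free_list_ = s;
        }
    };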
<hkaiser> nowadays, tcmalloc or similar do that internally
<gnikunj[m]> Right. tcmalloc code is too complex though ;_;
<hkaiser> they just don't use a memory block for each object size, but have a set of blocks for some predefined object sizes
<hkaiser> they use the one for a particular type that fits best
<gnikunj[m]> Right. And then they do size matching and allocate accordingly. I got the main gist but the code is hard to read.
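A small sketch of the size-class idea just described (not tcmalloc's actual code): keep pools for a few predefined sizes and route each request to the smallest class that fits:

    #include <array>
    #include <cstddef>

    constexpr std::array<std::size_t, 6> size_classes{8, 16, 32, 64, 128, 256};

    inline std::size_t pick_size_class(std::size_t n)
    {
        for (std::size_t c : size_classes)
            if (n <= c)
                return c;                // smallest predefined class that fits the request
        return n;                        // larger requests fall through to a general allocator
    }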
<hkaiser> yah, the boost pool implementations should be straightforward, it's ancient code
<gnikunj[m]> Right, it was easy to understand compared to tcmalloc/jemalloc
zao has quit [Server closed connection]
zao has joined #ste||ar
tufei_ has quit [Remote host closed the connection]
tufei_ has joined #ste||ar
hkaiser has quit [Quit: Bye!]