hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar-group.org | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | This channel is logged: irclog.cct.lsu.edu
Yorlik_ has joined #ste||ar
Yorlik__ has quit [Ping timeout: 240 seconds]
HHN93 has joined #ste||ar
hkaiser has quit [Quit: Bye!]
HHN93 has quit [Quit: Client closed]
HumanGeek has quit [*.net *.split]
HumanGeek has joined #ste||ar
HumanGeek has quit [Max SendQ exceeded]
HumanGeek has joined #ste||ar
tufei__ has joined #ste||ar
tufei_ has quit [Remote host closed the connection]
gonidelis[m] has quit [Server closed connection]
gonidelis[m] has joined #ste||ar
hkaiser has joined #ste||ar
hkaiser has quit [Quit: Bye!]
hkaiser has joined #ste||ar
tufei_ has joined #ste||ar
tufei__ has quit [Ping timeout: 240 seconds]
tufei_ has quit [Remote host closed the connection]
tufei_ has joined #ste||ar
hkaiser has quit [Quit: Bye!]
hkaiser has joined #ste||ar
diehlpk_work has joined #ste||ar
<gonidelis[m]> what is sizeof(std::mt19937) supposed to mean/return?
<gnikunj[m]> size of the class std::mersenne_twister_engine<std::uint_fast32_t, 32, 624, 397, 31,... (full message at <https://libera.ems.host/_matrix/media/v3/download/libera.chat/afc6b6607ae9195a554263a2cffbe759e59a4b7f>)
<K-ballo> what an odd question
<K-ballo> why std::mt19937 specifically?
<gonidelis[m]> It’s just a case in point
<gonidelis[m]> I am trying to figure out the state size of the std random engines
<gonidelis[m]> I don't care about the state contents per se
<K-ballo> so the question is what does sizeof(T) mean?
<gonidelis[m]> Well, for both the llvm and gcc implementations, that is
<gonidelis[m]> <gnikunj[m]> "size of the class std::mersenne_..." <- > <@gnikunj:matrix.org> size of the class std::mersenne_twister_engine<std::uint_fast32_t, 32, 624, 397, 31,... (full message at <https://libera.ems.host/_matrix/media/v3/download/libera.chat/ead7561f399f8dedfbf1e632901df68e19e0d8e3>)
<K-ballo> are you asking what the size of the common mersenne twister is, rather than what sizeof means?
<gonidelis[m]> Final goal is to figure out whether C's rand_r is faster/better
<gonidelis[m]> K-ballo: The former
<K-ballo> uh
<gonidelis[m]> WHY
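As an aside to the state-size question above, the engine sizes can simply be inspected with sizeof; a minimal sketch (not part of the discussion, and the exact numbers depend on the standard library and platform):

    #include <iostream>
    #include <random>

    int main()
    {
        // the state size in 32-bit words is exposed by the engine itself (624 for mt19937)
        std::cout << "mt19937 state_size (words): " << std::mt19937::state_size << '\n';

        // sizeof reports the in-memory footprint of the engine object: typically a few
        // kilobytes for mt19937 vs. a handful of bytes for minstd_rand or the single
        // unsigned int that rand_r uses as its state
        std::cout << "sizeof(std::mt19937): " << sizeof(std::mt19937) << '\n';
        std::cout << "sizeof(std::minstd_rand): " << sizeof(std::minstd_rand) << '\n';
        std::cout << "sizeof(unsigned int): " << sizeof(unsigned int) << '\n';
    }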
<gnikunj[m]> That's a skewed benchmark. See: https://quick-bench.com/q/1or7LYQtbBIwnJWFPMN8dginzbA
<gonidelis[m]> it was the move, right?
<gonidelis[m]> it was copying the whole generator
<gonidelis[m]> still, rand_r's better :E
<gnikunj[m]> Yeah, std::move is essentially a cast to T&&. If the function parameter isn't of type T&&, a new object is created from the T&& (via the move constructor). That, and you're running srand(42) multiple times when you only need it once per execution.
<gnikunj[m]> rand_r seems to be non-standard when I google it
<gonidelis[m]> right, I didn't know whether the seeding would be visible if srand was called outside of rand()'s scope
<gonidelis[m]> yeah figures
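A minimal standalone sketch of the two fixes described above (seed once per execution, and pass the generator by reference instead of copying or move-constructing it per call); this is not the actual quick-bench code from the links, and rand_r is assumed to be available as a POSIX extension:

    #include <cstdlib>    // std::srand, std::rand
    #include <random>
    #include <stdlib.h>   // rand_r (POSIX, not ISO C)

    // take the engine by reference so nothing is copied or move-constructed per call
    static std::mt19937::result_type draw_mt(std::mt19937& gen) { return gen(); }

    // rand_r keeps its whole state in the caller-provided unsigned int
    static int draw_rand_r(unsigned int* state) { return rand_r(state); }

    int main()
    {
        std::srand(42);              // seed the C generator once per execution
        std::mt19937 gen(42);        // construct and seed the C++ engine once as well
        unsigned int state = 42;     // seed/state for rand_r

        unsigned long long sum = 0;
        for (int i = 0; i != 1000000; ++i)
            sum += draw_mt(gen) + draw_rand_r(&state) + std::rand();

        return sum == 0;             // use the results so the loop isn't optimized away entirely
    }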
<gnikunj[m]> I didn't know rand() was this slow!
<gnikunj[m]> I would stop using it lol
<gonidelis[m]> rand and rand_r are POSIX
<gonidelis[m]> anyways
<gnikunj[m]> yup, looked up the Linux manual and can see them all there (rand, rand_r, and srand)
<gnikunj[m]> The compiler does optimize things away by inlining. Yet rand is actually slower. Wow.
<gnikunj[m]> Further inlining does help std::mt19937 but doesn't change anything for the other two: https://quick-bench.com/q/_wR1VLIu_FoiiohnqXKjLAouHB8
<gnikunj[m]> hkaiser: I remember us changing from std::mt19937 to rand 4-5 years ago. Why was that?
<gonidelis[m]> what 0.0
<gnikunj[m]> Ok no, it was the other way round. Alas: https://github.com/STEllAR-GROUP/hpx/pull/3204
<hkaiser> no we changed from rand to mt19937
<hkaiser> rand is not thread-safe
<gnikunj[m]> rand_r is both thread safe and faster
<gnikunj[m]> surprising
<hkaiser> but not portable
<gnikunj[m]> Any clue why rand_r is performing so much better here?
<hkaiser> no idea
<hkaiser> different randomness implementation, perhaps
<gnikunj[m]> I see. Checking the assembly output, everything is inlined and as fast as it can (potentially) get w.r.t. the benchmark we wrote.
<gnikunj[m]> Looks like the underlying implementations should be similar (if not the same). The difference is the seed being set by srand vs. the seed being passed in as a pointer.
<gonidelis[m]> presumably it has a smaller state vector...
<gonidelis[m]> i mean compared to mt
<gnikunj[m]> state vector?
<gonidelis[m]> the vector that PRNGs use to keep their state and produce the next random number
<gnikunj[m]> hkaiser: I found this really cool memory pool (singleton pool) implementation in Boost. Is there any literature I can read up on how someone came up with that implementation?
<gonidelis[m]> but regarding rand and rand_r, as I told you, rand_r is thread-safe while rand isn't
<gnikunj[m]> gonidelis: I haven't read any of the implementations of random number generators so I can't really comment on this.
<gonidelis[m]> it's math land
<gnikunj[m]> gonidelis[m]: thread safety generally comes at an additional cost. It is surprising to see rand_r performing this much better. But again, I don't know anything about their implementation so I can't really comment either.
<hkaiser> gnikunj[m]: thread-safety can be achieved by moving the state into a thread-local variable, which is a very cheap thing to do (performance wise)
<hkaiser> that could even reduce the contention on the generator state significantly
<gnikunj[m]> Aah
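A minimal sketch of the thread-local idea hkaiser describes; the helper name is hypothetical and this is not HPX code:

    #include <random>

    inline std::mt19937::result_type thread_safe_rand()
    {
        // one engine per thread: no shared state, so no locking and no contention
        thread_local std::mt19937 gen{std::random_device{}()};
        return gen();
    }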
<hkaiser> gnikunj[m]: the other thing is that the compiler might not generate rand as an intrinsic, but does it for rand_r - but that's just speculation
<gnikunj[m]> hkaiser: makes sense. If there's a direct intrinsic for rand_r then it will be faster. Also, do you know of any literature on the Boost memory pool implementation (as in, how they came up with it)?
<hkaiser> gnikunj[m]: it's an old idea
<gnikunj[m]> It's an interesting one nonetheless
<hkaiser> allocate a memory block and divide it into equally sized pieces, each large enough to hold a single object instance
<hkaiser> then link those blocks into a chain for fast access to the next free block
<gnikunj[m]> Yeah, but then there's the automatic resizing
<hkaiser> you resize by adding more blocks of memory
<gnikunj[m]> You're making it uninteresting by your description :P
<hkaiser> there isn't more to it, really
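A toy sketch of the pool scheme described above (fixed-size slots carved out of larger blocks, chained into a free list, grown block by block); this is not Boost.Pool's actual implementation and it omits thread safety:

    #include <cstddef>
    #include <new>
    #include <vector>

    class fixed_pool
    {
        union slot { slot* next; };               // a free slot stores a pointer to the next free slot

        std::size_t slot_size_;
        std::size_t slots_per_block_;
        slot* free_list_ = nullptr;
        std::vector<void*> blocks_;               // owned blocks, released in the destructor

        void add_block()
        {
            char* block = static_cast<char*>(::operator new(slot_size_ * slots_per_block_));
            blocks_.push_back(block);
            for (std::size_t i = 0; i != slots_per_block_; ++i)
            {
                slot* s = reinterpret_cast<slot*>(block + i * slot_size_);
                s->next = free_list_;             // push each new slot onto the free list
                free_list_ = s;
            }
        }

    public:
        explicit fixed_pool(std::size_t object_size, std::size_t slots_per_block = 64)
          : slots_per_block_(slots_per_block)
        {
            // each slot must be big enough for the object and the free-list pointer,
            // and a multiple of the pointer alignment so every slot stays aligned
            std::size_t n = object_size < sizeof(slot) ? sizeof(slot) : object_size;
            slot_size_ = (n + alignof(slot) - 1) / alignof(slot) * alignof(slot);
        }

        ~fixed_pool()
        {
            for (void* b : blocks_)
                ::operator delete(b);
        }

        void* allocate()
        {
            if (!free_list_)
                add_block();                      // grow by adding one more block
            slot* s = free_list_;
            free_list_ = s->next;
            return s;
        }

        void deallocate(void* p)
        {
            slot* s = static_cast<slot*>(p);
            s->next = free_list_;                 // return the slot to the front of the free list
            free_list_ = s;
        }
    };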
<hkaiser> nowadays, tcmalloc or similar do that internally
<gnikunj[m]> Right. tcmalloc code is too complex though ;_;
<hkaiser> they just don't use a memory block for each object size, but have a set of blocks for some predefined object sizes
<hkaiser> they use the one for a particular type that fits best
<gnikunj[m]> Right. And then they do size matching and allocate accordingly. I got the main gist but the code is hard to read.
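A small sketch of the size-class idea just described (not tcmalloc's actual code): keep pools for a few predefined sizes and route each request to the smallest class that fits:

    #include <array>
    #include <cstddef>

    constexpr std::array<std::size_t, 6> size_classes{8, 16, 32, 64, 128, 256};

    inline std::size_t pick_size_class(std::size_t n)
    {
        for (std::size_t c : size_classes)
            if (n <= c)
                return c;                // smallest predefined class that fits the request
        return n;                        // larger requests fall through to a general allocator
    }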
<hkaiser> yah, the boost pool implementations should be straightforward, it's ancient code
<gnikunj[m]> Right, it was easy to understand compared to tcmalloc/jemalloc
zao has quit [Server closed connection]
zao has joined #ste||ar
tufei_ has quit [Remote host closed the connection]
tufei_ has joined #ste||ar
hkaiser has quit [Quit: Bye!]