hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
<hkaiser> Yorlik: well, now use VTune or similar to look at the hotspots
<Yorlik> The 64ary also uses WAY more memory
<Yorlik> Even reducing the coords to 32 bit
<Yorlik> I can't handle 1e9 items anymore, not even 1E8
<Yorlik> The quadtree could
<hkaiser> well, how many tree nodes do you have?
<Yorlik> did the same runs as with the quadtree, up to 1e9 items
<Yorlik> err
<Yorlik> 1e7
<hkaiser> Yorlik: nodes or objects in the tree?
<Yorlik> objects
<hkaiser> how many nodes do you have
<hkaiser> how deep is the tree?
<Yorlik> I need to test more - might be I have been abusing windows for too long and need a restart
<Yorlik> I allowed up to 63 (max) and 1 item per leaf
<Yorlik> Though that makes no sense with 32 bit coords - maybe that's a bug
<hkaiser> why 1 item per leaf?
<Yorlik> For stress testing it
<Yorlik> It relaxes a lot when allowing 32 for example
<hkaiser> well, that's not a real test, then
<hkaiser> you waste 64 pointers on multiple node levels to store one item
<Yorlik> Yes
<Yorlik> I'll check more variations.
<hkaiser> then stop complaining ;-)
<Yorlik> 31 depth btw
<hkaiser> no need
<hkaiser> you have only 16 bit coords
<Yorlik> True
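hkaiser's point about wasting 64 pointers per node can be made concrete with a back-of-envelope calculation. A rough sketch, assuming 8-byte pointers and (as a lower bound) one internal node per stored item when there is only 1 item per leaf — these assumptions are illustrative, not measurements from Yorlik's runs:

```python
# Rough pointer overhead of a 64-ary tree at 1 item per leaf.
PTR_BYTES = 8        # assumption: 64-bit pointers
CHILDREN = 64        # fan-out of the 64-ary tree

node_bytes = CHILDREN * PTR_BYTES          # 512 bytes of child pointers per node
items = 10**7                              # the 1e7-object run mentioned in the log
overhead_gb = items * node_bytes / 2**30   # pointer overhead alone, in GiB

print(round(overhead_gb, 2))  # → 4.77
```

This alone is in the multi-GB range, which is consistent with the 64-ary tree using far more memory than the quadtree.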
<Yorlik> So - with 32 items bucket size and a max depth of 15 (might override the max_bucket size):
<Yorlik> Loops: 66     Diff:  319 ns/loop (    1000 items)
<Yorlik> Loops: 1060   Diff:  866 ns/loop (   10000 items)
<Yorlik> Loops: 8960   Diff: 1924 ns/loop (  100000 items)
<Yorlik> Loops: 86760  Diff: 2830 ns/loop ( 1000000 items)
<Yorlik> Loops: 947092 Diff: 2938 ns/loop (10000000 items)
<Yorlik> Max mem usage was ~3GB
<Yorlik> Not too bad.
<Yorlik> What's this VTune thing? I usually use the VS Profiler
<hkaiser> nowadays it's called Intel Amplifier
<hkaiser> you should be able to download it and ask for a license, integrates well with VS
<hkaiser> I never used VS Profiler, might be good enough
<Yorlik> It says 95% IO - one copy assignment eats a lot. Time to move it.
<hkaiser> Yorlik: memory operations are almost always the culprit
<Yorlik> It's tricky to correct this
<Yorlik> Since I am operating with variants over shared_ptr
<Yorlik> Still working on the sausage :)
<Yorlik> BTW: looks like a legit application of a referencing raw pointer to me.
<Yorlik> its current/next operations
<hkaiser> well, if you move the variant, it will move the shared_ptr
<Yorlik> It was still expensive
<Yorlik> I did this
<hkaiser> k
<Yorlik> But maybe I read something wrong
<Yorlik> After all moving a shared_ptr should be very cheap
<Yorlik> Actually I wasn't moving the shared_ptr but the variant
<Yorlik> I'll figure it out
<Yorlik> After all I have a parent node type I can use for referencing
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
Yorlik has quit [Read error: Connection reset by peer]
Yorlik has joined #ste||ar
weilewei has quit [Remote host closed the connection]
Hashmi has joined #ste||ar
gonidelis has joined #ste||ar
gonidelis has quit [Remote host closed the connection]
gonidelis has joined #ste||ar
nikunj97 has joined #ste||ar
<nikunj97> heller1, cpu does not support frequency scaling - Prof.
gonidelis has quit [Remote host closed the connection]
Hashmi has quit [Quit: Connection closed for inactivity]
weilewei has joined #ste||ar
<weilewei> simbergm John manages to get DCA dashboard back online, just FYI. https://cdash.cscs.ch/index.php?project=DCA. Thanks!
hkaiser has joined #ste||ar
<diehlpk_work> April 13 – May 4: Open source organizations apply to take part in Season of Docs
<diehlpk_work> Should we apply again this year?
<diehlpk_work> If so who will help me in preparing the application
akheir has joined #ste||ar
<diehlpk_work> akheir, Could it be that slurm does not send email yet?
<akheir> diehlpk_work: I will look into it today
<diehlpk_work> Ok, cool, will run some jobs this week
akheir has quit [Read error: Connection reset by peer]
akheir1 has joined #ste||ar
<weilewei> diehlpk_work do you run into any problems with running jobs on Summit?
<diehlpk_work> weilewei, Had no time to run them yet
<weilewei> diehlpk_work ok, just checking
<diehlpk_work> Anyways we will not run much on Summit right now. Only two jobs to test the code
<diehlpk_work> I think we will run more jobs at the end of this year
<weilewei> diehlpk_work I am not sure if Summit will still be available in 2021, because Frontier is to be delivered in 2021; not sure if I remember correctly
<diehlpk_work> weilewei, Ok, anyways we have to port more computation to CUDA before we can run on Summit
<diehlpk_work> Currently, the most expensive part runs on the CPU
<weilewei> diehlpk_work nice. just saying
<diehlpk_work> Sure
<akheir1> diehlpk_work: configured it. use --mail-type=ALL --mail-user=<your email> to submit jobs
<diehlpk_work> akheir1, Cool, thanks
<diehlpk_work> I will try it
<akheir1> diehlpk_work: you can use --mail-type=END to only get an email when the job is done.
<diehlpk_work> I also like to get the failed ones to check and resubmit
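The flags akheir1 describes can go on the sbatch command line or as #SBATCH directives in the job script; combining END and FAIL covers diehlpk_work's wish to also hear about failed jobs. A minimal sketch — job name, email address, and application are placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --mail-type=END,FAIL       # mail on completion and on failure
#SBATCH --mail-user=user@example.com

srun ./my_application
```

Per zao's advice below, keep the mail types narrow rather than using ALL on large job arrays.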
<nikunj97> hkaiser, yt?
<zao> As a general cluster sysadmin, we recommend not going overkill on job status mails, it's a great way to fill your mailbox and to get the HPC site blacklisted as "spam".
<hkaiser> nikunj97: here
<nikunj97> hkaiser, is there a proposal for STL to support __sizeless_struct?
<nikunj97> STL containers ^^
<hkaiser> what should it do to the STL?
<nikunj97> so I'm working with SVE simd and its pack type works with __sizeless_struct, i.e. the size of the struct is determined at runtime. Using std::vector<nsimd::pack<float> > will lead to a compiler error due to this
<nikunj97> it will complain: error: arithmetic on a pointer to an incomplete type
<hkaiser> not sure you can do anything about that, actually
<nikunj97> the concept of __sizeless_struct is currently only supported by ARM compilers. GCC on the other hand wants you to provide the vector length at compile time, which makes the whole code non portable :/
<nikunj97> and if you tell the vector length at compile time, it kind of shadows the use of SVE itself which is meant to be vector length agnostic
<nikunj97> hkaiser, is there nothing I can do about it?
<hkaiser> nikunj97: I wouldn't know how to have a vector<> for a type for which the size is unknown at compile-time
<nikunj97> ughh, let me write an email to ARM guys then
<nikunj97> they may have some useful info
<nikunj97> lol, the arm hpc compiler isn't open source and doesn't have a community where I could ask
<hkaiser> nikunj97: do they support such a vector<>?
<nikunj97> not really, but since they've developed __sizeless_struct, they may have some idea of how to make wrappers around STL containers to support __sizeless_struct
<hkaiser> nikunj97: well, actually - it might be possible - if you can detect the size of T at runtime, the vector<T> could be created, it internally is a pointer to T[] anyways
rtohid has joined #ste||ar
<nikunj97> hkaiser, how do you propose I should handle it?
karame_ has joined #ste||ar
<hkaiser> nikunj97: write your own vector<T>?
<nikunj97> hkaiser, that part I understood ;)
<nikunj97> I meant how do I work with type that's deduced at runtime?
<hkaiser> nikunj97: you can take compute::vector as a starting point
<hkaiser> nikunj97: well, their sizeless type has to somehow expose its size at runtime
<hkaiser> it most likely has a size() member
<hkaiser> so on the vector allocation function, instead of using sizeof(T) you use T::size()
<nikunj97> aah makes sense!
<nikunj97> got what you're saying, this looks doable
<hkaiser> depending on whether scalar<T>::is_sized is true or false
Amy1 has quit [Quit: WeeChat 2.2]
Amy1 has joined #ste||ar
nan11 has joined #ste||ar
Amy1 has quit [Quit: WeeChat 2.2]
Amy1 has joined #ste||ar
<zao> Bleh. HPX doesn't spawn runtime threads on SMTs on my machine out of the box. Time to figure out how to configure the runtime.
<zao> I guess I'll just jam in hpx.os_threads on the command line somehow.
Hashmi has joined #ste||ar
nikunj97 has quit [Ping timeout: 260 seconds]
<zao> With great command line arguments comes great pessimisation... init time went from 86s to 134s when leveraging all threads :D
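For reference, the thread count zao mentions can be set either with the dedicated HPX option or through the ini setting; the application name is a placeholder:

```shell
# dedicated command line option:
./my_hpx_app --hpx:threads=all

# or via the ini setting zao refers to:
./my_hpx_app --hpx:ini=hpx.os_threads=all
```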
<diehlpk_work> hkaiser, see pm, please
<hkaiser> diehlpk_work: ok
<diehlpk_work> hkaiser, please reply to diehlpk_work since I use a different device
nan11 has quit [Remote host closed the connection]
bita has joined #ste||ar
nan11 has joined #ste||ar
Hashmi has quit [Quit: Connection closed for inactivity]
rtohid has quit [Remote host closed the connection]
weilewei has quit [Remote host closed the connection]
weilewei has joined #ste||ar
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
hkaiser has quit [Ping timeout: 265 seconds]
bita_ has joined #ste||ar
bita has quit [Ping timeout: 260 seconds]
nan11 has quit [Remote host closed the connection]
nan11 has joined #ste||ar
rtohid has joined #ste||ar
<weilewei> I would like to have a reduce operation across ranks in the following manner: say, rank 0 allocates a vector of size 10 and computes values for indices 0 to 4, and rank 1 allocates a vector of size 5 and computes values for indices 5 to 9; rank 1's vector should then be reduced into rank 0's vector at indices 5 to 9
<bita_> I have an hpx question: for some part of my test, when I am using the verbose flag, I see that my "base command" is <unknown>. Any ideas?
<weilewei> is there any mpi operation that is available? the rank numbers could be large, not just 2 ranks.
hkaiser has joined #ste||ar
<weilewei> maybe, shall I use MPI_Reduce, and then offset recvbuf pointer in non-root rank?
stmatengss has joined #ste||ar
weilewei has quit [Remote host closed the connection]
<bita_> hkaiser, I have an hpx question: for some part of my test, when I am using the verbose flag, I see that my "base command" is <unknown>. Any advice?
weilewei has joined #ste||ar
stmatengss has left #ste||ar [#ste||ar]
<hkaiser> bita_: not sure what you're referring to
<bita_> and using ctest -V, I see the base command is: 177: Base command is "/phylanx/build/bin/retile_6_loc_test --hpx:ini=hpx.parcel.tcp.enable=1 --hpx:threads=1 --hpx:localities=6"
<bita_> Adding test_retile_6loc_2d_0, the base command becomes <unknown>
<hkaiser> hmmm
<hkaiser> where do you see that? circleci?
<bita_> no I build that on docker and msvc
<bita_> in MSVC one of the cores has an <unknown> in it
<hkaiser> bita_: how can I reproduce this?
<bita_> Of course on docker it times out after 200 seconds, and I have seen hangs on windows 1 out of 5 times
weilewei has quit [Remote host closed the connection]
<bita_> I think so, the branch is add_retiling
<hkaiser> bita_: how?
weilewei has joined #ste||ar
<bita_> On Phylanx's add_retiling branch, I use cmake --build /phylanx/build --target tests.unit.plugins.dist_matrixops.retile_6_loc, and then ctest -V -R tests.unit.plugins.dist_matrixops.distributed.tcp.retile_6_loc
<bita_> how can I make an issue, when it needs the changes in a branch?
<bita_> It is now running here on circleCI, https://circleci.com/workflow-run/c0261b1d-5910-4e88-9e2f-6e1b146d4857, maybe we can wait to see if we see the same error there
<hkaiser> ok
<hkaiser> should have the same error
<bita_> nod
nikunj97 has joined #ste||ar
<bita_> hkaiser, here it is: https://circleci.com/gh/STEllAR-GROUP/phylanx/35802 An unknown is printed
<hkaiser> bita_: ok, thanks
<hkaiser> no idea what's causing this
<hkaiser> moreover as the 2 and 3 locality cases pass ok
<bita_> got it, thanks
<nikunj97> hkaiser, yt?
rtohid has left #ste||ar [#ste||ar]