hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
<hkaiser> Yorlik: well, now use VTune or similar to look at the hotspots
<Yorlik> The 64ary also uses WAY more memory
<Yorlik> Even reducing the coords to 32 bit
<Yorlik> I can't handle 1e9 items anymore, not even 1E8
<Yorlik> The quadtree could
<hkaiser> well, how many tree nodes do you have?
<Yorlik> did the same runs as with the quadtree, up to 1e9 items
<Yorlik> err
<Yorlik> 1e7
<hkaiser> Yorlik: nodes or objects in the tree?
<Yorlik> objects
<hkaiser> how many nodes do you have
<hkaiser> how deep is the tree?
<Yorlik> I need to test more - might be I have been abusing windows for too long and need a restart
<Yorlik> I allowed up to 63 (max) and 1 item per leaf
<Yorlik> Though that makes no sense with 32 bit coords - maybe that's a bug
<hkaiser> why 1 item per leaf?
<Yorlik> For stress testing it
<Yorlik> It relaxes a lot when allowing 32 for example
<hkaiser> well, that's not a real test, then
<hkaiser> you waste 64 pointers on multiple node levels to store one item
<Yorlik> Yes
<Yorlik> I'll check more variations.
<hkaiser> then stop complaining ;-)
<Yorlik> 31 depth btw
<hkaiser> no need
<hkaiser> you have only 16 bit coords
<Yorlik> True
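hkaiser's point about wasting 64 pointers per node can be made concrete with a back-of-envelope calculation. A rough sketch, assuming 8-byte pointers and (as a lower bound) one internal node per stored item when there is only 1 item per leaf — these assumptions are illustrative, not measurements from Yorlik's runs:

```python
# Rough pointer overhead of a 64-ary tree at 1 item per leaf.
PTR_BYTES = 8        # assumption: 64-bit pointers
CHILDREN = 64        # fan-out of the 64-ary tree

node_bytes = CHILDREN * PTR_BYTES          # 512 bytes of child pointers per node
items = 10**7                              # the 1e7-object run mentioned in the log
overhead_gb = items * node_bytes / 2**30   # pointer overhead alone, in GiB

print(round(overhead_gb, 2))  # → 4.77
```

This alone is in the multi-GB range, which is consistent with the 64-ary tree using far more memory than the quadtree.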
<Yorlik> So - with 32 items bucket size and a max depth of 15 (might override the max_bucket size):
<Yorlik> Loops: 66     Diff:  319 ns/loop (    1000 items)
<Yorlik> Loops: 1060   Diff:  866 ns/loop (   10000 items)
<Yorlik> Loops: 8960   Diff: 1924 ns/loop (  100000 items)
<Yorlik> Loops: 86760  Diff: 2830 ns/loop ( 1000000 items)
<Yorlik> Loops: 947092 Diff: 2938 ns/loop (10000000 items)
<Yorlik> Max mem usage was ~3GB
<Yorlik> Not too bad.
<Yorlik> What's this VTune thing? I usually use the VS Profiler
<hkaiser> nowadays it's called Intel Amplifier
<hkaiser> you should be able to download it and ask for a license, integrates well with VS
<hkaiser> I never used VS Profiler, might be good enough
<Yorlik> It says 95% IO - one copy assignment eats a lot. Time to move it.
<hkaiser> Yorlik: memory operations are almost always the culprit
<Yorlik> It's tricky to correct this
<Yorlik> Since I am operating with variants over shared_ptr
<Yorlik> Still working on the sausage :)
<Yorlik> BTW: looks like a legit application of a referencing raw pointer to me.
<Yorlik> its current/next operations
<hkaiser> well, if you move the variant, it will move the shared_ptr
<Yorlik> It was still expensive
<Yorlik> I did this
<hkaiser> k
<Yorlik> But maybe I read something wrong
<Yorlik> After all moving a shared_ptr should be very cheap
<Yorlik> Actually I wasn't moving the shared_ptr but the variant
<Yorlik> I'll figure it out
<Yorlik> After all I have a parent node type I can use for referencing
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
hkaiser has quit [Quit: bye]
Yorlik has quit [Read error: Connection reset by peer]
Yorlik has joined #ste||ar
weilewei has quit [Remote host closed the connection]
Hashmi has joined #ste||ar
gonidelis has joined #ste||ar
gonidelis has quit [Remote host closed the connection]
gonidelis has joined #ste||ar
nikunj97 has joined #ste||ar
<nikunj97> heller1, cpu does not support frequency scaling - Prof.
gonidelis has quit [Remote host closed the connection]
Hashmi has quit [Quit: Connection closed for inactivity]
weilewei has joined #ste||ar
<weilewei> simbergm John manages to get DCA dashboard back online, just FYI. https://cdash.cscs.ch/index.php?project=DCA. Thanks!
hkaiser has joined #ste||ar
<diehlpk_work> April 13 – May 4: Open source organizations apply to take part in Season of Docs
<diehlpk_work> Should we apply again this year?
<diehlpk_work> If so who will help me in preparing the application
akheir has joined #ste||ar
<diehlpk_work> akheir, Could it be that slurm does not send email yet?
<akheir> diehlpk_work: I will look into it today
<diehlpk_work> Ok, cool, will run some jobs this week
akheir has quit [Read error: Connection reset by peer]
akheir1 has joined #ste||ar
<weilewei> diehlpk_work do you run into any problems with running jobs on Summit?
<diehlpk_work> weilewei, Had no time to run them yet
<weilewei> diehlpk_work ok, just checking
<diehlpk_work> Anyways we will not run much on Summit right now. Only two jobs to test the code
<diehlpk_work> I think we will run more jobs at the end of this year
<weilewei> diehlpk_work I am not sure if Summit will still be available in 2021, because Frontier is to be delivered in 2021; not sure if I remember correctly
<diehlpk_work> weilewei, Ok, anyways we have to port more computation to CUDA before we can run on Summit
<diehlpk_work> Currently, the most expensive part runs on the CPU
<weilewei> diehlpk_work nice. just saying
<diehlpk_work> Sure
<akheir1> diehlpk_work: configured it. use --mail-type=ALL --mail-user=<your email> to submit jobs
<diehlpk_work> akheir1, Cool, thanks
<diehlpk_work> I will try it
<akheir1> diehlpk_work: you can use --mail-type=END to only get an email when the job is done.
<diehlpk_work> I also like to get the failed ones to check and resubmit
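The flags akheir1 describes can go on the sbatch command line or as #SBATCH directives in the job script; combining END and FAIL covers diehlpk_work's wish to also hear about failed jobs. A minimal sketch — job name, email address, and application are placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --mail-type=END,FAIL       # mail on completion and on failure
#SBATCH --mail-user=user@example.com

srun ./my_application
```

Per zao's advice below, keep the mail types narrow rather than using ALL on large job arrays.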
<nikunj97> hkaiser, yt?
<zao> As a general cluster sysadmin, we recommend not going overkill on job status mails, it's a great way to fill your mailbox and to get the HPC site blacklisted as "spam".
<hkaiser> nikunj97: here
<nikunj97> hkaiser, is there a proposal for STL to support __sizeless_struct?
<nikunj97> STL containers ^^
<hkaiser> what should it do to the STL?
<nikunj97> so I'm working with SVE simd and its pack type works with __sizeless_struct, i.e. the size of the struct is determined at runtime. Using std::vector<nsimd::pack<float> > will lead to a compiler error due to this
<nikunj97> it will complain: error: arithmetic on a pointer to an incomplete type
<hkaiser> not sure you can do anything about that, actually
<nikunj97> the concept of __sizeless_struct is currently only supported by ARM compilers. GCC on the other hand wants you to provide the vector length at compile time, which makes the whole code non portable :/
<nikunj97> and if you tell the vector length at compile time, it kind of shadows the use of SVE itself which is meant to be vector length agnostic
<nikunj97> hkaiser, is there nothing I can do about it?
<hkaiser> nikunj97: I wouldn't know how to have a vector<> for a type for which the size is unknown at compile-time
<nikunj97> ughh, let me write an email to ARM guys then
<nikunj97> they may have some useful info
<nikunj97> lol, the arm hpc compiler isn't open source and doesn't have a community where I could ask
<hkaiser> nikunj97: do they support such a vector<>?
<nikunj97> not really, but since they've developed __sizeless_struct, they may have some idea of how to make wrappers around STL containers to support __sizeless_struct
<hkaiser> nikunj97: well, actually - it might be possible - if you can detect the size of T at runtime, the vector<T> could be created, it internally is a pointer to T[] anyways
rtohid has joined #ste||ar
<nikunj97> hkaiser, how do you propose I should handle it?
karame_ has joined #ste||ar
<hkaiser> nikunj97: write your own vector<T>?
<nikunj97> hkaiser, that part I understood ;)
<nikunj97> I meant how do I work with type that's deduced at runtime?
<hkaiser> nikunj97: you can take compute::vector as a starting point
<hkaiser> nikunj97: well, their sizeless type has to somehow expose its size at runtime
<hkaiser> it most likely has a size() member
<hkaiser> so on the vector allocation function, instead of using sizeof(T) you use T::size()
<nikunj97> aah makes sense!
<nikunj97> got what you're saying, this looks doable
<hkaiser> depending on whether scalar<T>::is_sized is true or false
Amy1 has quit [Quit: WeeChat 2.2]
Amy1 has joined #ste||ar
nan11 has joined #ste||ar
Amy1 has quit [Quit: WeeChat 2.2]
Amy1 has joined #ste||ar
<zao> Bleh. HPX doesn't spawn runtime threads on SMTs on my machine out of the box. Time to figure out how to configure the runtime.
<zao> I guess I'll just jam in hpx.os_threads on the command line somehow.
Hashmi has joined #ste||ar
nikunj97 has quit [Ping timeout: 260 seconds]
<zao> With great command line arguments comes great pessimisation... init time went from 86s to 134s when leveraging all threads :D
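For reference, the thread count zao mentions can be set either with the dedicated HPX option or through the ini setting; the application name is a placeholder:

```shell
# dedicated command line option:
./my_hpx_app --hpx:threads=all

# or via the ini setting zao refers to:
./my_hpx_app --hpx:ini=hpx.os_threads=all
```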
<diehlpk_work> hkaiser, see pm, please
<hkaiser> diehlpk_work: ok
<diehlpk_work> hkaiser, please reply to diehlpk_work since I use a different device
nan11 has quit [Remote host closed the connection]
bita has joined #ste||ar
nan11 has joined #ste||ar
Hashmi has quit [Quit: Connection closed for inactivity]
rtohid has quit [Remote host closed the connection]
weilewei has quit [Remote host closed the connection]
weilewei has joined #ste||ar
K-ballo has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
hkaiser has quit [Ping timeout: 265 seconds]
bita_ has joined #ste||ar
bita has quit [Ping timeout: 260 seconds]
nan11 has quit [Remote host closed the connection]
nan11 has joined #ste||ar
rtohid has joined #ste||ar
<weilewei> I would like to have a reduce operation across ranks in the following manner: say, rank 0 allocates a vector of size 10 and computes values for indices 0 to 4, and rank 1 allocates a vector of size 5 and computes values for indices 5 to 9; rank 1's vector should then be reduced into rank 0's vector at indices 5 to 9
<bita_> I have an hpx question: for some part of my test, when I am using the verbose flag, I see that my "base command" is <unknown>. Any ideas?
<weilewei> is there any mpi operation that is available? the rank numbers could be large, not just 2 ranks.
hkaiser has joined #ste||ar
<weilewei> maybe, shall I use MPI_Reduce, and then offset recvbuf pointer in non-root rank?
stmatengss has joined #ste||ar
weilewei has quit [Remote host closed the connection]
<bita_> hkaiser, I have an hpx question: for some part of my test, when I am using the verbose flag, I see that my "base command" is <unknown>. Any advice?
weilewei has joined #ste||ar
stmatengss has left #ste||ar [#ste||ar]
<hkaiser> bita_: not sure what you're referring to
<bita_> and using ctest -V, I see the base command is: 177: Base command is "/phylanx/build/bin/retile_6_loc_test --hpx:ini=hpx.parcel.tcp.enable=1 --hpx:threads=1 --hpx:localities=6"
<bita_> Adding test_retile_6loc_2d_0, the base command becomes <unknown>
<hkaiser> hmmm
<hkaiser> where do you see that? circleci?
<bita_> no I build that on docker and msvc
<bita_> in MSVC one of the cores has an <unknown> in it
<hkaiser> bita_: how can I reproduce this?
<bita_> Of course on docker it times out after 200 seconds, and I have seen hangs on windows 1 out of 5 times
weilewei has quit [Remote host closed the connection]
<bita_> I think so, the branch is add_retiling
<hkaiser> bita_: how?
weilewei has joined #ste||ar
<bita_> On Phylanx's add_retiling branch, I use cmake --build /phylanx/build --target tests.unit.plugins.dist_matrixops.retile_6_loc, and then ctest -V -R tests.unit.plugins.dist_matrixops.distributed.tcp.retile_6_loc
<bita_> how can I make an issue, when it needs the changes in a branch?
<bita_> It is now running here on circleCI, https://circleci.com/workflow-run/c0261b1d-5910-4e88-9e2f-6e1b146d4857, maybe we can wait to see if we see the same error there
<hkaiser> ok
<hkaiser> should have the same error
<bita_> nod
nikunj97 has joined #ste||ar
<bita_> hkaiser, here it is: https://circleci.com/gh/STEllAR-GROUP/phylanx/35802 An unknown is printed
<hkaiser> bita_: ok, thanks
<hkaiser> no idea what's causing this
<hkaiser> moreover as the 2 and 3 locality cases pass ok
<bita_> got it, thanks
<nikunj97> hkaiser, yt?
rtohid has left #ste||ar [#ste||ar]