hkaiser changed the topic of #ste||ar to: The topic is 'STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/'
<primef1>
hkaiser: last message I got from you is "primef: (#3829)"
<hkaiser>
primef1: well you can do both
<primef1>
Strangely, the IRC logs seem off, as they are missing a couple of messages
<hkaiser>
PRs are tested by the testing infrastructure, but I'd make sure locally that it does what it should before creating the PR
<hkaiser>
but I never run the full test suite locally, only the relevant parts
<hkaiser>
primef: also, there is already a PR that implemented #3646 for two algorithms, but that one fails the testing - so it might be a good start to figure out what's wrong with it
<primef1>
Ok, yes, I saw those, sorry.
<primef1>
On those I replied: hkaiser: alright, I'll look into #3829. Saw it yesterday.
<primef1>
About the testing, any hint on how to run the test suite? Just compile the single .cpp files and run them?
<hkaiser>
ok
<primef1>
Not sure they went out
<hkaiser>
primef1: well, make tests.* will do the trick, where tests.* is the target name of the test to run
<hkaiser>
I think make can autocomplete the target names
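A minimal sketch of that workflow, assuming a CMake build tree; the target name here is hypothetical (the real ones tab-complete under make, as noted above):

    cd build
    # build a single test target instead of the whole suite
    make tests.unit.parallel_algorithms
    # then run it via ctest, filtered by the same name
    ctest -R tests.unit.parallel_algorithms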
<primef1>
alright, then I'll experiment on that. Thank you!
<primef1>
Now I'll go offline though as it's sleep-time :-)
<primef1>
See you tomorrow, have a good day!
mdiers_1 has joined #ste||ar
mdiers_ has quit [Ping timeout: 260 seconds]
mdiers_1 is now known as mdiers_
primef1 has quit [Ping timeout: 268 seconds]
K-ballo1 has joined #ste||ar
K-ballo has quit [Quit: K-ballo]
K-ballo1 is now known as K-ballo
hkaiser has quit [Quit: bye]
rori has joined #ste||ar
rori has quit [Ping timeout: 260 seconds]
jbjnr has quit [Ping timeout: 265 seconds]
jbjnr has joined #ste||ar
<simbergm>
hkaiser, heller, jbjnr, you should have been invited to the survey
<simbergm>
let me know if you didn't get it or it's the wrong email
<heller2>
ms: yup, already looked at it. not bad
<heller2>
I wonder if I was the unfriendly German guy on IRC :/
<K-ballo>
oh no, who let niall participate?
<simbergm>
niall who?
<K-ballo>
douglas, an old antagonist of the "german hpc posse"
<heller2>
yeah ...
jbjnr1 has joined #ste||ar
<jbjnr1>
This is the real jbjnr trying a test from matrix
<jbjnr>
This is the old one using the windows machine
<jbjnr1>
Connecting to here from matrix was much harder than it should have been. Thanks heller
Yorlik has quit [Ping timeout: 268 seconds]
<heller2>
yup, the whole registration business is a pain
<jbjnr>
This is another test whilst the matrix thingy is shut down, to see if it still appears when I restart it
<jbjnr1>
that's great. The matrix thingy tracks the messages I missed :)
<simbergm>
for users that might be wondering what this is about, we're testing matrix as an alternative to irc
<K-ballo>
maybe all the users are in that matrix thing
<jbjnr>
they're not!
hkaiser has joined #ste||ar
<simbergm>
btw, hkaiser, heller, K-ballo feel free to fill in the survey as well if you haven't already
<heller2>
I did ;)
<simbergm>
good :)
<hkaiser>
I did too
<hkaiser>
(I think)
<hkaiser>
simbergm: thanks for the governance comments
<simbergm>
hkaiser: I'll pass the thanks on to joost ;)
<hkaiser>
also, how do I see the survey results?
primef1 has joined #ste||ar
<simbergm>
hkaiser: did you get some sort of email about it?
<hkaiser>
yes, but that just opens the survey itself for me
<simbergm>
assuming you have access to it, there should be a responses tab next to the questions tab
<K-ballo>
why is the doc rating question mandatory?
<hkaiser>
ahh, got it
<simbergm>
K-ballo: mistake, shouldn't be mandatory anymore
rori has joined #ste||ar
hkaiser has quit [Quit: bye]
<simbergm>
hkaiser, heller would it be possible to move the meeting tomorrow to one hour later? i.e. 17:00 CET/10:00 CT? (still need to confirm if it's needed)
<heller2>
should be fine
hkaiser has joined #ste||ar
<primef1>
simbergm: do you have any more suggestions about flags to activate to increase performance in distributed/numa-aware applications?
<jbjnr>
primef1: what is your problem/task?
<primef1>
Moreover, in one of your examples you use "hpx.numa_sensitive=2". What does the 2 stand for? And how is this different from setting --hpx:numa-sensitive on the CLI?
<jbjnr>
That setting is obsolete really.
<primef1>
I have to reproduce multiple algorithms and optimize those. The algorithms are: reduce, transpose, and lastly a 3d stencil.
<jbjnr>
it says that a scheduler can steal work from another numa domain, but only the adjacent core can do it
<primef1>
jbjnr:
<jbjnr>
in reality it is not supported any more
<simbergm>
primef1: jbjnr is our numa expert, if he has time I'll let him handle this :)
<primef1>
jbjnr: good to know, then I'll remove it from my code. I took it from the official examples.
<jbjnr>
I don't want to take on a whole distributed matrix transpose problem right now, but maybe something smaller
<primef1>
simbergm: thanks!
<jbjnr>
simbergm: wtf is all this spellcheck fail stuff on the dashboard/emails
<jbjnr>
if we have to give our vars real names from now on, then shoot me now
<simbergm>
jbjnr: it should ignore most nonsense variable names
<primef1>
jbjnr: Sure, I can understand that; actually the code we wrote is finished. We are not achieving the results we want, but time is running out and we don't have that much time to dedicate to more optimizations. We tried to implement numa distribution using the numa_allocator, but honestly it doesn't seem to perform that well.
<simbergm>
if it turns out to give too many false positives we'll remove it
<primef1>
jbjnr: so instead of using the "hpx.numa_sensitive=2" option, should I set the CLI argument for numa sensitivity?
<simbergm>
or I'll limit it to docs (but I'd like to have it check comments in code at least)
<jbjnr>
primef1: can you perhaps tell me what the code does?
<jbjnr>
or better still let me see it.
<primef1>
Sure! Give me a sec.
<jbjnr>
in general we have a numa allocator (2 actually, an old one and a new one); they allow you to allocate data on a particular numa domain, or striped in some way. However, the schedulers need to know where the data is in order to allocate tasks to cores on that numa node
<jbjnr>
the shared_priority_scheduler has some new support for this kind of thing
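A rough sketch of what allocating across numa domains can look like, modeled on the compute host allocator used in HPX's transpose examples (exact headers and names may differ between HPX versions; the size here is arbitrary):

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/compute.hpp>

    int main()
    {
        // one target per numa domain, derived from this process' binding
        auto numa_domains = hpx::compute::host::numa_domains();

        // block_allocator stripes an allocation across the given domains,
        // first-touching each block from a thread on the owning domain
        using allocator_type = hpx::compute::host::block_allocator<double>;
        allocator_type alloc(numa_domains);

        // the vector's pages now live in large per-domain blocks
        hpx::compute::vector<double, allocator_type> data(
            1024 * 1024, 0.0, alloc);

        return 0;
    }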
<primef1>
Ok, good to know that. The code I sent is an adapted version of the examples you use.
<jbjnr>
is this the transpose code?
<primef1>
Yes it is. I understood you don't want to look into it. But it is the only one where we actually use numa right now.
<jbjnr>
what do you mean by "time is running out" - what deadline must you meet?
<primef1>
It's a university project. Deadline is next week
<jbjnr>
monday next week, or friday next week?
<jbjnr>
and how bad is your performance?
<jbjnr>
...and does it need to work distributed - or just on one node with N numa domains
<primef1>
thursday next week. performance actually gets worse compared to a version without numa. The system is using Xeon Phi
<jbjnr>
how much trouble will you be in if you can't improve the performance before thursday? (meaning, how important is this project)
<primef1>
the more the better. But our primary goal was 1 node with N numa domains, and introducing node distribution is a step that looked too far for us.
<primef1>
I'll put it this way: I guess we won't get in trouble. It will just affect our grade. So nothing we need urgent support on. I mean, if you look over it once and discover stuff we shouldn't do, or ways we can improve it, that would be great. But it's nothing you have to invest much of your time in.
<primef1>
thank you so much for asking, though, and for helping us
<jbjnr>
this shows how to use the new numa allocator to allocate memory using different patterns
<jbjnr>
then the guided_pool_executor - or a schedule hint - can tell tasks where to run (which numa node)
<jbjnr>
(a schedule hint with a normal executor, I mean) - but this stuff is a bit advanced usage and I still have problems with it
<jbjnr>
so what you've got might be the best for the time being
<jbjnr>
How close to peak memory bandwidth do you get?
<jbjnr>
I'm interested in your transpose example, because it is exactly this kind of thing we need to make the numa API simple and easy enough to allow guys like you to use it and write decent code without being an expert.
<jbjnr>
so I'll try to look into it
<jbjnr>
(but I already have 2 other projects that have deadlines)
<primef1>
Alright, this all sounds like good advice to me. Thanks a lot! I'll look into the pages you sent and try to wrap my head around it.
<primef1>
To be honest, I don't know what peak memory bandwidth we get. But it's a good point to check. Sorry for that, we are also quite new to high performance computing stuff
<jbjnr>
the numa test I linked to allocates a big array and then binds different pages to different numa nodes - then dumps out the binding pattern
<jbjnr>
on a KNL with 4 numa nodes you should see a pattern like 0000111122223333 etc etc for pages of memory
<primef1>
But for sure, the proposal to make the numa concept more newbie-friendly sounds amazing. If I might be of any help on that, please let me know.
<jbjnr>
now if you know which page of memory is bound to which numa node, you can tell the scheduler to run tasks that use that memory on that node
<simbergm>
primef1: another thing to test is to forget about all the explicit numa management and launch one hpx locality per numa node
<jbjnr>
tip - find my C++ Italia talk on YouTube and skip to the part where the Cholesky and guided executor are mentioned
<simbergm>
you don't have as much control but it might get you pretty close to the same effect
<primef1>
ohh alright, understood the concept. Thing is, for simple arrays/vectors this doesn't sound difficult. What I wonder, however, is how this behaves in combination with custom structs and futures.
<jbjnr>
yes. what simbergm said is also a good (better) way
<jbjnr>
to keep it simple
<jbjnr>
then the operating system manages the memory for you and mpi handles the comms
<jbjnr>
gtg
<primef1>
jbjnr: alright, will look into the video.
<primef1>
simbergm: so you mean start an hpx_main on each numa node? How is that achieved?
<simbergm>
primef1: depends a bit on how you launch the application
<simbergm>
with mpirun I think you can set the number of processes/ranks per node explicitly
<simbergm>
if you use that together with --hpx:use-process-mask you should get correct thread bindings automatically
<primef1>
If that is what you mean, we launch it using srun (slurm)
<simbergm>
even better
<primef1>
Ok, but now a naive question for sure. How do I divide my transpose problem if the application is split even before that?
<simbergm>
just use the parameters that srun takes for multiple ranks per node (-n or -N, man srun will tell you)
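For instance, to get one locality per numa domain on nodes with 4 domains each, something like this should work (slurm flag spellings vary a bit between versions; the binary name is a placeholder):

    # 4 ranks per node, each bound to one numa locality domain
    srun -N 2 --ntasks-per-node=4 --cpu-bind=ldoms \
        ./transpose --hpx:use-process-mask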
<simbergm>
not sure I understand the question... for distributing to multiple nodes you'll also have to distribute your matrix somehow
<simbergm>
now how that is distributed is another question
<simbergm>
I think some sort of round robin distribution is typical, but I don't know what's best for this use case
<primef1>
ok, yes that is what my question was.
<simbergm>
whether you have one or multiple ranks per node is independent of how you distribute your matrix
<primef1>
so inside the application I say: if the application is currently on thread x, do this subblock of the matrix
<primef1>
and so on
<simbergm>
more or less
<simbergm>
you'd most likely do it per rank and make sure you oversubscribe (have "too much work") for each rank
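A minimal sketch of that per-rank split, assuming a round-robin distribution of blocks (transpose_block and the block count are placeholders; pick enough blocks to oversubscribe each rank):

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/runtime.hpp>

    #include <cstddef>
    #include <cstdint>

    int main()
    {
        std::uint32_t const rank = hpx::get_locality_id();
        std::uint32_t const nranks = hpx::get_num_localities(hpx::launch::sync);

        std::size_t const num_blocks = 64;    // "too much work" per rank

        // round-robin: rank r owns blocks r, r + nranks, r + 2 * nranks, ...
        for (std::size_t b = rank; b < num_blocks; b += nranks)
        {
            // transpose_block(b);    // per-block work goes here
        }

        return 0;
    }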
<primef1>
alright. Lots of input I can work on and lots of new concepts for me to understand. Thank you so much for your help, simbergm and jbjnr. Amazing to get such constructive help so quickly.
<primef1>
simbergm: one more question. Yesterday you were talking about using the MPI parcelport. What dependencies are required for it? Is OpenMPI correct?
<primef1>
Or is it MPICH?
<simbergm>
primef1: OpenMPI will do just fine in most cases I think
<simbergm>
if you run it on a cluster there'll usually be an mpi that's optimized for that system and in those cases you just use that
<simbergm>
mpich and openmpi should both work
<primef1>
the system has a couple of openmpi packages available through spack, but none of them was compiled with gcc 9, so I will have to build openmpi.
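The two steps primef1 describes could look roughly like this, assuming spack already knows about a gcc 9 compiler (the exact version is hypothetical):

    # build an openmpi against gcc 9 via spack
    spack install openmpi %gcc@9.1.0
    # then configure HPX with the MPI parcelport enabled
    cmake -DHPX_WITH_PARCELPORT_MPI=ON <hpx-source-dir>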
<K-ballo>
it finally happened, we are "C++14-only" now
<hkaiser>
yah, now we're not allowed to use anything before C++14 anymore - no raw loops!
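The "no raw loops" quip refers to preferring algorithms over hand-written loops; in HPX spelling the contrast looks something like this (a toy example, not code from any PR mentioned here):

    #include <hpx/hpx_main.hpp>
    #include <hpx/include/parallel_for_each.hpp>

    #include <cstddef>
    #include <vector>

    int main()
    {
        std::vector<int> v(100, 1);

        // the raw loop we are "not allowed" to write anymore
        for (std::size_t i = 0; i != v.size(); ++i)
            v[i] *= 2;

        // the algorithm form, parallelized by HPX
        hpx::parallel::for_each(hpx::parallel::execution::par,
            v.begin(), v.end(), [](int& x) { x *= 2; });

        return 0;
    }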
<simbergm>
Guys...
<simbergm>
hkaiser I like your thinking, I'll put together an inspect check for that
<hkaiser>
heh
<hkaiser>
simbergm: I have a couple of minor fixes for this for msvc, however
<simbergm>
hkaiser I feared as much... Hope it's nothing too bad
primef1 has quit [Ping timeout: 265 seconds]
<hkaiser>
nah, no worries - minor issues, like printing the used standard twice and such
<K-ballo>
I tried the msvc build last week and it was ok (core only)