hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
maxwellr96 has quit [Ping timeout: 264 seconds]
K-ballo has quit [Quit: K-ballo]
<nikunj97> hkaiser, see pm please
hkaiser has quit [Quit: bye]
nikunj97 has quit [Ping timeout: 240 seconds]
nikunj has joined #ste||ar
quaz0r has quit [Ping timeout: 246 seconds]
quaz0r has joined #ste||ar
jaafar has quit [Ping timeout: 268 seconds]
david_pfander has joined #ste||ar
K-ballo has joined #ste||ar
<heller_> simbergm: you know what's pretty cool as well?
<jbjnr_> diehlpk: yes. I can be a co-organiser again for gsoc
<jbjnr_> I think we should delete all the old projects that are no longer valid for a start.
<jbjnr_> happy new year everyone
<simbergm> heller_: that is nice
<simbergm> jbjnr_: happy new year to you too
<jbjnr_> heller_: simbergm zao K-ballo david_pfander nikunj ^^^
<heller_> jbjnr_: happy new year!
<nikunj> happy new year!!
<zao> jbjnr_: \o/
<jbjnr_> wow. people are actually here and responding. I did not look at IRC for almost 2 weeks
<heller_> jbjnr_: what did you expect :)?
<jbjnr_> just static noise really
quaz0r has quit [Ping timeout: 258 seconds]
quaz0r has joined #ste||ar
nikunj has quit [Quit: Leaving]
* K-ballo responds too
hkaiser has joined #ste||ar
quaz0r has quit [Ping timeout: 258 seconds]
<heller_> great ... the only blocking issue with the gitlab runners is a bug in hwloc :/
<heller_> * L3 (cpuset 0x00001000,0x02100002) intersects with NUMANode (P#0 cpuset 0x00001111,0x11111111 nodeset 0x00000001) without inclusion!
<hkaiser> heller_: I'd be happy to have our weekly conversation later today, 2pm still ok?
<hkaiser> (9pm your time)
<zao> heller_: We have that on our Bulldozer AMD machines.
<zao> Completely benign there.
<zao> (completely benign on our systems, that is)
<heller_> zao: that's on an AMD EPYC
<heller_> using ubuntu 16.something
<heller_> hkaiser: perfect!
<zao> heller_: I believe that it _may_ be kernel/fw related and upgrading to bionic may address it.
<zao> Have to double-check it.
<heller_> ok
<heller_> let me check if bionic is available...
<heller_> bionic is 18.04, right?
<zao> Yes.
<zao> heller_: I checked with our local guru. I believe for us it's both firmware and kernel dependent.
<heller_> ok
<zao> He mentioned that Brice Goglin of hwloc sat on the precise knowledge of this thing.
<heller_> well, i'll just ignore it for the time being
<heller_> this is the problem though
<zao> Is that us asserting?
<heller_> yes
<heller_> we get the cpu mask for the memory location and check if it has the bit set for the current cpu
<heller_> now, it could be a benign assert, since the task might have been migrated
<heller_> if I am not completely mistaken
aserio has joined #ste||ar
<heller_> hm, no... this should not happen with that example as cross numa stealing has been turned off...
<jbjnr_> heller_: for our cholesky stuff we have a numa allocator that is different from the one in hpx at the moment. I am in the process of overhauling it and writing a test for it. I could upgrade the transpose test to use ours (and use RP etc) and possibly fix a problem?
<heller_> possibly ;)
<jbjnr_> is it an actual hwloc problem you've got?
<heller_> not sure if it is hwloc related or coming from us
<jbjnr_> it's amd specific though?
<zao> I don't get those kinds of topology problems on my single-socket Ryzen at home.
<heller_> this is a single-socket AMD EPYC 7401P
<heller_> with 4 CCX
quaz0r has joined #ste||ar
<heller_> interestingly enough: hwloc2 segfaults
<jbjnr_> you have probably seen it already
<heller_> jbjnr_: no, thanks
<mdiers_> hi and happy new year everyone
<mdiers_> heller_: I'm adapting our software to AMD and also have a 7401P and a 2700X here for tests. HPX 1.2 release with hwloc2.
<heller_> mdiers_: happy new year
<heller_> mdiers_: interesting. https://gist.github.com/sithhell/6efea0f269652ce9f2cc7f8a6b557566 <-- that's what I get
hkaiser has quit [Quit: bye]
<mdiers_> heller_: one moment please some login problems, ...
<zao> heller_: Good news! hwloc-2.0.3 segfaults on my Bulldozer.
<heller_> zao: yay
<zao> #2 hwloc_bitmap_copy (dst=0x62c610, src=0x63a7b0) at bitmap.c:240
<zao> transpose_block_numa doesn't assert (hwloc 1.11.10 IIRC).
<mdiers_> heller_: added some text output, graphic from lstopo is also visible
<heller_> zao: there's something in the changelog, IIRC
<heller_> mdiers_: which kernel do you use?
<mdiers_> heller_: openSUSE Leap 15.0 with 4.19.8-2.gf931328-default
<heller_> there you go
<heller_> "There was a bug regarding L3 caches on 24-core Epyc processors which has been fixed in 4.14 and backported
<heller_> in 4.13.x (and maybe in distro kernels too)."
<zao> terminate called after throwing an instance of 'hpx::detail::exception_with_info<hpx::exception>'
<zao> what(): the upper limit given is larger than the number of existing resources: HPX(bad_parameter)
<zao> This is odd.
<zao> srun:ing that example with 4 HPX threads and 4 transpose threads, with a -c4 -n1 -N1 allocation.
<zao> -c12 works.
<zao> Oh well, time to resume work.
hkaiser has joined #ste||ar
<mdiers_> heller_: interesting. We had many problems during the installation of the distro(s), some with the board, others with the epyc and some more with the vega.
<heller_> ok
<heller_> this is a cloud server ... I haven't actually set up anything except docker-machine to launch stuff ;)
<diehlpk> Should we keep this, update it , or remove it from the wiki?
<diehlpk> * Please provide your email address, home mailing address, and phone number. This is a requirement and provides for accountability on both your side and ours.
<diehlpk> Why do we ask for this?
<diehlpk> Do we really need this information for GSoC?
<diehlpk> I would prefer to not ask for it.
<jbjnr_> I don't think the known issues does any harm, assuming it is still a known issue
<jbjnr_> regarding address details. Does google know who they are? if google are vetting students for applicability then I'm happy to not ask for it
<jbjnr_> diehlpk: ^
<jbjnr_> if google have cleared them as legit candidates, all's well, but I'd like to know name, institution of study etc.
<jbjnr_> it's a form of job application so I'm not sure why you are concerned.
david_pfander has quit [Ping timeout: 258 seconds]
<zao> Anti-discrimination of some sort?
<jbjnr_> Maybe
<diehlpk> jbjnr_, I am not sure why we need their home address
<diehlpk> I think e-mail and phone number are sufficient, no?
<jbjnr_> remove the request if you like. I've never checked on them.
<diehlpk> heller_, What is your opinion?
bibek has quit [Quit: Konversation terminated!]
bibek has joined #ste||ar
jaafar has joined #ste||ar
jaafar has quit [Ping timeout: 252 seconds]
aserio has quit [Ping timeout: 250 seconds]
nikunj has joined #ste||ar
aserio has joined #ste||ar
<heller_> diehlpk: I don't care much ;)
<heller_> diehlpk: I never used that information. And I wouldn't know what for in any case
devang_2401 has joined #ste||ar
devang_2401_ has joined #ste||ar
<devang_2401_> hi I'm Devang.
<heller_> Hi Devang
devang_2401 has quit [Ping timeout: 256 seconds]
<devang_2401_> i am a gsoc aspirant. i wanted to work on some beginner level project to work on.
parsa[w] has quit [Read error: Connection reset by peer]
<devang_2401_> i wanted to work on some beginner level projects*
<nikunj> devang_2401_, to begin with, you may want to look at the tickets raised on github. For gsoc project ideas, you may have a look at https://github.com/STEllAR-GROUP/hpx/wiki/GSoC-2019-Project-Ideas
<heller_> ok, what's your background? how fluently do you speak C++, do you prefer other languages?
<devang_2401_> c++ is my main language
<devang_2401_> i just finished my first semester in college.
parsa[w] has joined #ste||ar
<heller_> diehlpk: updated the ideas page. Looks good to me
<heller_> devang_2401_: ok, the first project would be to get a working HPX build
<heller_> https://github.com/STEllAR-GROUP/hpx/issues/3175 <-- this might be easy enough?
parsa[w] has quit [Read error: Connection reset by peer]
jaafar has joined #ste||ar
<devang_2401_> thanks, i'll install the library and try the problem.
<diehlpk> heller_, Thanks. Will you be a mentor this year?
<heller_> sure, why not
parsa[w] has joined #ste||ar
<heller_> devang_2401_: make it work for you first, play around with one of the examples
<heller_> devang_2401_: and see how it goes
devang_2401_ has quit [Ping timeout: 256 seconds]
devang_2401 has joined #ste||ar
devang_2401 has quit [Client Quit]
<heller_> hkaiser: ready whenever you are
<hkaiser> heller_: sec
aserio has quit [Ping timeout: 252 seconds]
aserio has joined #ste||ar
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
RostamLog has joined #ste||ar
aserio has quit [Ping timeout: 260 seconds]
aserio has joined #ste||ar
aserio has quit [Ping timeout: 268 seconds]
aserio has joined #ste||ar
<heller_> zao: I can confirm that bionic fixed the problem for me
<heller_> at least from the hwloc side of things
<heller_> what a nice view: https://i.imgur.com/nCA7a6x.png
RostamLog has joined #ste||ar
aserio1 has joined #ste||ar
aserio has quit [Ping timeout: 252 seconds]
aserio1 is now known as aserio
<heller_> ok, looks like it's a problem with hwloc and thread pinning inside docker containers
<heller_> who would've thought
<heller_> whoa: time (git clone --depth=1 https://github.com/STEllAR-GROUP/hpx.git && mkdir hpx/build && cd hpx/build && cmake .. -G "Ninja" -DCMAKE_BUILD_TYPE=Debug -DHPX_WITH_MALLOC=system && ninja transpose_block_numa)
<heller_> real 1m2.856s
aserio has quit [Quit: aserio]
aserio has joined #ste||ar
aserio has quit [Quit: aserio]
hkaiser has joined #ste||ar
parsa[w] has quit [Quit: Leaving]
<heller_> running docker in privileged mode solves that issue
<heller_> hkaiser: I fixed the conflict
<hkaiser> heller_: thanks
<heller_> hkaiser: FWIW, the reason the packet runner is so slow is that clang-tidy runs in parallel with the regular compilation step
jbjnr_ has quit [Ping timeout: 260 seconds]