hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
K-ballo has quit [Quit: K-ballo]
hkaiser has quit [Quit: bye]
mdiers_ has quit [Quit: mdiers_]
mdiers_ has joined #ste||ar
jaafar has quit [Ping timeout: 250 seconds]
<zao> heller: Seems like GitLab's local runners are getting support for running as specific users. I didn't even know there were runners you could run yourself :D https://gitlab.com/gitlab-org/gitlab-runner/merge_requests/1199#note_166815712
pmikolajczyk41 has joined #ste||ar
<pmikolajczyk41> hi! recently while playing with tests to the hpx::count (https://github.com/STEllAR-GROUP/hpx/blob/master/tests/regressions/parallel/count_3646.cpp) I am afraid I have encountered a bug
<pmikolajczyk41> Everything runs smoothly for small test cases (33 elements are hardcoded), but when moving to e.g. 65 the test fails
<pmikolajczyk41> It is not caused by the fix #3646 - I reverted changes to the old version and the same happened
<pmikolajczyk41> When looking deeper with a debugger it seems that the partitioner makes a mess - there are uninitialized chunks, objects with strange values
<zao> Exciting.
<pmikolajczyk41> I encountered a similar problem while repairing #3442
<pmikolajczyk41> Such uninitialized memory was the reason for raising #3442 but my fix was to protect source code from using incorrect chunks
<zao> Could you please file an issue with your findings?
<pmikolajczyk41> Of course
pmikolajczyk41 has quit [Quit: Page closed]
arokux has joined #ste||ar
heller has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
heller has joined #ste||ar
<heller> finally got the barrier failure reproduced...
<heller> but totally strange behavior...
<heller> ahh, got it
<heller> total misinterpretation of the slurm environment :/
<zao> Nice.
<arokux> are there chat logs somewhere?
<simbergm> arokux: http://irclog.cct.lsu.edu/
<simbergm> heller: wee, what's wrong?
<heller> simbergm: I misinterpreted the value of SLURM_STEP_TASKS_PER_NODE
<heller> simbergm: and that value didn't show up before
<simbergm> hmm, I'll wait for the PR ;) seems easily fixable?
<heller> yes
<heller> I think I have it already
<heller> should have been visible on daint as well, actually
<heller> But I think I know why it didn't show up there
<heller> simbergm: do you know if pycicle sets the HPXRUN_RUNWRAPPER=srun variable?
<simbergm> heller: yeah it does
<heller> hmmm
<heller> strange
<mdiers_> it looks like the changes I've already made: https://github.com/m-diers/hpx/tree/fixing_slurm_threads
<simbergm> at least it's meant to
<heller> mdiers_: yes
<heller> I think so
<heller> mdiers_: I'll take your commit and submit it ;)
<mdiers_> heller: Don't let me stop you ;-)
<heller> simbergm: the problem shows up if the number of tasks isn't evenly divisible by the number of tasks
<zao> number of tasks and number of tasks? :)
<simbergm> heller: i.e. rostam has 16 core nodes, daint 36, and the barrier test uses 3 localities?
<heller> the last number of tasks should be number of cores ;)
<heller> simbergm: how many localities do you salloc on daint for the test?
<heller> tests
<simbergm> just one
<heller> ah, that explains things ;)
<simbergm> just one node, that is
<heller> on rostam, we salloc 2
<simbergm> yep
<heller> and then the barrier test requests 3 tasks
<heller> so the three tasks need to be distributed on two nodes
<heller> which leads to the problem
<heller> mdiers_'s patch goes a little beyond that ...
<heller> so the actual problem was there since forever ...
<mdiers_> heller: Yeah, I was expecting that. Because of the fundamental changes, I hadn't created a PR yet.
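The uneven split heller describes (3 tasks on 2 nodes) is the kind of case where a per-node task list has to be read entry by entry. The following is a minimal sketch, not the code from HPX or from mdiers_'s branch. It assumes SLURM_STEP_TASKS_PER_NODE uses the comma-separated format the Slurm documentation gives for SLURM_TASKS_PER_NODE, where a run of equal counts is compressed as "count(xN)". The function name expand_tasks_per_node is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: expand a Slurm tasks-per-node string such as "2,1"
// or "2(x3),1" into one task count per node.
// Assumption: the value follows the "count" / "count(xN)" comma-separated
// format documented for SLURM_TASKS_PER_NODE.
std::vector<std::size_t> expand_tasks_per_node(std::string const& value)
{
    std::vector<std::size_t> counts;
    std::size_t pos = 0;
    while (pos < value.size())
    {
        // Cut out the next comma-separated entry.
        std::size_t end = value.find(',', pos);
        if (end == std::string::npos)
            end = value.size();
        std::string const entry = value.substr(pos, end - pos);

        // Leading number: tasks on each node covered by this entry.
        std::size_t used = 0;
        std::size_t const tasks = std::stoul(entry, &used);

        // Optional "(xN)" suffix: the count repeats for N consecutive nodes.
        std::size_t repeat = 1;
        if (used < entry.size())
        {
            if (entry.compare(used, 2, "(x") != 0 || entry.back() != ')')
                throw std::invalid_argument("unexpected entry: " + entry);
            repeat =
                std::stoul(entry.substr(used + 2, entry.size() - used - 3));
        }

        counts.insert(counts.end(), repeat, tasks);
        pos = end + 1;
    }
    return counts;
}
```

Under that assumption, a value of "2,1" expands to one node with 2 tasks and one node with 1 task, so the number of tasks on a node cannot be taken from a single number.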
pmikolajczyk41 has joined #ste||ar
pmikolajczyk41 has quit [Quit: Page closed]
jgolinowski has joined #ste||ar
K-ballo has joined #ste||ar
eschnett has quit [Quit: eschnett]
hkaiser has joined #ste||ar
bibek has quit [Quit: Konversation terminated!]
bibek has joined #ste||ar
<parsa> simbergm: ping
<simbergm> parsa: pong
<parsa> seems we have a viable potential gsod candidate
<parsa> they've got background, but they've asked for stuff to read
<parsa> not sure what to tell them since they're not programmers
<hkaiser> read our docs?
<hkaiser> watch videos, read papers (http://stellar-group.org/publications/)
<parsa> hkaiser: would it not sound like nonsense without c++ background?
<hkaiser> they will ask if things are not clear
eschnett has joined #ste||ar
<simbergm> we should try to get them on here (or slack or the mailing list)
<hkaiser> simbergm: see #3841, not sure if this needs fixing before the release...
<simbergm> I think they should be able to run a hello world
<simbergm> even if they don't understand everything and even if it means we have to help them quite a bit
<simbergm> otherwise all they can do is fix typos... (maybe)
<simbergm> hkaiser: yeah, most likely it will need fixing
<hkaiser> :/
<simbergm> you're not on 2019 I suppose?
<hkaiser> not yet, but I can move over
<hkaiser> probably should...
<simbergm> how new is it?
<hkaiser> 1 month
<simbergm> we can still wait as well
<hkaiser> ok
<simbergm> we can't be expected to find workarounds for msvc bugs right away ;)
<hkaiser> I'll see what I can do
<simbergm> ok, thanks, don't spend too much time on it if it looks bad
<hkaiser> might be as simple as moving the decltype() into a trailing return type
<simbergm> maybe ms will even fix it in a patch release soon?
<hkaiser> lol
<hkaiser> gtg
<simbergm> ok, apparently not...
hkaiser has quit [Quit: bye]
<simbergm> sure
<K-ballo> what's wrong with 2019 (now)?
<simbergm> once introduced, they keep these bugs for backwards compatibility?
<simbergm> K-ballo: #3841
<K-ballo> isn't that the one I fixed a while ago?
<K-ballo> I fixed that one right away, I did not see any other like it back then
<K-ballo> they are very aggressive with "fixes" and "regressions" nowadays, far from stable and backwards compatible, it's a bit of a mess
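For readers unfamiliar with the workaround hkaiser mentions, "moving the decltype() into a trailing return type", here is a generic illustration. It is not the code from #3841. The names call_leading and call_trailing are hypothetical, and whether this change avoids the MSVC problem in question is not confirmed by the discussion.

```cpp
#include <utility>

// decltype() in the leading return type position: the parameters are not in
// scope yet, so std::declval has to stand in for them.
template <typename F, typename T>
decltype(std::declval<F>()(std::declval<T>())) call_leading(F&& f, T&& t)
{
    return std::forward<F>(f)(std::forward<T>(t));
}

// The same function with the decltype() moved into a trailing return type,
// where the parameter names f and t are visible.
template <typename F, typename T>
auto call_trailing(F&& f, T&& t)
    -> decltype(std::forward<F>(f)(std::forward<T>(t)))
{
    return std::forward<F>(f)(std::forward<T>(t));
}
```

Both forms deduce the same return type. For a callable such as `[](int x) { return 2 * x; }`, either function called with 21 returns 42.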
aserio has joined #ste||ar
<simbergm> K-ballo: indeed, you're on top of things
<simbergm> looks like that person used 1.2.1
<aserio> simbergm: I am about to forward an email to you that you may have already received
<aserio> Let me know if it is spam
<aserio> or if you need me to allow it through the Contact mailing list
<simbergm> aserio: ah thanks, that's ok
<simbergm> doesn't need to go anywhere further
<aserio> simbergm: thanks!
jgolinowski has quit [Ping timeout: 264 seconds]
jgolinowski has joined #ste||ar
<jgolinowski> Hello
<jgolinowski> I am trying to build HPX using default allocator, i.e. tcmalloc
<jgolinowski> For that I am using the gperftools
<jgolinowski> When I simply follow the steps from the documentation I get the following error:
<jgolinowski> -- Could NOT find TCMalloc (missing: TCMALLOC_LIBRARY TCMALLOC_INCLUDE_DIR)
<jgolinowski> CMake Error at cmake/HPX_Message.cmake:43 (message):
<jgolinowski> ERROR: HPX_WITH_MALLOC was set to tcmalloc, but tcmalloc could not be
<jgolinowski> found. Valid options for HPX_WITH_MALLOC are: system, tcmalloc, jemalloc,
<jgolinowski> tbbmalloc, and custom
<jgolinowski> Call Stack (most recent call first):
<jgolinowski> cmake/HPX_SetupAllocator.cmake:29 (hpx_error)
<jgolinowski> CMakeLists.txt:1503 (include)
<jgolinowski> -- Configuring incomplete, errors occurred!
<jgolinowski>
<jgolinowski> I have already built the tcmalloc library and can provide the include path. However, I am not sure what I should provide as the TCMALLOC_LIBRARY cmake variable
<K-ballo> from a quick glance at the find module, looks like you are expected to just set `TCMALLOC_ROOT`
<K-ballo> as either cmake variable or environment variable
<jgolinowski> hmm
<jgolinowski> ok so the error message does not suggest that
<jgolinowski> it indicates that the 2 cmake variables are missing: TCMALLOC_LIBRARY and TCMALLOC_INCLUDE_DIR
<jgolinowski> it seemed to have worked when I used:
<K-ballo> those variables come from cmake's find_package, and are the ones that caused the find to fail
<jgolinowski> -DTCMALLOC_INCLUDE_DIR=/home/jakub/packages/gperftools/include -DTCMALLOC_LIBRARY=/home/jakub/packages/gperftools/lib/libtcmalloc.so
<K-ballo> you are not actually expected to set those manually
<K-ballo> you never manually set the variables that are defined as outputs from a find module
<K-ballo> try setting `TCMALLOC_ROOT` on a clean cache, that should work.. then file a ticket for the bad error message
<jgolinowski> so shouldn't the error message spell out the TCMALLOC_ROOT instead of the 2 more inner
<jgolinowski> ah ok
<K-ballo> the 2 more inner one come from cmake itself, there's no hiding those
<K-ballo> the other error message comes from us, and should mention TCMALLOC_ROOT
<jgolinowski> ok
<jgolinowski> also, do you think there should be some update to documentation about this?
<K-ballo> maybe? what do the docs say about tcmalloc? I'm not personally familiar with any of these aspects
<jgolinowski> so first of all the dissonance I find is with the fact that the gperf tool is only an optional dependency but at the same time tcmalloc is set as the default allocator
<jgolinowski> 2nd the cmake variable TCMALLOC_ROOT is *not listed* as one of the "CMake variables used to configure HPX"
<K-ballo> it's not a variable used to configure HPX
<K-ballo> `<module>_ROOT` as cmake or env var, while not mandated, is the traditional find module approach
aserio1 has joined #ste||ar
<K-ballo> my vague recollection wrt tcmalloc being optional but default is that the performance difference was considerable enough to have users explicitly opt-out.. someone else may correct me if I am misremembering
aserio has quit [Ping timeout: 264 seconds]
aserio1 is now known as aserio
<jgolinowski> Ok, it is clear to me now
<K-ballo> seems we lost that descriptive error message? you only got the default one
hkaiser has joined #ste||ar
aserio has quit [Ping timeout: 264 seconds]
aserio has joined #ste||ar
jaafar has joined #ste||ar
jaafar has quit [Ping timeout: 248 seconds]
<diehlpk_work> simbergm, RC1 compiled for all architectures on Fedora
jaafar_ has joined #ste||ar
<diehlpk_work> simbergm, Did either of the two LSU students contact you?
aserio has quit [Ping timeout: 248 seconds]
jaafar_ has quit [Ping timeout: 252 seconds]
<simbergm> diehlpk_work: great!
<simbergm> and yes, two students contacted me
<diehlpk_work> Cool
<simbergm> I didn't reply since you said you wrote back to them
<diehlpk_work> Ok, did they ask the same questions?
<simbergm> did you try to get them to come on here? ;)
<diehlpk_work> Yes, the English department advertised us to the graduate students
<diehlpk_work> simbergm, I forwarded you the advertisement
<diehlpk_work> I think you could write them again and ask if they have questions about the projects
<simbergm> all right, I'll do that
jgolinowski has quit [Ping timeout: 252 seconds]
jaafar_ has joined #ste||ar
jgolinowski has joined #ste||ar
arokux has quit [Ping timeout: 256 seconds]
<heller> simbergm: the launch_process test looks different
<heller> simbergm: could it be that there is some problem with the recent changes regarding turning off networking by default when starting up outside a batch environment (or with just one locality)?
<jgolinowski> K-ballo, I re-ran my initial command and I see the same error as the one posted a couple of hours ago
<K-ballo> ok, that's good, so we didn't drop it
<K-ballo> oh no, I misread
<jgolinowski> the error you linked from line 13 is not shown because HPX_WITH_MALLOC is set by default I believe
<K-ballo> both were posted before, which one do you mean jgolinowski?
<K-ballo> ok
<jgolinowski> I only see the error from line 18
<jgolinowski> I believe the text from lines 14-15 should be also displayed in my case, i.e. it should be copied to the else statement
aserio has joined #ste||ar
jaafar_ has quit [Ping timeout: 246 seconds]
aserio has quit [Ping timeout: 252 seconds]
jaafar_ has joined #ste||ar
jaafar_ has quit [Ping timeout: 252 seconds]
<hkaiser> heller: sounds like it
<hkaiser> I'll have a look
<hkaiser> heller: pls see #3843
<hkaiser> you were right
jaafar has joined #ste||ar
<heller> hkaiser: perfect, this should make it greener again
<heller> hkaiser: do you want to talk tomorrow?
hkaiser has quit [Ping timeout: 248 seconds]
aserio has joined #ste||ar
hkaiser has joined #ste||ar
<hkaiser> heller: I won't have time tomorrow, travelling all day
<hkaiser> sorry...
<heller> Sure, no problem
<heller> Have a safe trip!
<hkaiser> tks
hkaiser has quit [Quit: bye]
diehlpk has joined #ste||ar
<simbergm> hkaiser, heller, you guys are awesome
hkaiser has joined #ste||ar
eschnett has quit [Quit: eschnett]
diehlpk has quit [Ping timeout: 248 seconds]
<aserio> hkaiser: will you be reading the hpxmp paper tonight?
aserio has quit [Quit: aserio]
<hkaiser> aserio: yes, that's the plan
jgolinowski has quit [Ping timeout: 246 seconds]
jaafar has quit [Ping timeout: 245 seconds]
jaafar has joined #ste||ar
<hkaiser> diehlpk_work: yt?
nikunj has joined #ste||ar
<nikunj> hkaiser, yt?
<hkaiser> here
<nikunj> see pm pls
jgolinowski has joined #ste||ar
eschnett has joined #ste||ar
jaafar has quit [Ping timeout: 245 seconds]
daissgr has joined #ste||ar
jaafar has joined #ste||ar
nikunj has quit [Quit: Leaving]