aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
Matombo has quit [Remote host closed the connection]
<K-ballo>
there's no license in that file, does that make it public domain?
diehlpk has joined #ste||ar
<hkaiser_>
K-ballo: the whole repo has a license, no?
K-ballo has quit [Quit: K-ballo]
deep-book-gk has joined #ste||ar
deep-book-gk has left #ste||ar [#ste||ar]
hkaiser_ has quit [Quit: bye]
diehlpk has quit [Ping timeout: 240 seconds]
mars0000 has joined #ste||ar
mars0000 has quit [Quit: mars0000]
mars0000 has joined #ste||ar
mars0000 has quit [Ping timeout: 240 seconds]
vamatya has joined #ste||ar
vamatya_ has joined #ste||ar
vamatya has quit [Ping timeout: 246 seconds]
vamatya_ has quit [Ping timeout: 276 seconds]
bikineev has quit [Remote host closed the connection]
<jbjnr>
Is anyone online this morning?
<heller>
jbjnr: yes!
<jbjnr>
aha. I've just seen your reply on slack too. I was doing a test to see which would get a reply faster. You blew it by using slack before I tried IRC.
<jbjnr>
So I have to check slack too now :(
bikineev has joined #ste||ar
<heller>
;)
<heller>
just a coincidence
mcopik has joined #ste||ar
<jbjnr>
heller: I've just been asked to confirm that we are both available for the HPX course on 5-6 Oct before they send out the formal announcement. You're still in yes?
<jbjnr>
(You cannot say no)
<heller>
I am still in, yes
<jbjnr>
great. thanks
<jbjnr>
So announcement of course should go out today then
<jbjnr>
Just been talking to people about replacing octotiger
Matombo has joined #ste||ar
<jbjnr>
Announcement just went out for the hpx course
Matombo has quit [Remote host closed the connection]
Matombo has joined #ste||ar
Matombo has quit [Remote host closed the connection]
Matombo has joined #ste||ar
Matombo has quit [Remote host closed the connection]
Matombo has joined #ste||ar
<heller>
jbjnr: replacing octotiger?
<jbjnr>
heller: On the last GB call they told us that dominic was moving to another position and his time on octotiger would be limited. We can continue developing it, or we can start looking for a new flagship HPX HPC app to push.
<jbjnr>
Since my group is involved in a large project that has just been funded, potentially one of the apps in there might want to be an HPX flagship project.
<jbjnr>
they have written their own task based runtime.
<jbjnr>
They should use hpx instead.
<heller>
jbjnr: sounds good!
<heller>
"SWIFT: Using task-based parallelism, fully asynchronous communication, and graph partition-based domain decomposition for strong scaling on more than 100,000 cores"
<heller>
this sounds like our turf
<jbjnr>
They are also one of the projects that are funded to work on our machine as part of the next big set of projects
<jbjnr>
so ideal from all bureaucratic angles too
<heller>
except mine
<jbjnr>
?
<jbjnr>
well, we could invite them to join this FET project?
<heller>
I have no relation to it except it being yet another cool simulation :)
<heller>
ha
<heller>
yes
<jbjnr>
then all angles are covered
<jbjnr>
since we need a new application anyway
<heller>
good thinking
<jbjnr>
I need to speak to a chap here at CSCS, but he's on vacation currently, so it may be a few weeks before I do anything about this.
<heller>
the FET thingy needs to be acted upon rather quickly
<heller>
and also: top priority getting the thesis done
Matombo has quit [Ping timeout: 248 seconds]
Matombo has joined #ste||ar
<jbjnr>
How quickly does the FET thing need doing?
<heller>
submission date is 24th september
<jbjnr>
shit, that's soon.
<jbjnr>
ok. I will send some emails out ...
bikineev has quit [Remote host closed the connection]
K-ballo has joined #ste||ar
diehlpk has joined #ste||ar
taeguk[m] has quit [Ping timeout: 258 seconds]
thundergroudon[m has quit [Ping timeout: 240 seconds]
<zao>
FET?
<zao>
Ah, "Future and Emerging Technologies"
<zao>
Only ever knew what the NLA part of NLAFET meant :)
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
diehlpk has quit [Ping timeout: 246 seconds]
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
diehlpk_work has joined #ste||ar
diehlpk_work has quit [Ping timeout: 246 seconds]
diehlpk_work has joined #ste||ar
<hkaiser>
jbjnr: I'm planning to do all I can to get new money into octotiger
<hkaiser>
but having more than one HPX flagship application is good as well
<jbjnr>
hkaiser: new money would be great, but without dominic ....
<hkaiser>
with money comes dominic
<jbjnr>
so has he changed depts or something?
<hkaiser>
as said on the call we have 2 proposals in the pipeline
<hkaiser>
not even changed depts, just another group
<hkaiser>
jbjnr: and there is still that big project looming where a postdoc is a marginal expense
<jbjnr>
ok. when he said "new boss" I assumed it meant new dept or something. Why doesn't his new boss want him to work on octobaby
<hkaiser>
shrug, have not talked to him, but planning to
<jbjnr>
when will I be able to tell my bosses that hpx has new funding = big project?
<hkaiser>
well, we have got the promise ;)
<jbjnr>
$$$ > promise
<hkaiser>
indeed!
<zao>
Watch out, the future may throw!
<hkaiser>
lol
<hkaiser>
anyways
<hkaiser>
gtg
hkaiser has quit [Quit: bye]
mcopik has quit [Ping timeout: 255 seconds]
aserio has joined #ste||ar
<diehlpk_work>
aserio, yt?
<aserio>
Yes
<aserio>
diehlpk_work: ^^
<diehlpk_work>
We set up a Skype meeting today with Hartmut, right?
<aserio>
yes
<aserio>
but that will take place in an hour
<diehlpk_work>
At 10 my time?
<diehlpk_work>
Ok, my fault. I had 10 my time in my calendar
<jbjnr>
heller: yt?
<heller>
jbjnr: yes
<jbjnr>
quick question ...
<heller>
shoot
<jbjnr>
I've noticed something strange - when I dump out the topology info (on the resource partitioner branch, so possibly dodgy), it shows the correct information: 72 PUs on 36 cores, 2 NUMA domains, etc. Later, when the code is running, I dump out the topology info again and it tells me 24 cores, 2 domains, 48 PUs.
<jbjnr>
any idea why hwloc might 'change its mind' midway through a program?
<heller>
strange indeed
<jbjnr>
yes indeedy
<heller>
what happens in between those two calls?
<jbjnr>
some matrix stuff
<jbjnr>
nothing earth shattering.
<jbjnr>
I'm looking for numa related issues and found this oddity
<heller>
Is the information output using the same functionality, or is there different code to print it?
<jbjnr>
same everything. even same 'this' pointer on the topo class
<heller>
72 PUs, you might want to check your HPX_MAX_CPU_COUNT cmake variable
<jbjnr>
set to 96, but tried 256 previously
<heller>
ok
<heller>
no idea why it changed its mind ...
hkaiser has joined #ste||ar
<heller>
could it be that the RP changes some internal, global bitmasks?
<jbjnr>
I'll rebuild hwloc just in case a new vresion fixed anything
<hkaiser>
heller: what happens ?
<heller>
[16:18:29] <jbjnr> I've noticed something strange - when I dump out the topology info (on the resource partitioner branch, so possibly dodgy), it shows the correct information: 72 PUs on 36 cores, 2 NUMA domains, etc. Later, when the code is running, I dump out the topology info again and it tells me 24 cores, 2 domains, 48 PUs.
<jbjnr>
well it does, but nothing that would cause this. I've been poking around at it over the weekend
<heller>
ok, that's the only plausible explanation I have, that you somehow mess with the bitmasks
<jbjnr>
yes. I was looking for changes that affect the thread id, pu_masks, and everything else, but have not uncovered anything unusual
<hkaiser>
jbjnr: I might have screwed up
<jbjnr>
lets blame heller anyway though
<hkaiser>
ok
<jbjnr>
more likely me than you
<jbjnr>
though
<jbjnr>
hkaiser: if there's a commit where you made hwloc-related changes, please let me know and I'll have a look
<hkaiser>
none, afair
<heller>
blaming me always works
<hkaiser>
how to reproduce this?
<jbjnr>
hkaiser: not easy. I found it by accident whilst dumping info out from the matrix code, looking for reasons why my NUMA-related changes didn't help. Turned out to be something else, but this is bothering me
<hkaiser>
well, tell me how to reproduce and I'll look into it for you ;)
<heller>
woah, why does everything I touch lead to excessive compile times?
<hkaiser>
the world wants you not to procrastinate
<heller>
a full compilation of my thesis takes about 2 minutes :/
<jbjnr>
heller: sing to the tune of beyonce - "if you like it then you'd better put a template on it"
<heller>
:D
thundergroudon[m has joined #ste||ar
<K-ballo>
the heller's touch
<jbjnr>
correct
<K-ballo>
heh, the irony...
<zao>
Simply defer everything until modules or the heat death of the universe.
<github>
[hpx] hkaiser closed pull request #2788: Adapt parallel::is_heap and parallel::is_heap_until to Ranges TS. (master...tg_is_heap_range) https://git.io/v7lCt
aserio has quit [Ping timeout: 246 seconds]
diehlpk has joined #ste||ar
Reazul has quit [Quit: Page closed]
Reazul has joined #ste||ar
<Reazul>
Hello. :) As suggested previously, I started by writing a chain of tasks in shared memory. I am not able to use future.then() correctly; can you please suggest the right way of doing that? https://pastebin.com/03BhnndN
<Reazul>
ok, honestly I have gone through the examples, not fibo but hello_world, reduction, ag, etc. Since I am having a hard time understanding the syntax, I am trying to replicate the examples and alter them.
<heller>
yes
<Reazul>
It would be super helpful if there were examples of how to achieve simple things like creating a task, waiting for a task, creating links between tasks, etc.
<heller>
hello_world and ag are really bad examples
<heller>
I agree
<Reazul>
I apologize for bothering you all so much :), I am just trying to learn :)
<zao>
I like the huge wall of instantiations followed by "there's const and stuff"
EverYoung has joined #ste||ar
mars0000 has quit [Quit: mars0000]
<zao>
/home/zao/slask/stellar/hpx/tests/unit/component/copy_component.cpp:159:1: fatal error: error writing to /tmp/ccTgRLNU.s: No space left on device
<zao>
That's it for today, I'm going home :D
<zao>
I guess that's what I deserve for running the OS and HPX build off a 74G HDD.
zbyerly_ has joined #ste||ar
<github>
[hpx] hkaiser created fixing_any_warning (+1 new commit): https://git.io/v744w
<github>
hpx/fixing_any_warning c55df05 Hartmut Kaiser: Circumvent scary warning about placement new creating object of larger size than space is available
<hkaiser>
zao: ^^
<zao>
Yay.
<zao>
I need to rebuild this machine, system disk is too small to hold HPX :)
mcopik has joined #ste||ar
akheir has joined #ste||ar
aserio has joined #ste||ar
hkaiser has quit [Quit: bye]
<patg[[w]]>
pree__: wash I can attend just ping me
<patg[[w]]>
parsa[w]: yt??
<patg[[w]]>
parsa[w]: ping
<pree__>
parsa[w], wash: in the #ste||ar-gsoc channel in 15 minutes
<pree__>
Thanks
<patg[[w]]>
8 minutes
<pree__>
sorry
<pree__>
in 8 minutes
hkaiser has joined #ste||ar
pree__ has quit [Ping timeout: 240 seconds]
<wash>
pree__?
<hkaiser>
aserio: yt?
<wash>
I'm in that channel, looks like he disconned...
<aserio>
hkaiser: yes
<hkaiser>
see pm, pls
pree_ has joined #ste||ar
pree_ has quit [Ping timeout: 246 seconds]
pree_ has joined #ste||ar
mars0000 has joined #ste||ar
bikineev has quit [Remote host closed the connection]
hkaiser has quit [Read error: Connection reset by peer]
hkaiser has joined #ste||ar
aserio has quit [Ping timeout: 246 seconds]
eschnett has quit [Quit: eschnett]
aserio has joined #ste||ar
<Reazul>
Is there any example showing how I can create data and distribute it in HPX? I have this program that seems to work in the way I want: https://pastebin.com/jELqZWQX . I tried it with openmpi on 6 nodes and it behaves correctly. Now I want to allocate data and distribute it.
<wash>
Reazul what do you mean by tried it with openmpi?
<wash>
Reazul: there's a few ways to distribute data..
<wash>
Reazul: you can use something like partitioned_vector, or hpx::new_. Or you can write your own component types and use them to build distributed entities
<wash>
hkaiser: is there a good partitioned_vector example he can look at?
<Reazul>
@wash: I meant I compiled HPX to use MPI and then tried the example I posted on 6 nodes.
bikineev has joined #ste||ar
aserio has quit [Quit: aserio]
<wash>
Reazul: just making sure :).
<Reazul>
:D
bikineev has quit [Ping timeout: 258 seconds]
pree_ has quit [Quit: AaBbCc]
patg[[w]] has quit [Quit: Leaving]
<github>
[hpx] hkaiser force-pushed resource_partitioner from 9b8e3e1 to 8eee17e: https://git.io/v7lfK
<github>
hpx/resource_partitioner 8eee17e Hartmut Kaiser: Adding pool specific performance counters...
<hkaiser>
Reazul: do you have your local example working now?
<github>
[hpx] hkaiser force-pushed fixing_any_warning from c55df05 to c8d310e: https://git.io/v74xk
<github>
hpx/fixing_any_warning c8d310e Hartmut Kaiser: Circumvent scary warning about placement new creating object of larger size than space is available
eschnett has joined #ste||ar
<jbjnr>
hkaiser: noted #2789 and commented.
<hkaiser>
I don't see any comments saying 'WIP: FIX THIS PROPERLY'
<hkaiser>
jbjnr: ^
<Reazul>
@hkaiser: Yes, I now have a working version of chains with each task mapped to a different node.
<hkaiser>
ok, so now simply apply a couple of changes: shared_ptr<T> --> hpx::id_type, make_shared --> hpx::new_, derive your C++ object from hpx::components::component, and turn the member functions of this object which you want to call remotely into actions
<hkaiser>
Reazul: IOW, the C++ object which you want to access remotely needs to be turned into an hpx 'component'
<hkaiser>
that allows for instances of this C++ type to be instantiated remotely and you can call member functions of this object remotely as well
<jbjnr>
I can have a look. from my point of view, there ought not to be any great rush to merge this. A few more days won't hurt
<hkaiser>
jbjnr: I agree
<hkaiser>
just wanted to move this forward
aserio has quit [Quit: aserio]
<hkaiser>
jbjnr: the docs need more work as well
<github>
[hpx] hkaiser force-pushed resource_partitioner from 8eee17e to a6ed025: https://git.io/v7lfK
<github>
hpx/resource_partitioner a6ed025 Hartmut Kaiser: Adding pool specific performance counters...
<hkaiser>
(perf-counter docs)
<jbjnr>
if you have time, feel free to fix the WIP, if not, I will do so. Yes, docs. correct. need to do more there
<jbjnr>
and the PP stuff. :(
<jbjnr>
docs are falling behind
<hkaiser>
yah, that as well
<Reazul>
@hkaiser: are there any examples showing how to create data and move it across node boundaries using actions and components (other than ag)?
<hkaiser>
Reazul: why do you want to move data around?
<Reazul>
Well I mean, HPX will do that for me, but I need to express that, right?
<hkaiser>
why ?
<hkaiser>
you want to avoid moving data as much as possible, right?
<Reazul>
That is the program I am trying to come up with
<Reazul>
of course.
<hkaiser>
hpx has no explicit data movement API, you 'move' data by passing it as an argument to an action invocation or you get it back when returned from such
<hkaiser>
you can think of an action as a form of a remote procedure call
<Reazul>
actions also represent tasks, right?
<hkaiser>
an action represents a function you can call remotely
<hkaiser>
if you invoke an action usually a task is created
<Reazul>
ok
bikineev has joined #ste||ar
<Reazul>
Let me rephrase my query: is there any simple example for distributed memory other than ag?
<hkaiser>
Reazul: what do you want to do?
<hkaiser>
all examples use distributed memory
<Reazul>
I am trying to assess HPX
<Reazul>
with simple benchmark
<Reazul>
I am trying to make sure I am executing what I plan to
<hkaiser>
I think you're trying to reproduce something resembling an MPI application
<hkaiser>
I'd suggest you forget for a moment that your application should run in distributed mode
<Reazul>
ok
<hkaiser>
an hpx application is very similar to a non-distributed application, no special 'data movement'
<Reazul>
right
<hkaiser>
the only differences compared to a non-distributed application are a couple of things caused by the limits of the C++ memory model, which require things like components and actions
bikineev has quit [Ping timeout: 240 seconds]
<hkaiser>
Reazul: an action is the same as a normal function except that you can invoke it remotely
<hkaiser>
a component is the same as a normal c++ object except that you can instantiate it remotely
<Reazul>
ok
<hkaiser>
an hpx::id_type is the same as a void* referring to something in a virtual global address space
<hkaiser>
you create an instance of a component using hpx::new_<> and you invoke an action using hpx::async
<hkaiser>
both give you a future<> representing the result of the possibly remote operation
<hkaiser>
so in the end your distributed program looks like a local one
<hkaiser>
MPI forces you to write explicit code to send the data and to receive it (i.e. code on both ends of the wire)
<Reazul>
right
<hkaiser>
hpx does this by invoking an action, data is 'passed' as the argument of that action - the receiver does not explicitly 'wait' for this
<Reazul>
right, makes sense
<hkaiser>
so it's the same as calling a function, the function 'receives' the data whenever it is invoked
<Reazul>
I see
<hkaiser>
Reazul: what type of applications do you plan to write?
<Reazul>
I started with embarrassingly parallel tasks for shared memory, am moving to distributed with chains of tasks, and will ultimately come up with a stencil as a general benchmark
<Reazul>
Should give some idea about the implementation efficiency
<hkaiser>
ok
<hkaiser>
with hpx the distributed code will almost look like local code
<hkaiser>
if properly done, that is
<Reazul>
yes
<Reazul>
I will make sure by pasting the code here.
<Reazul>
you guys have been very helpful. Thank you for that :)
<hkaiser>
Reazul: let us know if we can help in any way
<Reazul>
absolutely
zbyerly_ has quit [Quit: Leaving]
<github>
[hpx] hkaiser force-pushed resource_partitioner from a6ed025 to 39d8aee: https://git.io/v7lfK
<github>
hpx/resource_partitioner 39d8aee Hartmut Kaiser: Adding pool specific performance counters...