hkaiser changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/ | GSoC: https://github.com/STEllAR-GROUP/hpx/wiki/Google-Summer-of-Code-%28GSoC%29-2020
richard[m]1 has quit [Ping timeout: 246 seconds]
hkaiser has joined #ste||ar
kordejong has quit [Ping timeout: 240 seconds]
kordejong has joined #ste||ar
ms[m] has quit [Ping timeout: 246 seconds]
ms[m] has joined #ste||ar
richard[m]1 has joined #ste||ar
weilewei has quit [Remote host closed the connection]
hkaiser has quit [Quit: bye]
Yorlik has quit [Ping timeout: 260 seconds]
Yorlik has joined #ste||ar
Yorlik has quit [Ping timeout: 272 seconds]
nan111 has quit [Remote host closed the connection]
bita_ has quit [Ping timeout: 260 seconds]
jaafar has quit [Quit: Konversation terminated!]
jaafar has joined #ste||ar
<kordejong> Does anybody know what could be wrong in my code when this assertion fires: `Assertion 'split_gids_.empty()' failed: HPX(assertion_failure)` The stack trace refers to `hpx/runtime/serialization/detail/preprocess_gid_types.hpp`, line 47. I am using HPX 1.4.1.
mcopik has joined #ste||ar
mcopik has quit [Quit: Leaving]
<ms[m]> heller: any idea? ^
<heller1> Kor de Jong: whoa, that shouldn't happen
<heller1> Kor de Jong: do you happen to have a backtrace?
<heller1> Kor de Jong: a full one that is
<heller1> hmm, that looks wrong
<kordejong> This is with a run on a single cluster node, using a locality per numa node (8). When using a single locality on one numa node it works fine. I have tried lots of things, but could use some direction as to what to try next.
<heller1> yeah, this has to be a multi locality run
<heller1> so it is related to some call to `hpx::async` where one of the arguments is either a client or an id_type directly
<heller1> can you run with `--hpx:attach-debugger=exception`?
<kordejong> Indeed, I send component clients to other localities
<heller1> then you can attach to the exception once this is happening
<kordejong> OK, I will try that. I will be in a meeting in a few minutes. Will pick up after that and report the results. Thanks so far.
<kordejong> <heller1 "then you can attach to the excep"> I see this message on the compute node: `PID: 3009 on gpu007.cluster ready for attaching debugger. Once attached set i = 1 and continue` Attaching to the process using `gdb -p 3009` works. What does `set i = 1` mean? gdb does not understand that syntax. Continuing anyway seems to hang. The cores on one of the numa nodes seem to idle. The other ones are busy. I am missing something.
<heller1> yes
<heller1> once you attached
<heller1> gtg
<ms[m]> Kor de Jong: you might have to change to the thread that threw the exception first
<ms[m]> basically `--hpx:attach-debugger=exception` starts a busy loop when an exception is thrown that can be exited by changing the value of that i variable
<ms[m]> once you're on the right thread a `bt` might be useful since the stacktrace above doesn't show anything above the preprocess_gid_types destructor
<ms[m]> I most likely won't be able to help you with the actual problem though since I know little about that part of the codebase...
<gonidelis[m]> I've been trying to create a proper test for my `is_sentinel_for` https://gist.github.com/gonidelis/34c4e0ce645d4e5564da3076da74ed3f . To find a proper pair to compare, I used `iter_sent.hpp` like this: https://gist.github.com/gonidelis/2c0cda6e01df028ad25c530732c37236 . But I get `error: temporary of non-literal type ‘Iterator<long int>’ in a constant expression`. So is there any way to actually
<gonidelis[m]> `iter_sent.hpp`? Any help on how I could test the trait with a vector, for example?
<gonidelis[m]> K-ballo: hkaiser
neill[m] has joined #ste||ar
<gonidelis[m]> well... just removed the initializations (`{0}` `{100}`) and the test both compiles and passes :shrug:
<gonidelis[m]> Do you think that's correct?
<gonidelis[m]> + I would still like to add some other test cases
hkaiser has joined #ste||ar
<ms[m]> hkaiser: heller jbjnr rori_[m] sorry, pycicle is still not quite happy... my folder cleanup isn't working and I keep going over quota on scratch
<ms[m]> I just need some time to monitor it and figure out where it's failing
<hkaiser> ms[m]: take your time
<ms[m]> it's just ci... ;)
<hkaiser> ms[m]: you're trying to go too fast - I know you're impatient, but hey - nothing is urging you to go at THAT speed
<ms[m]> just myself...
<hkaiser> yah
<hkaiser> understood
rtohid has joined #ste||ar
<hkaiser> ms[m]: I'll fix #4678 today, could we merge that soon? playing catch-up there for a while now
<hkaiser> it's fairly small, so shouldn't break any of your ongoing things
<hkaiser> same for #4693
<ms[m]> hkaiser: yep, no problem
<ms[m]> I just wanted to get the renamings in, things should be easier for a while again now
<hkaiser> good, thanks for working on this!
<ms[m]> thanks for not chasing us out of the room!
<rori> yes thanks for merging the renaming things !
<hkaiser> also, could we merge the naming fixes asap?
<hkaiser> #4710
<hkaiser> I'd need this to get Phylanx in order - it's broken right now
<ms[m]> I just wanted to ask about that
<ms[m]> or do we go with compatibility headers that warn for now and we undeprecate them later?
<hkaiser> fine with me, there might be one or two more headers, but we can add those later
<hkaiser> hpx/filesystem.hpp for instance
<ms[m]> ok, good
<ms[m]> I see there's an inspect failure, let me just clear that up and then we can merge it
<hkaiser> k
<hkaiser> thanks
<rori> thanks for doing this ms!
<gonidelis[m]> hkaiser: Did you have any time to check my gists?
<hkaiser> gonidelis[m]: sorry, didn't see them, could you repost the links, pls?
<gonidelis[m]> sure
<gonidelis[m]> I've been trying to create a proper test for my `is_sentinel_for` https://gist.github.com/gonidelis/34c4e0ce645d4e5564da3076da74ed3f . To find a proper pair to compare, I used `iter_sent.hpp` like this: https://gist.github.com/gonidelis/2c0cda6e01df028ad25c530732c37236 . But I get `error: temporary of non-literal type ‘Iterator<long int>’ in a constant expression`. So is there any way to actually
<gonidelis[m]> `iter_sent.hpp`? Any help on how I could test the trait with a vector, for example?
<gonidelis[m]> well... just removed the initializations ({0} {100}) and the test both compiles and passes :shrug:
<gonidelis[m]> I would still like to add some other test cases
<gonidelis[m]> Do you think that's correct?
<hkaiser> gonidelis[m]: you need to test for both, T == U and U == T (same for !=)
<gonidelis[m]> hkaiser: So I solved the error but I don't know if that's the proper way to do so...
<hkaiser> your test must look like: static_assert(is_sentinel_for<Iterator<std::int64_t>, Sentinel<int64_t>>::value, "...")
<gonidelis[m]> hkaiser: oh ok... but I can only return one `type = decltype(std::declval<const T&>() != std::declval<const U&>());`
<hkaiser> you need to use the types, not instances of types as template arguments
<gonidelis[m]> Yeah, as I said I removed the instantiations... So the only thing missing now is `static_assert`
<gonidelis[m]> I reckon `HPX_TEST_MSG` is not using `static_assert` already....?
<gonidelis[m]> ahhhh...... :@
<hkaiser> gonidelis[m]: well, HPX_TEST_MSG is a runtime check, static_assert() is compile-time, but sure, both would work
<gonidelis[m]> hkaiser: Some things are so simple and trivial and I just keep asking questions about them... sorry
<hkaiser> no worries
<ms[m]> hkaiser: just pushed to 4710, will merge it now
<hkaiser> gonidelis[m]: just be careful to either a) use other names for the equality_result etc. or b) make sure to use the existing ones
<ms[m]> if I've missed something I'll fix separately (i.e. inspect or something like that, but I hope I got it all)
<hkaiser> existing templates, I meant
<hkaiser> gonidelis[m]: one minor thing
<gonidelis[m]> Is defining `equality_result` under `detail` extra, unnecessary work? I mean, I copy-pasted them from the `is_iterator` trait....
<gonidelis[m]> ?
<hkaiser> gonidelis[m]: we're converging on `east const` notation, i.e. we write `T const&` instead of `const T&` - purely a cosmetic change, but more consistent
<hkaiser> gonidelis[m]: copying them will create duplicate definitions if both headers are included, not?
<gonidelis[m]> hkaiser: yup... actually I just encountered that problem like 5 minutes ago
<hkaiser> gonidelis[m]: best might be to move the definitions to a separate header and #include that from both, is_iterator.hpp and is_sentinel_for.hpp
<gonidelis[m]> hkaiser: oh ok... I will. Something like `results.hpp` would be ok?? (sorry, I am not good with naming yet...)
<hkaiser> gonidelis[m]: hmmm
<hkaiser> let's call it hpx/iterator_support/traits/detail/concept_helpers.hpp or something like that
<hkaiser> there will be more, I'm sure
<hkaiser> gonidelis[m]: alternatively just #include is_iterator.hpp in your is_sentinel_for.hpp
<hkaiser> might be the easiest solution
<hkaiser> anybody #including is_sentinel_for will need is_iterator anyways
<gonidelis[m]> hkaiser: yeah that's what I thought... might be a little bad for the includes/file organization. Anyway... it's true those two are closely bound, and I reckon `#pragma once` will do the magic so the compiler won't see the definitions twice...
nan111 has joined #ste||ar
<hkaiser> yes
<hkaiser> ms[m]: thanks
<hkaiser> I'll add more if needed there
<hkaiser> gonidelis[m]: do we meet today?
<hkaiser> same time?
<gonidelis[m]> hkaiser: of course
<hkaiser> k
<gonidelis[m]> hkaiser: as for the trait test cases you can see I just used the pair defined in `iter_sent.hpp`
<gonidelis[m]> What other pairs should I test??
Yorlik has joined #ste||ar
<gonidelis[m]> Or better: how could I find what other pairs need to be tested? On the TS maybe??
<K-ballo> as I suggested the other day, test iterator pairs
<K-ballo> also, make at least one test that it correctly rejects non-sentinels
<gonidelis[m]> K-ballo: ahh you mean `begin()` and `end()`, right? On the rejection part: if I plug a non-sentinel into `is_sentinel_for`, shouldn't the test fail then???
<K-ballo> yes, a begin/end pair.. and no, is_sentinel_for should return false and the test should check that it is so
<K-ballo> let me leave no room for confusion there.. is_sentinel_for for begin()/end() pair should return true, and is_sentinel_for say... a string and an iterator should return false
<gonidelis[m]> ahh ok... sorry... I just need to test `::value == false` for the second case... you are right. Thanks a lot
<gonidelis[m]> So no other pairs?
<K-ballo> those three cases should cover all the "interesting cases", add any other you think is useful
weilewei has joined #ste||ar
<nan111> STEllAR-GROUP/hpx#4710 is merged, so I updated the latest hpx. The old error gone but I got a new error, which says "/home/nanmiao/Documents/project/dev/src/phylanx/src/execution_tree/meta_annotation.cpp:16:10: fatal error: hpx/collectives.hpp: No such file or directory #include <hpx/collectives.hpp>"
<ms[m]> hkaiser: nan111 that one was added after 1.4.1
<hkaiser> nan111: we still need to fix phylanx for the recent changes, I'll do that asap
<ms[m]> I wouldn't want to add a compatibility header for that...
<hkaiser> no need
<nan111> Got it. Thanks!
<hkaiser> in general for the missing headers, change hpx/foo.hpp to hpx/modules/foo.hpp for now
<hkaiser> I have a meeting now, will look into fixing it afterwards
<hkaiser> ms[m]: thanks for merging this
<ms[m]> hkaiser: thanks
<gonidelis[m]> K-ballo: thanks a lot ^^
<hkaiser> rori, gonidelis[m], heller1: https://lsu.zoom.us/j/92781473639
<ms[m]> this makes no sense... I find no traces of hpx/collectives.hpp either in 1.4.1 or on master (before the renaming)
<hkaiser> ms[m]: it's auto-generated
<ms[m]> even then
<ms[m]> oh... it's on by default
<ms[m]> then we did have that in 1.4.1
<ms[m]> we'll need a compatibility header for that as well then
<ms[m]> however, assuming you're going to do some renaming anyway I won't add it right now
<weilewei> hkaiser ms[m] jbjnr btw, Hazard pointer related unit and stress tests passed with HPX support... next step is making sure dynamic hazard pointer related tests pass as well...
<ms[m]> weilewei: very nice!
<weilewei> and then other lock-free stuff...
<weilewei> :)
<ms[m]> so this is with hpx threads, right, not just hpx os threads?
<weilewei> it uses hpx::thread, are hpx threads and hpx os threads different things?
<ms[m]> yep, different things, and if it works with hpx::thread that's very good (that would've been the trickier one, but it looks like it's not a problem)
<ms[m]> hpx os thread = hpx worker thread by another name
<weilewei> ok, then I should be good, it uses the hpx::this_thread::get_id() thing. It was tricky, as my implementation failed at a thread counter variable, which caused a deadlock. This thread counter was not atomic-protected at all, so I made a PR to the libcds team. Let's see what they say
<weilewei> but now it is fixed after making the counter atomic
<diehlpk_work> hkaiser, SC paper meeting?
<hkaiser> diehlpk_work: sorry, I'm in a gsoc meeting
bita_ has joined #ste||ar
<hkaiser> weilewei: I'll be a bit late for our meeting
<weilewei> hkaiser np
<hkaiser> weilewei: now?
rtohid has left #ste||ar [#ste||ar]
<ms[m]> woop, cmake configuration in over an hour! scratch is setting new records...
<zao> Intel's great at that for me otherwise.
<K-ballo> does that mean >1h to run some project's cmake configuration(+generation?) step?
<zao> Sounds like you've got highly performant filesystems there.
<hkaiser> nan111: #1189 should be fine on top of HPX master
<nan111> hkaiser Thanks!
<weilewei> is it correct to create a thread pool simply using std::vector< hpx::thread > threads; ?
<weilewei> I am seeing this error: /home/weile/project/dev/src/hpx/libs/synchronization/src/mutex.cpp:39: void hpx::lcos::local::mutex::lock(const char*, hpx::error_code&): Assertion 'threads::get_self_ptr() != nullptr' failed
<hkaiser> weilewei: well a vector<thread> doesn't do anything on its own
<hkaiser> just an empty vector, same as vector<int>
<weilewei> but when I run each thread of threads and it tries to grab a lock, it returns this error
<hkaiser> sure
<hkaiser> you can't just run hpx code on any std thread
<weilewei> hmmm, what should I do then?
<hkaiser> what are you trying to achieve with this?
<hkaiser> hpx already has thread pools, why create another one?
<hkaiser> the hpx scheduler _is_ a thread pool of sorts
<weilewei> Right, so essentially there is a vector of work items, and each work item needs to be run on one of the hpx threads
<weilewei> Just trying to maintain same syntax... because libcds uses vector <std::thread> threads;
<hkaiser> weilewei: well, then use a std::vector<hpx::thread>
<hkaiser> or simply async each work item
<weilewei> right, but then I get this Assertion 'threads::get_self_ptr() != nullptr' failed error...
<hkaiser> when do you get that?
<hkaiser> if you use a vector of HPX threads?
<weilewei> yes, a vector of hpx threads
<hkaiser> did you initialize the HPX runtime?
<hkaiser> using hpx::init or similar?
<weilewei> ah!! I forgot this! I thought I have it already... apparently not
<weilewei> Thanks
<ms[m]> K-ballo: zao exactly... it's hpx on daint's scratch filesystem, which has been hammered by some user all day
<zao> ms[m]: is that node-local? Maybe you can build in /dev/shm if there’s enough mem?
<ms[m]> zao: yeah, I think I could do that
K-ballo has quit [Quit: K-ballo]
K-ballo has joined #ste||ar
rtohid has joined #ste||ar
kale[m] has joined #ste||ar
rtohid has quit [Remote host closed the connection]
kale[m] has quit [Ping timeout: 272 seconds]
kale[m] has joined #ste||ar
rtohid has joined #ste||ar
kale[m] has quit [Ping timeout: 260 seconds]
kale[m] has joined #ste||ar
kale[m] has quit [Ping timeout: 272 seconds]
kale[m] has joined #ste||ar
weilewei has quit [Remote host closed the connection]
weilewei has joined #ste||ar
kale[m] has quit [Ping timeout: 256 seconds]
kale[m] has joined #ste||ar
karame_ has quit [Remote host closed the connection]
kale[m] has quit [Ping timeout: 260 seconds]
<Yorlik> Is it guaranteed, that the address of a component as long as it exists on a locality and doesn't migrate will never change?
<hkaiser> yes
<Yorlik> Thanks
<hkaiser> Yorlik: like any c++ object
<Yorlik> Well ...
<Yorlik> There might be situations where you want to move an object around. BTW: Are the Objects Movable?
<hkaiser> what objects?
<Yorlik> Components
<hkaiser> no they are non-copyable and non-movable
<hkaiser> why would you ever want to move one?
<Yorlik> OK. Makes sense. I could work with pointers for what I have in mind.
<Yorlik> I am thinking a lot about what is happening to my cache at the moment.
<hkaiser> if you have a shared_ptr<Foo> p = make_shared<Foo>(), you never even think of moving the allocated Foo
<Yorlik> I just changed my shared ptrs into raw pointers again.
<Yorlik> It's the right use case - they are non-owning and just observers
<Yorlik> It's complicated - moving the components would most likely be overkill if it were possible.
<Yorlik> I'm just thinking in several directions.
<Yorlik> The most important feature I already have: I can move the data which belongs to the components
<Yorlik> So I'm free to optimize if I see an issue.
<Yorlik> But the biggest problem I see is the cache thrashing by the many Lua states we use.
<Yorlik> And also the migration of a task using a lua state is an issue - it will leave unused cache entries behind and start thrashing the cache on arrival at the new core.
<Yorlik> A lua state uses ~1 MB of memory after all.
<Yorlik> the state itself is cheap - 56 bytes only
<Yorlik> But it uses a bunch of heap
<Yorlik> So if it migrates that puts a lot of stress on the caches
<Yorlik> hkaiser: when a parallel loop creates all these tasks to run the chunks: Are these tasks all created on the local thread and just stolen from others or are they distributed in a round robin manner?
<hkaiser> round robin, I think - at least the hint is set that way
<Yorlik> OK. That's good - since otherwise that would put a lot of stress on the local cache.
bita__ has joined #ste||ar
<Yorlik> I'm seeing more and more the limitations of Lua.
<Yorlik> But there isn't really an alternative that is better.
<Yorlik> hpx::get_ptr returns a shared_ptr/future - is there a function that directly gives me a raw pointer, when I need just an observing ptr?
bita_ has quit [Ping timeout: 246 seconds]
karame_ has joined #ste||ar
nan111 has quit [Remote host closed the connection]
nan111 has joined #ste||ar
<hkaiser> Yorlik: it returns a shared_ptr for good measure
<Yorlik> I understand you're holding my hand here. I can live with that ;)
<hkaiser> do you have proof that using a raw pointer instead of a shared_ptr gives you significant improvements?
<Yorlik> Since the shared_ptr is already created I will keep using it instead of just calling .get(), but I will have to destroy it unnecessarily, since it is used for the backlink from the data which is already owned by the component.
<Yorlik> The shared_ptr in this case is just in the way.
<Yorlik> It's not a performance problem. It's the wrong use case
<Yorlik> I need a link back to the component from the data which it owns
<hkaiser> who is creating the data?
<hkaiser> the component?
<Yorlik> So - when migrating the data I will have to explicitly reset that pointer, migrate and reset it.
<Yorlik> Yes
<Yorlik> The component creates the data
<hkaiser> why don't you pass 'this' to the data when it is being created, then?
<Yorlik> Woops? :D
* Yorlik bangs head on table
<Yorlik> I wish I could teach a raw pointer to generate a static_assert if someone tries to use delete on it.
<hkaiser> Yorlik: easy: don't use raw pointers
<Yorlik> Nope
<hkaiser> use references
<Yorlik> Actually in this specific use case a reference would do the job
<Yorlik> But I have some other situations where I really need to change the pointee a lot, and reference_wrapper is not as lightweight
<hkaiser> nonsense
<hkaiser> reference_wrapper is just a pointer internally
<Yorlik> If that is the case there is bunk "knowledge" out there.
<hkaiser> Yorlik: absolutely
<Yorlik> Might be worth a blog post by someone capable who blogs: "Why we totally do not need raw pointers anymore"
<Yorlik> I'm definitely not the authority to do that.
<Yorlik> I have to think about whether references would work as well as pointers in the Lua interop though - probably yes.
<hkaiser> Yorlik: Shawn Parent has been going around for years talking about that in his presentations
<hkaiser> what else do you need?
<Yorlik> Do you have a specific talk in mind I could look up?
<hkaiser> for interop you can also take the address of the reference if you need a pointer
<Yorlik> True
<hkaiser> Yorlik: C++ Seasoning, Microsoft's GoingNative 2013 conference
<Yorlik> I might just convert - lol
* Yorlik goes watching guru lesson
<Yorlik> Thanks !
<K-ballo> Shawn
<hkaiser> karame_: right - sorry
<karame_> hkaiser Thanks! I submitted
rtohid has left #ste||ar [#ste||ar]
nan111 has quit [Remote host closed the connection]
nikunj has quit [Ping timeout: 260 seconds]
nikunj has joined #ste||ar
<jbjnr> /usr/bin/ld: cannot find -lhpx_local_async
<jbjnr> that must be a new one
<jbjnr> oh dear. an old one that needs to be removed by the looks of it
<hkaiser> jbjnr: it's async_local now
<Yorlik> hkaiser: When a component gets serialized for migration. What happens to references?
<Yorlik> like my entity holding a reference to the managing component gameobject
<Yorlik> Ofc the entity get serialized alongside the component, since they have to travel together.
<hkaiser> Yorlik: what happens to them?
<hkaiser> the serialization does not change anything, but once the migration is done the original object is deleted
<Yorlik> The current layout is such that the gameobject is the hpx component and it has a type-erased pointer to the entity, and the entity has a raw pointer backlink to the component
<hkaiser> you will have to re-set the reference on the receiving end
<hkaiser> during de-serialization
<Yorlik> So - just assigning the gameobject component to the reference member after arrival would be sufficient?
<Yorlik> Like repairing the link, because the addresses changed
<Yorlik> BTW: I'm running into quite some trouble using a reference: suddenly my component holding the ref is no longer default constructible, and that bubbles all the way through my structure
<Yorlik> So - all the boilerplate I carefully tried to avoid suddenly seems mandatory.
<hkaiser> Yorlik: build you own reference_wrapper or similar
<hkaiser> man, it's C++, stop complaining - everything is possible
<Yorlik> Like something that would temporarily allow a nullptr?