aserio changed the topic of #ste||ar to: STE||AR: Systems Technology, Emergent Parallelism, and Algorithm Research | stellar.cct.lsu.edu | HPX: A cure for performance impaired parallel applications | github.com/STEllAR-GROUP/hpx | Buildbot: http://rostam.cct.lsu.edu/ | Log: http://irclog.cct.lsu.edu/
diehlpk has quit [Ping timeout: 252 seconds]
diehlpk has joined #ste||ar
EverYoung has joined #ste||ar
diehlpk has quit [Ping timeout: 260 seconds]
EverYoung has quit [Ping timeout: 245 seconds]
CaptainRubik has quit [Ping timeout: 256 seconds]
hkaiser has quit [Quit: bye]
K-ballo has quit [Quit: K-ballo]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 245 seconds]
eschnett has quit [Quit: eschnett]
eschnett has joined #ste||ar
Anushi1998 has quit [Read error: Connection reset by peer]
anushi has joined #ste||ar
anushi has quit [Remote host closed the connection]
Anushi1998 has joined #ste||ar
anushi has joined #ste||ar
Anushi1998 has quit [Read error: Connection reset by peer]
Anushi1998 has joined #ste||ar
anushi has quit [Ping timeout: 276 seconds]
nanashi55 has quit [Ping timeout: 260 seconds]
nanashi55 has joined #ste||ar
<Anushi1998> Does anyone know what is the IRC handle of Taeguk Kwon?
<Anushi1998> I remember one person with 'kwon' in the channel, but I haven't seen him around for the past 10 hrs
parsa has quit [Quit: Zzzzzzzzzzzz]
EverYoung has joined #ste||ar
nikunj has joined #ste||ar
EverYoung has quit [Ping timeout: 265 seconds]
nikunj has quit [Ping timeout: 260 seconds]
Nikunj_ has joined #ste||ar
<zao> Anushi1998: It was simply 'taeguk'.
<Anushi1998> zao: thanks :)
<zao> No failures overnight, how boring :(
<jbjnr> zao: what is it you're looking for? strangeness in hello world or something - I am aware that you've been soak testing, but I'm not quite sure which bug you're looking for
<jbjnr> maybe I can help
Anushi1998 has quit [Ping timeout: 276 seconds]
Anushi1998 has joined #ste||ar
<zao> jbjnr: It's the bug that seems to hang the hello_world example sanity check on Appveyor.
<zao> jbjnr: The failing configuration I'm concentrating on is the TCP parcelport with two localities, two threads per locality.
<zao> Something ends up causing the second process to have what seems to be an access violation, which sometimes gets the first process to enter shutdown trying to operate on a busy mutex.
<zao> (ran into some other known bugs while figuring it out, so even more fun)
<zao> Right now I'm trying to figure out what differs between a successful run and a bogus one.
<zao> I've been doing my best to avoid reading HPX source and understanding how HPX works ;)
Anushi1998 has quit [Ping timeout: 265 seconds]
Smasher has quit [Remote host closed the connection]
Smasher has joined #ste||ar
<zao> Do I need both --hpx:hpx and --hpx:agas ?
Anushi1998 has joined #ste||ar
<zao> hrm, node and agas are mutually exclusive somehow.
<zao> Bah, still seems to try to use localhost, guess I need to drop the "node" thing.
<github> [hpx] StellarBot pushed 1 new commit to gh-pages: https://git.io/vAHI4
<github> hpx/gh-pages 5bbdccd StellarBot: Updating docs
Anushi1998 has quit [Remote host closed the connection]
Anushi1998 has joined #ste||ar
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 245 seconds]
<zao> I also ran into some mysterious problem on Linux regarding the same example livelocking, but don't know if HK figured that one out.
<zao> This is interesting... got a livelock on Windows now for over an hour.
K-ballo has joined #ste||ar
Zwei_ has quit [Ping timeout: 265 seconds]
Zwei has joined #ste||ar
<jbjnr> zao: Sounds like there is still a problem deep in the heart of hpx where we'll never find it. If you find anything interesting I'll try to replicate it on my setup. I worry about the random red failing tests on the dashboard that never quite go away.
<jbjnr> I'm going out now. later ...
hkaiser has joined #ste||ar
parsa has joined #ste||ar
hkaiser has quit [Quit: bye]
EverYoung has joined #ste||ar
EverYoung has quit [Ping timeout: 276 seconds]
Smasher has quit [Remote host closed the connection]
eschnett has quit [Quit: eschnett]
hkaiser[m] has joined #ste||ar
hkaiser has joined #ste||ar
<hkaiser> zao: here are some explanation for the two-node setup: https://stackoverflow.com/questions/35367816/hpx-minimal-two-node-example-set-up
vss has joined #ste||ar
vss has quit [Remote host closed the connection]
hkaiser has quit [Quit: bye]
eschnett has joined #ste||ar
<zao> I ended up with --hpx:agas and --hpx:hpx, and --hpx:worker on the non-primary one.
hkaiser[m] has quit [Ping timeout: 276 seconds]
<zao> Not sure if I need to specify --hpx:console too.
<zao> No explosions yet, but running with hpx logs now for a few thousand spins.
<zao> Sadly, traffic still seems not to go out on the network; presumably you need two machines for that to happen, even if I explicitly bind to 10.0.1.24
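The two-locality setup zao is converging on (and the Stack Overflow answer hkaiser linked earlier) can be sketched as a pair of command lines. This is only an illustrative assumption pieced together from the flags mentioned in the channel; the binary path, IP, and ports are made up:

```shell
# Hedged sketch of a two-locality TCP run on one machine.
# Console locality: hosts AGAS and runs hpx_main.
./bin/hello_world --hpx:console --hpx:agas=10.0.1.24:7910 \
    --hpx:hpx=10.0.1.24:7910 --hpx:localities=2 --hpx:threads=2 &

# Worker locality: connects to the console's AGAS instance on its own port.
./bin/hello_world --hpx:worker --hpx:agas=10.0.1.24:7910 \
    --hpx:hpx=10.0.1.24:7911 --hpx:localities=2 --hpx:threads=2
```

As noted in the discussion, binding explicitly to a non-loopback address still may not produce on-wire traffic when both localities share one host.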
mcopik has joined #ste||ar
<zao> Interesting... most log files are 1450 lines, but got one for locality 1 that's 324k lines.
<zao> Some runs take a lot of time.
<zao> Does this log make any sense to anyone?
<zao> (excerpt from the recurring tail bit)
<zao> Added locality 0 and the beginning of locality 1.
<zao> Occasionally locality 1 logs a lot more, 1-130 megabytes, compared to the 450k of a good run.
<zao> Off to travel home again.
Anushi1998 has quit [Remote host closed the connection]
Anushi1998 has joined #ste||ar
<Zwei> In the first example here: http://en.cppreference.com/w/cpp/algorithm/execution_policy_tag_t , the comment says "Error: data race". Is this because possibly two threads are trying to push to vector v? And a way to fix this would be to use a mutex in the lambda function?
<Zwei> (sorry for noob question, still very new to concurrency)
smrk007 has joined #ste||ar
<K-ballo> sounds right
<Zwei> K-ballo: thanks again! :)
<Zwei> I'm onto chapter 6 now, lock-based data structures. After I finish this book, I want to try fixing some bugs in HPX, or do you think it's still too early for me?
<Zwei> Or what do you recommend I do afterwards to up my multithreaded programming skills after this book?
<K-ballo> try doing some stuff with it
<Zwei> Yes sire!
mbremer has quit [Ping timeout: 260 seconds]
<smrk007> Hello, I am interested in contributing to HPX and learning more about parallel computing. What are open areas that most need contributions?
<Zwei> smrk007: I'm in the same boat as you! But I'm very new to multithreading, so I'm just studying first :)
<smrk007> Zwei: Nice! What sorts of things have you been studying, and have you been looking into any area of contribution in particular?
<Zwei> smrk007: Working through C++ Concurrency In Action, by Anthony Williams.
<Zwei> I've had previous experience with MPI, but nothing multithreaded.
<Zwei> This seems to be a lot harder, with all the synchronizing, avoiding deadlocks/race conditions, etc...
<Zwei> Not looked at areas to contribute yet.
hkaiser[m] has joined #ste||ar
<Zwei> smrk007: After this book, I plan to read "The art of multiprocessor programming", most of the examples in there are java based though, idk how useful it'll be for C++... but I think I'll gain a better understanding of multithreading from it.
<smrk007> Zwei: I see. Yeah, I'm basically a beginner in these areas as well. I'll check out that book though. While I don't really know the general directions of HPX, I've been looking through some of the 'issues' on the HPX github to see if there's anything of interest.
<Zwei> Back to writing my parallel queue. Cya around :)
<smrk007> See ya
<Zwei> smrk007: I'd say it's definitely worthwhile to go through C++ Concurrency in Action. It makes you think kind of differently. For example, it's been drilled into me that for stacks, say, you have different functions for pop() and top().
<Zwei> Herb Sutter demonstrated in Exceptional C++, that this is done for exception safety.
<Zwei> But in C++ Concurrency in Action, it says you have to keep the two together, to prevent data race.
<Zwei> or deadlocks
ASamir has joined #ste||ar
<smrk007> Zwei: Okay, I'll definitely do that. Also, if you're looking for things to do that might be interesting, I found that an ongoing project is implementing some of the standard c++ algorithms so that they adhere to the c++17 type specifications, which can be found here: https://github.com/STEllAR-GROUP/hpx/projects/1 I don't know how much one needs to know to get into this area, but since you definitely know more than 0 you should ma
<zao> Zwei: Sometimes design tradeoffs are made to help make an efficient implementation, like allowing no peeking on elements, instead requiring that someone interested in the head of the queue simply pops it and assumes ownership.
<zao> If one uses a queue for say work distribution, efficient ownership-transferring pops may be way more important than the good old top/pop which keeps the element in the queue for no real point.
<zao> (quite helped by the fact that it's way easier to atomically juggle pointers and other machine-word-sized things around than full-blown values)
<Zwei> zao: I see, thanks!
<zao> One can get far with a bunch of mutexes, condition variables and boring old containers too :)
<zao> (deadlocks can be quite mitigated by having a lock hierarchy so you don't lock things in "the wrong order")
<zao> If you've got data in a per-thread queue, you probably don't need any synchronization to copy/pop it.
<zao> (train time!)
<Zwei> Question, when the C++ concurrency TS is in the standard, would that deprecate some of the stuff in HPX?
<Zwei> smrk007: thanks for the link!
hkaiser[m] has quit [Remote host closed the connection]
hkaiser[m] has joined #ste||ar
ASamir has quit [Ping timeout: 260 seconds]
smrk007 has quit [Ping timeout: 260 seconds]
<hkaiser[m]> Zwei not necessarily
<hkaiser[m]> actually depends on what gets adopted
<hkaiser[m]> but I'm sure we would adapt our API's
<Zwei> hkaiser[m]: Ah, that's nice to know :)
mcopik has quit [Ping timeout: 240 seconds]
<zao> hkaiser[m]: I'm getting rather weird logs from --hpx:debug-hpx-log, they vary greatly in size. A normal run has like 450 KiB log in each locality, while occasionally the second locality bloats to 1-130 megabytes, mostly repeating the same bits over and over again.
<zao> I suspect it's not quite all right :)
<hkaiser[m]> zao that could be just fine
<hkaiser[m]> hello_world is completely nondeterministic
<zao> I'm still quite curious to what the processes that have been running for three hours are up to.
<hkaiser[m]> it reschedules things until the thread happens to run on the desired core
<zao> Ooh, that's nasty :)
<zao> That gave me an idea. I should restrict the processor affinity in Windows to try to starve it out.
<zao> That incidentally explains to me how you can run things on particular threads.
<hkaiser[m]> you can give hpx a hint where to run things but hello_world does not do that afair
<zao> Would've probably helped if I read the code I'm running in the first place instead of black-boxing it..
hkaiser[[m]] has joined #ste||ar
daissgr has joined #ste||ar
hkaiser[m] has quit [Ping timeout: 265 seconds]
hkaiser[[m]] has quit [Ping timeout: 245 seconds]
hkaiser[[m]] has joined #ste||ar
mcopik has joined #ste||ar
hkaiser has joined #ste||ar
<github> [hpx] hkaiser closed pull request #3206: Addition of new arithmetic performance counter "Count" (master...count_arithmetic_performance_counter) https://git.io/vA1lu
<github> [hpx] hkaiser closed pull request #3209: Fix locking problems during shutdown (master...fix-shutdown-locks) https://git.io/vAMpJ
<github> [hpx] hkaiser deleted fixing_shutdown_vs2015 at 9e40843: https://git.io/vAHwg
K-ballo has quit [Read error: Connection reset by peer]
K-ballo1 has joined #ste||ar
K-ballo1 is now known as K-ballo
hkaiser has quit [Quit: bye]
<Nikunj_> @heller_: Please check if these were the changes you told me to do the other day.
<hkaiser[[m]]> nikunj_ why have you removed all of the code?
<Nikunj_> @hkaiser[[m]]: yesterday, @heller_ told me to build hello_world_component using the unit.build script instead of the test code (which was previously being built)
<hkaiser[[m]]> what is your script building NOW?
<hkaiser[[m]]> (sorry for the caps)
<Nikunj_> @hkaiser[[m]]: The script is now building the hello_world_component instead of the earlier test code that was being built
<Nikunj_> It is also checking for pkg-config so there is no need for checking if the system is linux based
<hkaiser[[m]]> but you have removed this component, yes?
<Nikunj_> no I have removed only the tests/unit/build/src folder
<Nikunj_> folder^^
<Nikunj_> this was the place where the previous test code was present
<hkaiser[[m]]> I don't get it, this was supposed to be built as a standalone example
<hkaiser[[m]]> heller_?
<hkaiser[[m]]> all of this looks just wrong to me
hkaiser[[m]] has quit [Ping timeout: 240 seconds]
hkaiser[[m]] has joined #ste||ar
hkaiser[[m]] has quit [Ping timeout: 245 seconds]
daissgr has quit [Ping timeout: 245 seconds]
Nikunj_ has quit [Quit: Page closed]
diehlpk has joined #ste||ar
jakub_golinowski has quit [Remote host closed the connection]
K-ballo has quit [Remote host closed the connection]
hkaiser[[m]] has joined #ste||ar
hkaiser has joined #ste||ar
K-ballo has joined #ste||ar
hkaiser_ has joined #ste||ar
hkaiser has quit [Ping timeout: 256 seconds]
hkaiser[[m]] has quit [Ping timeout: 256 seconds]
diehlpk has quit [Ping timeout: 256 seconds]
hkaiser[[m]] has joined #ste||ar
hkaiser_ has quit [Read error: Connection timed out]