k-ballo[m]: debug size down due to the iostreams headers, or due to some changes you have in progress? or just compared to last time you checked?
pedro_barbosa: the device vectors that you're capturing by reference do point to memory that's allocated on the gpu, but the vector instance itself is still allocated on the host, and using a reference to a host variable on the gpu is not allowed
in general I wouldn't recommend implementing matrix-matrix multiplication like that on the gpu, because we don't really have the facilities to implement that efficiently, but if you just want to try something out you'd need to capture iterators to the device vector by value
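A minimal host-only sketch of the "capture iterators by value" pattern, assuming plain std::vector as a stand-in for the device vector (this is not the HPX compute API, and the name multiply is hypothetical). The lambda copies the iterators into its closure, so it never holds a reference to the container objects themselves:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: row-major n x n matrix product c = a * b.
// std::vector stands in for a device vector here (assumption).
std::vector<double> multiply(std::vector<double> const& a,
    std::vector<double> const& b, std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);

    // Take iterators first; these are small, copyable handles to the data.
    auto a_it = a.begin();
    auto b_it = b.begin();
    auto c_it = c.begin();

    // Capture the iterators by value. Capturing a, b, c by reference
    // would instead store references to the container objects.
    auto kernel = [a_it, b_it, c_it, n](std::size_t idx) {
        std::size_t const row = idx / n;
        std::size_t const col = idx % n;
        double sum = 0.0;
        for (std::size_t k = 0; k < n; ++k)
            sum += a_it[row * n + k] * b_it[k * n + col];
        c_it[idx] = sum;
    };

    // Sequential stand-in for a parallel loop over all output elements.
    for (std::size_t idx = 0; idx < n * n; ++idx)
        kernel(idx);

    return c;
}
```

On a real device the same closure layout is what matters: only the iterator values travel with the lambda, while the vector instances stay on the host.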
ms[m]: do you know how I could retrieve the build type from my build dir?
gonidelis[m]: grep 'CMAKE_BUILD_TYPE' <build_dir>/CMakeCache.txt? or ccmake <build_dir>?
I have a question about practical usage of hpx in stencil code
Say that I have a 9-point stencil and the operation is a sum of the values at the points. What is the order of magnitude of the optimal "grain"/partition size on a typical 128-core cluster node?
In the examples, the partitions (= stripes) are created such that only one of the spatial dimensions of the box-like domain is split, which makes handling the boundaries quite easy. However, a more general approach would be to allow the partitions to have a box-like shape, i.e. boundary exchange happens in all directions.
How are the partitions created in, for instance, octo-tiger, or does it even use the partition-type approach?
I guess what I'm asking is: has anyone experienced any problems from assuming a one-dimensional splitting of an n-dimensional domain when the stencil width increases?
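To make the stripes-versus-boxes question concrete, here is a back-of-the-envelope sketch, assuming a 2D grid, an interior partition (no physical domain boundary), and a stencil that reaches w cells in every direction including diagonals. The function halo_cells is hypothetical and only counts cells; it says nothing about measured performance:

```cpp
#include <cstddef>

// Hypothetical helper: number of halo cells that one interior partition of
// nx x ny cells has to receive per step, for a stencil of width w.
// split_x / split_y state whether the domain is split along that axis.
std::size_t halo_cells(std::size_t nx, std::size_t ny, std::size_t w,
    bool split_x, bool split_y)
{
    std::size_t halo = 0;
    if (split_x)
        halo += 2 * w * ny;    // left and right neighbours
    if (split_y)
        halo += 2 * w * nx;    // bottom and top neighbours
    if (split_x && split_y)
        halo += 4 * w * w;     // four corner blocks (diagonal neighbours)
    return halo;
}
```

For a 1024 x 1024 grid cut into 16 partitions, a stripe (1024 x 64, split along one axis) needs halo_cells(1024, 64, 1, false, true) = 2048 cells, while a box (256 x 256, split along both axes) needs halo_cells(256, 256, 1, true, true) = 1028. With w = 4 the counts become 8192 and 4160. In this count the stripe halo grows linearly with w while the stripe thickness stays fixed, and a stripe thinner than w would need data from beyond its nearest neighbours.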
ms[m]: do you have any build flags to suggest for performance analysis runs?
ms[m]: thanks btw for the build type one :)
hkaiser: could you please review #5254
srinivasyadav227: ok
hkaiser: do we have any specific file format for the GSoC proposal template?