<p>Dirty hands coding - Comments: Vectorizing std::merge with vpermd from AVX2 and lookup table</p>
<p><a href="https://dirtyhandscoding.github.io/posts/vectorizing-stdmerge-with-vpermd-from-avx2-and-lookup-table.html/">https://dirtyhandscoding.github.io/posts/vectorizing-stdmerge-with-vpermd-from-avx2-and-lookup-table.html/</a></p>
<p>Posted by: Morwenn, 2017-08-27T23:07:00+07:00</p>
<p>In the end, the solution you chose goes in the same direction, but further, and truly solves the problem. At first I was surprised that you didn’t use a binary search to find the greatest element of one collection in the other collection, but your latest post pretty much covers the linear vs binary search question.</p>
<p>I’m surprised that your branchless implementation is faster though, considering that the last time I tried to write a branchless merge, it was far slower than the branchy version (see the SO post linked below). That said, after reading your post, it becomes clearer: the loop condition still depended on the result of the loop body, so the branch wasn’t the main issue in the first place. I should try it again.</p>
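<p>To make the point about the loop condition concrete, here is a minimal sketch of a branchless merge step. It is not the code from the article or from the SO post; the name <code>merge_branchless</code> and the use of <code>std::vector&lt;int&gt;</code> are assumptions made for illustration. The element selection is written so that a compiler can lower it to a conditional move, but whether it does so depends on the compiler and flags. Note that the loop condition still reads <code>i</code> and <code>j</code>, which are only known once the comparison of the current iteration has finished.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only: merges two sorted vectors.
std::vector<int> merge_branchless(const std::vector<int>& a,
                                  const std::vector<int>& b)
{
    std::vector<int> out(a.size() + b.size());
    std::size_t i = 0, j = 0, k = 0;

    // The exit test depends on i and j, which depend on the comparison
    // made in the previous iteration: this dependency remains even if
    // the body itself contains no branch.
    while (i < a.size() && j < b.size()) {
        const int x = a[i];
        const int y = b[j];
        const bool takeB = y < x;      // stable: ties take from a
        out[k++] = takeB ? y : x;      // a select, typically no jump
        j += takeB;                    // advance exactly one of the
        i += !takeB;                   // two indices, without branching
    }

    // One input is exhausted; copy whatever is left of the other.
    std::copy(a.data() + i, a.data() + a.size(), out.data() + k);
    k += a.size() - i;
    std::copy(b.data() + j, b.data() + b.size(), out.data() + k);
    return out;
}
```

<p>For example, <code>merge_branchless({1, 3, 5}, {2, 3, 6})</code> yields <code>1, 2, 3, 3, 5, 6</code>.</p>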
<p>Anyway, if you ever write that mergesort article, I’ll be happy to read it ^^</p>
<p><a href="https://stackoverflow.com/q/41129442/1364752">https://stackoverflow.com/q/41129442/1364752</a></p>
<p>Posted by: dirtyhandscoding, 2017-08-27T19:38:00+07:00</p>
<p>I’m glad you liked it!</p>
<p>Yes, you can merge the first <code>min(aCnt, bCnt)</code> elements without checking where the pointers are, and then use the slower <code>i < aCnt && j < bCnt</code> condition for the rest of the work. For the most likely case of equally sized arrays, it would deal with only 50% of the issue (regardless of element order). I guess my perfectionism does not like that =) The things I suggested fix the problem almost completely, except maybe for pathologically ordered inputs.</p>
<p>Hopefully I'll implement merge sort on top of the suggested merge algorithm, but I'm afraid it won't happen soon =(</p>
<p>Posted by: Morwenn, 2017-08-26T19:29:00+07:00</p>
<p>Hi, I found your article very interesting (and being a code-stealer, I might borrow some ideas to improve some of my algorithms). I also tried to improve a simple merge algorithm once or twice, and came up with a few tricks. Considering how smart some of the ideas you describe are, I’m surprised that you didn’t describe one of the simplest tricks I use: since we know the sizes of the collections, we know that neither of them can run out before at least <code>min(aCnt, bCnt)</code> elements have been moved, so we can do a blind loop from <code>0</code> to <code>min(aCnt, bCnt)</code>, then the usual loop that checks for <code>i < aCnt && j < bCnt</code>. In order to do that, I actually copy pointers and increment those instead of incrementing the indices, so it's a bit different. I feel that I suck at explaining, so you could just have a look at the code at the end of the algorithm, linked below. The trick makes the first loop's condition trivial, instead of checking at every iteration the values of <code>i</code> and <code>j</code>, which are unpredictable.</p>
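<p>A minimal sketch of the blind-loop trick described above, under stated assumptions: it is not the cpp-sort code, and the name <code>merge_blind_prefix</code> and the <code>std::vector&lt;int&gt;</code> interface are hypothetical. Each iteration consumes exactly one element from one input, so during the first <code>min(aCnt, bCnt)</code> iterations neither input can be exhausted and a plain counter is enough as loop condition.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only: merges two sorted vectors.
std::vector<int> merge_blind_prefix(const std::vector<int>& a,
                                    const std::vector<int>& b)
{
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));

    std::vector<int> out(a.size() + b.size());
    const int* pa = a.data();
    const int* pb = b.data();
    const int* const aEnd = pa + a.size();
    const int* const bEnd = pb + b.size();
    int* dst = out.data();

    // Blind loop: the trip count is known up front, so the condition
    // does not depend on which input each comparison picked.
    for (std::size_t n = std::min(a.size(), b.size()); n > 0; --n) {
        if (*pb < *pa) *dst++ = *pb++;
        else           *dst++ = *pa++;   // ties take from a (stable)
    }

    // Usual loop: now either input may run out, so both are checked.
    while (pa != aEnd && pb != bEnd) {
        if (*pb < *pa) *dst++ = *pb++;
        else           *dst++ = *pa++;
    }

    // Copy the tail of whichever input still has elements.
    dst = std::copy(pa, aEnd, dst);
    std::copy(pb, bEnd, dst);
    return out;
}
```

<p>For example, <code>merge_blind_prefix({1, 4, 9}, {2, 3, 10, 11})</code> runs three blind iterations (emitting <code>1, 2, 3</code>) before falling back to the checked loop, and yields <code>1, 2, 3, 4, 9, 10, 11</code>.</p>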
<p>Anyway, that was a great article; really interesting. Kudos! </p>
<p><a href="https://github.com/Morwenn/cpp-sort/blob/master/include/cpp-sort/detail/inplace_merge.h#L58-L97">https://github.com/Morwenn/cpp-sort/blob/master/include/cpp-sort/detail/inplace_merge.h#L58-L97</a></p>