c++ - My SSE2 flooring function has some problems


So I wrote a function that uses SSE2 to floor a vector. It seems to work for some purposes; for example, it works fine in a bilinear filtering algorithm, but when it is used to perform a modulo the values come out off. The function works by converting to an integer vector using truncation and then converting back to floating point. Both the floor and the modulo code are listed below:

    inline __m128 floor_simd(const __m128 & a) {
        __m128i int_val = _mm_cvttps_epi32(a);
        return _mm_cvtepi32_ps(int_val);
    }

    inline __m128 mod_simd(const __m128 & x, const __m128 & y) {
        return _mm_sub_ps(x, _mm_mul_ps(y, floor_simd(_mm_div_ps(x, y))));
    }

Might anyone have an explanation for why I'm getting odd values from the modulo?

Edit: For example, mod_simd(_mm_set1_ps(63.6f), _mm_set1_ps(32.0f)) produces a faulty answer, while mod_simd(_mm_set1_ps(23.6f), _mm_set1_ps(32.0f)) produces the correct answer. When I replace the floor function with a less efficient component-wise version, it works fine.
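One property of the truncating conversion that is relevant here: `_mm_cvttps_epi32` rounds toward zero, so the int round trip agrees with a true floor for non-negative inputs and for exact integers, but lands one too high for negative non-integers. A minimal sketch that counts such mismatches against `std::floor` follows; the helper names `trunc_round_trip` and `count_floor_mismatches` are made up for illustration and are not from the original code.

```cpp
#include <cmath>
#include <emmintrin.h>  // SSE2 intrinsics

// Same round trip as the question's floor_simd:
// float -> int32 (truncating toward zero) -> float.
inline __m128 trunc_round_trip(__m128 a) {
    return _mm_cvtepi32_ps(_mm_cvttps_epi32(a));
}

// Hypothetical helper: runs four values through the round trip and counts
// the lanes whose result differs from std::floor of the input.
inline int count_floor_mismatches(float a, float b, float c, float d) {
    const float in[4] = {a, b, c, d};
    float out[4];
    _mm_storeu_ps(out, trunc_round_trip(_mm_loadu_ps(in)));

    int mismatches = 0;
    for (int i = 0; i < 4; ++i) {
        if (out[i] != std::floor(in[i])) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

With the inputs (0.25f, 1.9875f, 7.0f, 23.6f) the count is 0. With (-0.25f, -1.5f, -7.0f, 2.5f) it is 2, coming from the two negative non-integers.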

I solved my own problem. For the benefit of others, here is the resulting code. It subtracts 1 from the truncated value if that value is greater than the original value, which compensates for the truncation problem.

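    inline __m128 floor_simd(const __m128 & a) {
        static const __m128 one = _mm_set1_ps(1.0f);

        // Truncate toward zero, then convert back to float.
        __m128 fval = _mm_cvtepi32_ps(_mm_cvttps_epi32(a));

        // Where the truncated value is above the input, subtract 1.
        return _mm_sub_ps(fval, _mm_and_ps(_mm_cmplt_ps(a, fval), one));
    }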
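To try the corrected floor together with the modulo from the question, something like the following sketch may help. The wrappers `floor_lane0` and `mod_lane0` are hypothetical conveniences added here (they broadcast a scalar and read back lane 0) and are not part of the original post. The sketch also assumes every input fits in a 32-bit integer, since `_mm_cvttps_epi32` does not give a usable result outside that range.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Corrected floor: truncate, then subtract 1 in each lane where the
// truncated value ended up above the input.
inline __m128 floor_simd(__m128 a) {
    const __m128 one = _mm_set1_ps(1.0f);
    const __m128 fval = _mm_cvtepi32_ps(_mm_cvttps_epi32(a));
    // _mm_cmplt_ps yields an all-ones mask per lane where a < fval;
    // ANDing it with 1.0f gives 1.0f in those lanes and 0.0f elsewhere.
    return _mm_sub_ps(fval, _mm_and_ps(_mm_cmplt_ps(a, fval), one));
}

// Modulo as in the question: x - y * floor(x / y).
inline __m128 mod_simd(__m128 x, __m128 y) {
    return _mm_sub_ps(x, _mm_mul_ps(y, floor_simd(_mm_div_ps(x, y))));
}

// Hypothetical wrappers for quick scalar experiments (lane 0 only).
inline float floor_lane0(float v) {
    return _mm_cvtss_f32(floor_simd(_mm_set1_ps(v)));
}

inline float mod_lane0(float x, float y) {
    return _mm_cvtss_f32(mod_simd(_mm_set1_ps(x), _mm_set1_ps(y)));
}
```

For instance, `floor_lane0(-1.5f)` gives -2.0f and `mod_lane0(-1.0f, 32.0f)` gives 31.0f, whereas the plain truncating version would give -1.0f for the first call.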
