c++ - My SSE2 flooring function has some problems
So I wrote a function using SSE2 that floors a vector. It seems to work for my purposes; for example, it works fine in a bilinear filtering algorithm. But when I use it to perform a modulo, some of the values come out wrong. The function works by converting the vector to integers using truncation, then converting it back to floating point. Both the floor and the modulo code are listed below:
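A scalar sketch of the conversion described above shows where truncation and flooring part ways. The helper name trunc_then_convert is hypothetical, and the int cast stands in for what _mm_cvttps_epi32 does to a single lane: the two agree for non-negative inputs but not for negative non-integers.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar model of floor_simd: convert to int by truncation
// (round toward zero), then back to float. Only valid while |x| < 2^31,
// because the int conversion would otherwise overflow.
float trunc_then_convert(float x) {
    assert(std::fabs(x) < 2147483648.0f);
    return static_cast<float>(static_cast<int>(x));
}

// For x >= 0, truncation and floor agree: 2.7f -> 2.0f either way.
// For negative non-integers they differ: -1.5f truncates to -1.0f,
// while std::floor(-1.5f) is -2.0f.
```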
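    #include <emmintrin.h>  // SSE2 intrinsics

    // Floor via truncation: _mm_cvttps_epi32 rounds toward zero.
    inline __m128 floor_simd(const __m128 & a)
    {
        __m128i int_val = _mm_cvttps_epi32(a);
        return _mm_cvtepi32_ps(int_val);
    }

    // x mod y, computed as x - y * floor(x / y).
    inline __m128 mod_simd(const __m128 & x, const __m128 & y)
    {
        return _mm_sub_ps(x, _mm_mul_ps(y, floor_simd(_mm_div_ps(x, y))));
    }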
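A minimal harness, assuming an x86 target with SSE2 and <emmintrin.h>, can compare the question's mod_simd lane by lane against a scalar floor-based reference. The helpers lane and mod_ref are hypothetical additions for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <emmintrin.h>  // SSE2 intrinsics

// Copy of the question's truncation-based floor_simd.
inline __m128 floor_simd(const __m128& a) {
    __m128i int_val = _mm_cvttps_epi32(a);  // truncates toward zero
    return _mm_cvtepi32_ps(int_val);
}

// Copy of the question's mod_simd: x - y * floor(x / y).
inline __m128 mod_simd(const __m128& x, const __m128& y) {
    return _mm_sub_ps(x, _mm_mul_ps(y, floor_simd(_mm_div_ps(x, y))));
}

// Hypothetical helper: read lane i (0..3) of a vector.
inline float lane(__m128 v, int i) {
    assert(i >= 0 && i < 4);
    float out[4];
    _mm_storeu_ps(out, v);
    return out[i];
}

// Hypothetical scalar reference that uses a true floor.
inline float mod_ref(float x, float y) {
    return x - y * std::floor(x / y);
}
```

With x = -10 and y = 3, the truncation-based mod_simd returns -1 while mod_ref returns 2; for non-negative quotients such as 7 mod 3 the two agree.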
Might anyone have an explanation for why I'm getting odd values from the modulo?
Edit: For example, when I use mod_simd(_mm_set1_ps(63.6f), _mm_set1_ps(32.0f)) it produces a faulty answer, while mod_simd(_mm_set1_ps(23.6f), _mm_set1_ps(32.0f)) produces the correct answer. When I replace the floor function with a less efficient component-wise version, it works fine.
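The component-wise alternative mentioned in the edit might look like the following sketch. The names floor_componentwise, mod_componentwise and lane are hypothetical, and the exact version the poster used is not shown; this one assumes SSE2 and simply spills the vector to an aligned array, applies std::floor to each lane, and reloads.

```cpp
#include <cassert>
#include <cmath>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical component-wise floor: store the four lanes, floor each one
// with std::floor, then load them back. Slower than staying in registers,
// but it rounds toward negative infinity for every finite input.
inline __m128 floor_componentwise(const __m128& a) {
    alignas(16) float v[4];
    _mm_store_ps(v, a);
    for (float& f : v) f = std::floor(f);
    return _mm_load_ps(v);
}

// Same formula as mod_simd, x - y * floor(x / y), using the floor above.
inline __m128 mod_componentwise(const __m128& x, const __m128& y) {
    return _mm_sub_ps(x, _mm_mul_ps(y, floor_componentwise(_mm_div_ps(x, y))));
}

// Hypothetical helper: read lane i (0..3) of a vector.
inline float lane(__m128 v, int i) {
    assert(i >= 0 && i < 4);
    float out[4];
    _mm_storeu_ps(out, v);
    return out[i];
}
```

Called on the poster's inputs (63.6 mod 32 and 23.6 mod 32), it yields roughly 31.6 and 23.6, and it returns 2 for -10 mod 3.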
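I solved my own problem. For the benefit of others, here is the resulting code. It subtracts 1 from the truncated value wherever that value is greater than the original, which compensates for the truncation problem: truncation rounds toward zero, so for negative non-integers it rounds up instead of down.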
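    #include <emmintrin.h>  // SSE2 intrinsics

    inline __m128 floor_simd(const __m128 & a)
    {
        static const __m128 one = _mm_set1_ps(1.0f);
        // Truncate toward zero, then convert back to float.
        __m128 fval = _mm_cvtepi32_ps(_mm_cvttps_epi32(a));
        // Lanes where truncation rounded up (a < fval) get 1.0f subtracted.
        return _mm_sub_ps(fval, _mm_and_ps(_mm_cmplt_ps(a, fval), one));
    }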
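To check the corrected version, a sketch like the following (assuming an x86 target with SSE2; lane is a hypothetical helper) compares floor_simd against std::floor, including the negative non-integer inputs where plain truncation fails, and reuses the question's mod_simd on top of it.

```cpp
#include <cassert>
#include <cmath>
#include <emmintrin.h>  // SSE2 intrinsics

// The answer's corrected floor_simd.
inline __m128 floor_simd(const __m128& a) {
    static const __m128 one = _mm_set1_ps(1.0f);
    __m128 fval = _mm_cvtepi32_ps(_mm_cvttps_epi32(a));  // truncate toward zero
    // _mm_cmplt_ps yields an all-ones mask where a < fval (truncation rounded
    // up); ANDing that mask with 1.0f gives 1.0f in those lanes, 0.0f elsewhere.
    return _mm_sub_ps(fval, _mm_and_ps(_mm_cmplt_ps(a, fval), one));
}

// The question's mod_simd, now built on the corrected floor.
inline __m128 mod_simd(const __m128& x, const __m128& y) {
    return _mm_sub_ps(x, _mm_mul_ps(y, floor_simd(_mm_div_ps(x, y))));
}

// Hypothetical helper: read lane i (0..3) of a vector.
inline float lane(__m128 v, int i) {
    assert(i >= 0 && i < 4);
    float out[4];
    _mm_storeu_ps(out, v);
    return out[i];
}
```

One limitation applies to both versions: _mm_cvttps_epi32 returns 0x80000000 for NaN or for inputs outside the int32 range, so the truncation trick is only valid for |x| < 2^31.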