fbpx
Wikipedia

Find first set

In computer software and hardware, find first set (ffs) or find first one is a bit operation that, given an unsigned machine word,[nb 1] designates the index or position of the least significant bit set to one in the word counting from the least significant bit position. A nearly equivalent operation is count trailing zeros (ctz) or number of trailing zeros (ntz), which counts the number of zero bits following the least significant one bit. The complementary operation that finds the index or position of the most significant set bit is log base 2, so called because it computes the binary logarithm ⌊log2(x)⌋.[1] This is closely related to count leading zeros (clz) or number of leading zeros (nlz), which counts the number of zero bits preceding the most significant one bit.[nb 2] There are two common variants of find first set, the POSIX definition which starts indexing of bits at 1,[2] herein labelled ffs, and the variant which starts indexing of bits at zero, which is equivalent to ctz and so will be called by that name.

Most modern CPU instruction set architectures provide one or more of these as hardware operators; software emulation is usually provided for any that aren't available, either as compiler intrinsics or in system libraries.

Examples

Given the following 32-bit word:

0000 0000 0000 0000 1000 0000 0000 1000

The count trailing zeros operation would return 3, while the count leading zeros operation returns 16. The count leading zeros operation depends on the word size: if this 32-bit word were truncated to a 16-bit word, count leading zeros would return zero. The find first set operation would return 4, indicating the 4th position from the right. The log base 2 is 15.

Similarly, given the following 32-bit word, the bitwise negation of the above word:

1111 1111 1111 1111 0111 1111 1111 0111

The count trailing ones operation would return 3, the count leading ones operation would return 16, and the find first zero operation ffz would return 4.

If the word is zero (no bits set), count leading zeros and count trailing zeros both return the number of bits in the word, while ffs returns zero. Both log base 2 and zero-based implementations of find first set generally return an undefined result for the zero word.

Hardware support

Many architectures include instructions to rapidly perform find first set and/or related operations, listed below. The most common operation is count leading zeros (clz), likely because all other operations can be implemented efficiently in terms of it (see Properties and relations).

Platform Mnemonic Name Operand widths Description On application to 0
ARM (ARMv5T architecture and later)
except Cortex-M0/M0+/M1/M23
clz[3] Count Leading Zeros 32 clz 32
ARM (ARMv8-A architecture) clz Count Leading Zeros 32, 64 clz Operand width
AVR32 clz[4] Count Leading Zeros 32 clz 32
DEC Alpha ctlz[5] Count Leading Zeros 64 clz 64
cttz[5] Count Trailing Zeros 64 ctz 64
Intel 80386 and later bsf[6] Bit Scan Forward 16, 32, 64 ctz Undefined; sets zero flag
bsr[6] Bit Scan Reverse 16, 32, 64 Log base 2 Undefined; sets zero flag
x86 supporting BMI1 or ABM lzcnt[7] Count Leading Zeros 16, 32, 64 clz Operand width; sets carry flag
x86 supporting BMI1 tzcnt[8] Count Trailing Zeros 16, 32, 64 ctz Operand width; sets carry flag
Itanium clz[9] Count Leading Zeros 64 clz 64
MIPS32/MIPS64 clz[10][11] Count Leading Zeros in Word 32, 64 clz Operand width
clo[10][11] Count Leading Ones in Word 32, 64 clo Operand width
Motorola 68020 and later bfffo[12] Find First One in Bit Field Arbitrary Log base 2 Field offset + field width
PDP-10 jffo Jump if Find First One 36 ctz 0; no operation
POWER/PowerPC/Power ISA cntlz/cntlzw/cntlzd[13] Count Leading Zeros 32, 64 clz Operand width
Power ISA 3.0 and later cnttzw/cnttzd[14] Count Trailing Zeros 32, 64 ctz Operand width
RISC-V ("B" Extension) (draft) clz[15] Count Leading Zeros 32, 64 clz Operand width
ctz[15] Count Trailing Zeros 32, 64 ctz Operand width
SPARC Oracle Architecture 2011 and later lzcnt (synonym: lzd)[16] Leading Zero Count 64 clz 64
VAX ffs[17] Find First Set 0–32 ctz Operand width; sets zero flag
IBM z/Architecture flogr[18] Find Leftmost One 64 clz 64
vclz[18] Vector Count Leading Zeroes 8, 16, 32, 64 clz Operand width
vctz[18] Vector Count Trailing Zeroes 8, 16, 32, 64 ctz Operand width

On some Alpha platforms CTLZ and CTTZ are emulated in software.

Tool and library support

A number of compiler and library vendors supply compiler intrinsics or library functions to perform find first set and/or related operations, which are frequently implemented in terms of the hardware instructions above:

Tool/library Name Type Input type(s) Notes On application to 0
POSIX.1 compliant libc
4.3BSD libc
OS X 10.3 libc[2][19]
ffs Library function int Includes glibc. POSIX does not supply the complementary log base 2 / clz. 0
FreeBSD 5.3 libc
OS X 10.4 libc[19]
ffsl
fls
flsl
Library function int,
long
fls("find last set") computes (log base 2) + 1. 0
FreeBSD 7.1 libc[20] ffsll
flsll
Library function long long 0
GCC 3.4.0[21][22]
Clang 5.x[23][24]
__builtin_ffs[l,ll,imax]
__builtin_clz[l,ll,imax]
__builtin_ctz[l,ll,imax]
Built-in functions unsigned int,
unsigned long,
unsigned long long,
uintmax_t
GCC documentation considers result undefined clz and ctz on 0. 0 (ffs)
Visual Studio 2005 _BitScanForward[25]
_BitScanReverse[26]
Compiler intrinsics unsigned long,
unsigned __int64
Separate return value to indicate zero input Undefined
Visual Studio 2008 __lzcnt[27] Compiler intrinsic unsigned short,
unsigned int,
unsigned __int64
Relies on hardware support for the lzcnt instruction introduced in BMI1 or ABM. Operand width
Visual Studio 2012 _arm_clz[28] Compiler intrinsic unsigned int Relies on hardware support for the clz instruction introduced in the ARMv5T architecture and later. ?
Intel C++ Compiler _bit_scan_forward
_bit_scan_reverse[29][30]
Compiler intrinsics int Undefined
NVIDIA CUDA[31] __clz Functions 32-bit, 64-bit Compiles to fewer instructions on the GeForce 400 Series 32
__ffs 0
LLVM llvm.ctlz.*
llvm.cttz.*[32]
Intrinsic 8, 16, 32, 64, 256 LLVM assembly language Operand width, if 2nd argument is 0; undefined otherwise
GHC 7.10 (base 4.8), in Data.Bits[citation needed] countLeadingZeros
countTrailingZeros
Library function FiniteBits b => b Haskell programming language Operand width
C++20 standard library, in header <bit>[33][34] bit_ceil bit_floor
bit_width
countl_zero countl_one
countr_zero countr_one
Library function unsigned char,
unsigned short,
unsigned int,
unsigned long,
unsigned long long

Properties and relations

If bits are labeled starting at 1 (which is the convention used in this article), then count trailing zeros and find first set operations are related by ctz(x) = ffs(x) − 1 (except when the input is zero). If bits are labeled starting at 0, then count trailing zeros and find first set are exactly equivalent operations. Given w bits per word, the log2 is easily computed from the clz and vice versa by log2(x) = w − 1 − clz(x).

As demonstrated in the example above, the find first zero, count leading ones, and count trailing ones operations can be implemented by negating the input and using find first set, count leading zeros, and count trailing zeros. The reverse is also true.

On platforms with an efficient log2 operation such as M68000, ctz can be computed by:

ctz(x) = log2(x & −x)

where & denotes bitwise AND and −x denotes the two's complement of x. The expression x & −x clears all but the least-significant 1 bit, so that the most- and least-significant 1 bit are the same.

On platforms with an efficient count leading zeros operation such as ARM and PowerPC, ffs can be computed by:

ffs(x) = w − clz(x & −x).

Conversely, on machines without log2 or clz operators, clz can be computed using ctz, albeit inefficiently:

clz = w − ctz(2⌈log2(x)⌉) (which depends on ctz returning w for the zero input)

On platforms with an efficient Hamming weight (population count) operation such as SPARC's POPC[35][36] or Blackfin's ONES,[37] there is:

ctz(x) = popcount((x & −x) − 1),[38][39] or ctz(x) = popcount(~(x | −x)),
ffs(x) = popcount(x ^ ~−x)[35]
clz = 32 − popcount(2⌈log2(x)⌉ − 1)

where ^ denotes bitwise exclusive-OR, | denotes bitwise OR and ~ denotes bitwise negation.

The inverse problem (given i, produce an x such that ctz(x) = i) can be computed with a left-shift (1 << i).

Find first set and related operations can be extended to arbitrarily large bit arrays in a straightforward manner by starting at one end and proceeding until a word that is not all-zero (for ffs, ctz, clz) or not all-one (for ffz, clo, cto) is encountered. A tree data structure that recursively uses bitmaps to track which words are nonzero can accelerate this.

Software emulation

Most CPUs dating from the late 1980s onward have bit operators for ffs or equivalent, but a few modern ones like some of the ARM-Mx series do not. In lieu of hardware operators for ffs, clz and ctz, software can emulate them with shifts, integer arithmetic and bitwise operators. There are several approaches depending on architecture of the CPU and to a lesser extent, the programming language semantics and compiler code generation quality. The approaches may be loosely described as linear search, binary search, search+table lookup, de Bruijn multiplication, floating point conversion/exponent extract, and bit operator (branchless) methods. There are tradeoffs between execution time and storage space as well as portability and efficiency.

Software emulations are usually deterministic. They return a defined result for all input values; in particular, the result for an input of all zero bits is usually 0 for ffs, and the bit length of the operand for the other operations.

If one has a hardware clz or equivalent, ctz can be efficiently computed with bit operations, but the converse is not true: clz is not efficient to compute in the absence of a hardware operator.

2n

The function 2⌈log2(x)⌉ (round up to the nearest power of two) using shifts and bitwise ORs[40] is not efficient to compute as in this 32-bit example and even more inefficient if we have a 64-bit or 128-bit operand:

function pow2(x): if x = 0 return invalid // invalid is implementation defined (not in [0,63]) x ← x - 1 for each y in {1, 2, 4, 8, 16}: x ← x | (x >> y) return x + 1 

FFS

Since ffs = ctz + 1 (POSIX) or ffs = ctz (other implementations), the applicable algorithms for ctz may be used, with a possible final step of adding 1 to the result, and returning 0 instead of the operand length for input of all zero bits.

CTZ

The canonical algorithm is a loop counting zeros starting at the LSB until a 1-bit is encountered:

function ctz1 (x) if x = 0 return w t ← 1 r ← 0 while (x & t) = 0 t ← t << 1 r ← r + 1 return r 

This algorithm executes O(n) time and operations, and is impractical in practice due to a large number of conditional branches.

A lookup table can eliminate most branches:

table[0..2n-1] = ctz(i) for i in 0..2n-1 function ctz2 (x) if x = 0 return w r ← 0 loop if (x & (2n-1)) ≠ 0 return r + table[x & (2n-1)] x ← x >> n r ← r + n 

The parameter n is fixed (typically 8) and represents a time–space tradeoff. The loop may also be fully unrolled. But as a linear lookup, this approach is still O(n) in the number of bits in the operand.

A binary search implementation takes a logarithmic number of operations and branches, as in this 32-bit version:[41][42] This algorithm can be assisted by a table as well, replacing the bottom three "if" statements with a 256 entry lookup table using the first non-zero byte encountered as an index.

function ctz3 (x) if x = 0 return 32 n ← 0 if (x & 0x0000FFFF) = 0: n ← n + 16, x ← x >> 16 if (x & 0x000000FF) = 0: n ← n + 8, x ← x >> 8 if (x & 0x0000000F) = 0: n ← n + 4, x ← x >> 4 if (x & 0x00000003) = 0: n ← n + 2, x ← x >> 2 if (x & 0x00000001) = 0: n ← n + 1 return n 

If the hardware has a clz operator, the most efficient approach to computing ctz is thus:

function ctz4 (x) x &= -x return w - (clz(x) + 1) 

An algorithm for 32-bit ctz uses de Bruijn sequences to construct a minimal perfect hash function that eliminates all branches.[43][44] This algorithm assumes that the result of the multiplication is truncated to 32 bit.

for i from 0 to 31: table[ ( 0x077CB531 * ( 1 << i ) ) >> 27 ] ← i // table [0..31] initialized function ctz5 (x) return table[((x & -x) * 0x077CB531) >> 27] 

The expression (x & -x) again isolates the least-significant 1 bit. There are then only 32 possible words, which the unsigned multiplication and shift hash to the correct position in the table. (This algorithm does not handle the zero input.)

CLZ

The canonical algorithm examines one bit at a time starting from the MSB until a non-zero bit is found, as shown in this example. It executes in O(n) time where n is the bit-length of the operand, and is not a practical algorithm for general use.

function clz1 (x) if x = 0 return w t ← 1 << (w - 1) r ← 0 while (x & t) = 0 t ← t >> 1 r ← r + 1 return r 

An improvement on the previous looping approach examines eight bits at a time then uses a 256 (28) entry lookup table for the first non-zero byte. This approach, however, is still O(n) in execution time.

function clz2 (x) if x = 0 return w t ← 0xff << (w - 8) r ← 0 while (x & t) = 0 t ← t >> 8 r ← r + 8 return r + table[x >> (w - 8 - r)] 

Binary search can reduce execution time to O(log2n):

function clz3 (x) if x = 0 return 32 n ← 0 if (x & 0xFFFF0000) = 0: n ← n + 16, x ← x << 16 if (x & 0xFF000000) = 0: n ← n + 8, x ← x << 8 if (x & 0xF0000000) = 0: n ← n + 4, x ← x << 4 if (x & 0xC0000000) = 0: n ← n + 2, x ← x << 2 if (x & 0x80000000) = 0: n ← n + 1 return n 

The fastest portable approaches to simulate clz are a combination of binary search and table lookup: an 8-bit table lookup (28=256 1-byte entries) can replace the bottom 3 branches in binary search. 64-bit operands require an additional branch. A larger width lookup can be used but the maximum practical table size is limited by the size of L1 data cache on modern processors, which is 32 KB for many. Saving a branch is more than offset by the latency of an L1 cache miss.

An algorithm similar to de Bruijn multiplication for CTZ works for CLZ, but rather than isolating the most-significant bit, it rounds up to the nearest integer of the form 2n−1 using shifts and bitwise ORs:[45]

table[0..31] = {0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31} function clz4 (x) for each y in {1, 2, 4, 8, 16}: x ← x | (x >> y) return table[((x * 0x07C4ACDD) >> 27) % 32] 

For processors with deep pipelines, like Prescott and later Intel processors, it may be faster to replace branches by bitwise AND and OR operators (even though many more instructions are required) to avoid pipeline flushes for mispredicted branches (and these types of branches are inherently unpredictable):

function clz5 (x) r = (x > 0xFFFF) << 4; x >>= r; q = (x > 0xFF ) << 3; x >>= q; r |= q; q = (x > 0xF ) << 2; x >>= q; r |= q; q = (x > 0x3 ) << 1; x >>= q; r |= q; r |= (x >> 1); return r; 

On platforms that provide hardware conversion of integers to floating point, the exponent field can be extracted and subtracted from a constant to compute the count of leading zeros. Corrections are needed to account for rounding errors.[41][46] Floating point conversion can have substantial latency. This method is highly non-portable and not usually recommended.

int x;  int r; union { unsigned int u[2]; double d; } t; t.u[LE] = 0x43300000; // LE is 1 for little-endian t.u[!LE] = x; t.d -= 4503599627370496.0; r = (t.u[LE] >> 20) - 0x3FF; // log2 r++; // CLZ 

Applications

The count leading zeros (clz) operation can be used to efficiently implement normalization, which encodes an integer as m × 2e, where m has its most significant bit in a known position (such as the highest position). This can in turn be used to implement Newton–Raphson division, perform integer to floating point conversion in software, and other applications.[41][47]

Count leading zeros (clz) can be used to compute the 32-bit predicate "x = y" (zero if true, one if false) via the identity clz(x − y) >> 5, where ">>" is unsigned right shift.[48] It can be used to perform more sophisticated bit operations like finding the first string of n 1 bits.[49] The expression clz(x − y)1 << (16 − clz(x − 1)/2) is an effective initial guess for computing the square root of a 32-bit integer using Newton's method.[50] CLZ can efficiently implement null suppression, a fast data compression technique that encodes an integer as the number of leading zero bytes together with the nonzero bytes.[51] It can also efficiently generate exponentially distributed integers by taking the clz of uniformly random integers.[41]

The log base 2 can be used to anticipate whether a multiplication will overflow, since ⌈log2(xy)⌉ ≤ ⌈log2(x)⌉ + ⌈log2(y)⌉.[52]

Count leading zeros and count trailing zeros can be used together to implement Gosper's loop-detection algorithm,[53] which can find the period of a function of finite range using limited resources.[42]

The binary GCD algorithm spends many cycles removing trailing zeros; this can be replaced by a count trailing zeros (ctz) followed by a shift. A similar loop appears in computations of the hailstone sequence.

A bit array can be used to implement a priority queue. In this context, find first set (ffs) is useful in implementing the "pop" or "pull highest priority element" operation efficiently. The Linux kernel real-time scheduler internally uses sched_find_first_bit() for this purpose.[54]

The count trailing zeros operation gives a simple optimal solution to the Tower of Hanoi problem: the disks are numbered from zero, and at move k, disk number ctz(k) is moved the minimum possible distance to the right (circling back around to the left as needed). It can also generate a Gray code by taking an arbitrary word and flipping bit ctz(k) at step k.[42]

See also

Notes

  1. ^ Using bit operations on other than an unsigned machine word may yield undefined results.
  2. ^ These four operations also have (much less common) negated versions:
    • find first zero (ffz), which identifies the index of the least significant zero bit;
    • count trailing ones, which counts the number of one bits following the least significant zero bit.
    • count leading ones, which counts the number of one bits preceding the most significant zero bit;
    • find the index of the most significant zero bit, which is an inverted version of the binary logarithm.

References

  1. ^ Anderson. Find the log base 2 of an integer with the MSB N set in O(N) operations (the obvious way).
  2. ^ a b "FFS(3)". Linux Programmer's Manual. The Linux Kernel Archives. Retrieved 2012-01-02.
  3. ^ "ARM Instruction Reference > ARM general data processing instructions > CLZ". ARM Developer Suite Assembler Guide. ARM. Retrieved 2012-01-03.
  4. ^ (PDF) (CORP072610 ed.). Atmel Corporation. 2011. 32000D–04/201. Archived from the original (PDF) on 2017-10-25. Retrieved 2016-10-22.
  5. ^ a b Alpha Architecture Reference Manual (PDF). Compaq. 2002. pp. 4-32, 4-34.
  6. ^ a b Intel 64 and IA-32 Architectures Software Developer Manual. Vol. 2A. Intel. pp. 3-92–3-97. Order number 325383.
  7. ^ AMD64 Architecture Programmer's Manual Volume 3: General Purpose and System Instructions (PDF). Vol. 3. Advanced Micro Devices (AMD). 2011. pp. 204–205. Publication No. 24594.
  8. ^ "AMD64 Architecture Programmer's Manual, Volume 3: General-Purpose and System Instructions" (PDF). AMD64 Technology (Version 3.28 ed.). Advanced Micro Devices (AMD). September 2019 [2013]. Publication No. 24594. (PDF) from the original on 2019-09-30. Retrieved 2014-01-02.
  9. ^ Intel Itanium Architecture Software Developer's Manual. Volume 3: Intel Itanium Instruction Set. Vol. 3. Intel. 2010. pp. 3:38. from the original on 2019-06-26.
  10. ^ a b MIPS Architecture For Programmers. Volume II-A: The MIPS32 Instruction Set (Revision 3.02 ed.). MIPS Technologies. 2011. pp. 101–102.
  11. ^ a b MIPS Architecture For Programmers. Volume II-A: The MIPS64 Instruction Set (Revision 3.02 ed.). MIPS Technologies. 2011. pp. 105, 107, 122, 123.
  12. ^ (PDF) (revision 1 ed.). Motorola. 1992. pp. 4-43–4-45. M68000PRM/AD. Archived from the original (PDF) on 2019-12-08.
  13. ^ Frey, Brad. "Chapter 3.3.11 Fixed-Point Logical Instructions". PowerPC Architecture Book (Version 2.02 ed.). IBM. p. 70.
  14. ^ "Chapter 3.3.13 Fixed-Point Logical Instructions - Chapter 3.3.13.1 64-bit Fixed-Point Logical Instructions". Power ISA Version 3.0B. IBM. pp. 95, 98.
  15. ^ a b Wolf, Clifford (2019-03-22). "RISC-V "B" Bit Manipulation Extension for RISC-V" (PDF). Github (Draft) (v0.37 ed.). Retrieved 2020-01-09.
  16. ^ Oracle SPARC Architecture 2011. Oracle. 2011.
  17. ^ VAX Architecture Reference Manual (PDF). Digital Equipment Corporation (DEC). 1987. pp. 70–71. (PDF) from the original on 2019-09-29. Retrieved 2020-01-09.
  18. ^ a b c "Chapter 22. Vector Integer Instructions". IBM z/Architecture Principles of Operation (PDF) (Eleventh ed.). IBM. March 2015. pp. 7-219–22-10. SA22-7832-10. Retrieved 2020-01-10.
  19. ^ a b "FFS(3)". Mac OS X Developer Library. Apple, Inc. 1994-04-19. Retrieved 2012-01-04.
  20. ^ "FFS(3)". FreeBSD Library Functions Manual. The FreeBSD Project. Retrieved 2012-01-04.
  21. ^ "Other built-in functions provided by GCC". Using the GNU Compiler Collection (GCC). Free Software Foundation, Inc. Retrieved 2015-11-14.
  22. ^ "GCC 3.4.0 ChangeLog". GCC 3.4.0. Free Software Foundation, Inc. Retrieved 2015-11-14.
  23. ^ "Clang Language Extensions - chapter Builtin Functions". The Clang Team. Retrieved 2017-04-09. Clang supports a number of builtin library functions with the same syntax as GCC
  24. ^ "Source code of Clang". LLVM Team, University of Illinois at Urbana-Champaign. Retrieved 2017-04-09.
  25. ^ "_BitScanForward, _BitScanForward64". Visual Studio 2008: Visual C++: Compiler Intrinsics. Microsoft. Retrieved 2018-05-21.
  26. ^ "_BitScanReverse, _BitScanReverse64". Visual Studio 2008: Visual C++: Compiler Intrinsics. Microsoft. Retrieved 2018-05-21.
  27. ^ "__lzcnt16, __lzcnt, __lzcnt64". Visual Studio 2008: Visual C++: Compiler Intrinsics. Microsoft. Retrieved 2012-01-03.
  28. ^ "ARM intrinsics". Visual Studio 2012: Visual C++: Compiler Intrinsics. Microsoft. Retrieved 2022-05-09.
  29. ^ "Intel Intrinsics Guide". Intel. Retrieved 2020-04-03.
  30. ^ Intel C++ Compiler for Linux Intrinsics Reference. Intel. 2006. p. 21.
  31. ^ NVIDIA CUDA Programming Guide (PDF) (Version 3.0 ed.). NVIDIA. 2010. p. 92.
  32. ^ "'llvm.ctlz.*' Intrinsic, 'llvm.cttz.*' Intrinsic". LLVM Language Reference Manual. The LLVM Compiler Infrastructure. Retrieved 2016-02-23.
  33. ^ Smith, Richard (2020-04-01). N4861 Working Draft, Standard for Programming Language C++ (PDF). ISO/IEC. pp. 1150–1153. Retrieved 2020-05-25.
  34. ^ "Standard library header <bit>". cppreference.com. Retrieved 2020-05-25.
  35. ^ a b SPARC International, Inc. (1992). "A.41: Population Count. Programming Note" (PDF). The SPARC architecture manual: version 8 (Version 8 ed.). Englewood Cliffs, New Jersey, USA: Prentice Hall. pp. 231. ISBN 978-0-13-825001-0.
  36. ^ Warren, Jr., Henry S. (2013) [2002]. Hacker's Delight (2 ed.). Addison Wesley - Pearson Education, Inc. ISBN 978-0-321-84268-8. 0-321-84268-5.
  37. ^ Blackfin Instruction Set Reference (Preliminary ed.). Analog Devices. 2001. pp. 8–24. Part Number 82-000410-14.
  38. ^ Dietz, Henry Gordon. "The Aggregate Magic Algorithms". University of Kentucky. from the original on 2019-10-31.
  39. ^ Isenberg, Gerd (2019-11-03) [2012]. "BitScanProtected". Chess Programming Wiki (CPW). from the original on 2020-01-09. Retrieved 2020-01-09.
  40. ^ Anderson. Round up to the next highest power of 2.
  41. ^ a b c d Warren. Chapter 5-3: Counting Leading 0's.
  42. ^ a b c Warren. Chapter 5-4: Counting Trailing 0's.
  43. ^ Leiserson, Charles E.; Prokop, Harald; Randall, Keith H. (1998-07-07). "Using de Bruijn Sequences to Index a 1 in a Computer Word" (PDF). MIT Laboratory for Computer Science, Cambridge, MA, USA. (PDF) from the original on 2020-01-09. Retrieved 2020-01-09.
  44. ^ Busch, Philip (2009-03-01) [2009-02-21]. "Computing Trailing Zeros HOWTO" (PDF). (PDF) from the original on 2016-08-01. Retrieved 2020-01-09.
  45. ^ Anderson. Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup.
  46. ^ Anderson. Find the integer log base 2 of an integer with a 64-bit IEEE float.
  47. ^ Sloss, Andrew N.; Symes, Dominic; Wright, Chris (2004). ARM system developer's guide designing and optimizing system software (1 ed.). San Francisco, CA, USA: Morgan Kaufmann. pp. 212–213. ISBN 978-1-55860-874-0.
  48. ^ Warren. Chapter 2-11: Comparison Predicates.
  49. ^ Warren. Chapter 6-2: Find First String of 1-Bits of a Given Length.
  50. ^ Warren. Chapter 11-1: Integer Square Root.
  51. ^ Schlegel, Benjamin; Gemulla, Rainer; Lehner, Wolfgang (June 2010). Fast integer compression using SIMD instructions. Proceedings of the Sixth International Workshop on Data Management on New Hardware (DaMoN 2010). pp. 34–40. CiteSeerX 10.1.1.230.6379. doi:10.1145/1869389.1869394. ISBN 978-1-45030189-3.
  52. ^ Warren. Chapter 2-12: Overflow Detection.
  53. ^ Gosper, Bill (April 1995) [1972-02-29]. Baker, Henry Givens Jr. (ed.). "Loop detector". HAKMEM (retyped & converted ed.). Cambridge, Massachusetts, USA: Artificial Intelligence Laboratory, Massachusetts Institute of Technology (MIT). AI Memo 239 Item 132. from the original on 2019-10-08. Retrieved 2020-01-09.
  54. ^ Aas, Josh (2005-02-17). Understanding the Linux 2.6.8.1 CPU Scheduler (PDF). Silicon Graphics, Inc. (SGI). p. 19. (PDF) from the original on 2017-05-19. Retrieved 2020-01-09.

Further reading

External links

  • Intel Intrinsics Guide
  • Chess Programming Wiki: BitScan: A detailed explanation of a number of implementation methods for ffs (called LS1B) and log base 2 (called MS1B).

find, first, computer, software, hardware, find, first, find, first, operation, that, given, unsigned, machine, word, designates, index, position, least, significant, word, counting, from, least, significant, position, nearly, equivalent, operation, count, tra. In computer software and hardware find first set ffs or find first one is a bit operation that given an unsigned machine word nb 1 designates the index or position of the least significant bit set to one in the word counting from the least significant bit position A nearly equivalent operation is count trailing zeros ctz or number of trailing zeros ntz which counts the number of zero bits following the least significant one bit The complementary operation that finds the index or position of the most significant set bit is log base 2 so called because it computes the binary logarithm log2 x 1 This is closely related to count leading zeros clz or number of leading zeros nlz which counts the number of zero bits preceding the most significant one bit nb 2 There are two common variants of find first set the POSIX definition which starts indexing of bits at 1 2 herein labelled ffs and the variant which starts indexing of bits at zero which is equivalent to ctz and so will be called by that name Most modern CPU instruction set architectures provide one or more of these as hardware operators software emulation is usually provided for any that aren t available either as compiler intrinsics or in system libraries Contents 1 Examples 2 Hardware support 3 Tool and library support 4 Properties and relations 5 Software emulation 5 1 2n 5 2 FFS 5 3 CTZ 5 4 CLZ 6 Applications 7 See also 8 Notes 9 References 10 Further reading 11 External linksExamples EditGiven the following 32 bit word 0000 0000 0000 0000 1000 0000 0000 1000The count trailing zeros operation would return 3 while the count leading zeros operation returns 16 The count leading zeros operation depends on the word size if this 32 bit word were truncated to a 16 bit word count leading zeros would return zero The find first set operation would return 4 indicating the 4th position from the right The log base 2 is 15 Similarly given the following 32 bit word the bitwise negation of the above word 1111 1111 1111 1111 0111 1111 1111 0111The count trailing ones operation would return 3 the count leading ones operation would return 16 and the find first zero operation ffz would return 4 If the word is zero no bits set count leading zeros and count trailing zeros both return the number of bits in the word while ffs returns zero Both log base 2 and zero based implementations of find first set generally return an undefined result for the zero word Hardware support EditMany architectures include instructions to rapidly perform find first set and or related operations listed below The most common operation is count leading zeros clz likely because all other operations can be implemented efficiently in terms of it see Properties and relations Platform Mnemonic Name Operand widths Description On application to 0ARM ARMv5T architecture and later except Cortex M0 M0 M1 M23 clz 3 Count Leading Zeros 32 clz 32ARM ARMv8 A architecture clz Count Leading Zeros 32 64 clz Operand widthAVR32 clz 4 Count Leading Zeros 32 clz 32DEC Alpha ctlz 5 Count Leading Zeros 64 clz 64cttz 5 Count Trailing Zeros 64 ctz 64Intel 80386 and later bsf 6 Bit Scan Forward 16 32 64 ctz Undefined sets zero flagbsr 6 Bit Scan Reverse 16 32 64 Log base 2 Undefined sets zero flagx86 supporting BMI1 or ABM lzcnt 7 Count Leading Zeros 16 32 64 clz Operand width sets carry flagx86 supporting BMI1 tzcnt 8 Count Trailing Zeros 16 32 64 ctz Operand width sets carry flagItanium clz 9 Count Leading Zeros 64 clz 64MIPS32 MIPS64 clz 10 11 Count Leading Zeros in Word 32 64 clz Operand widthclo 10 11 Count Leading Ones in Word 32 64 clo Operand widthMotorola 68020 and later bfffo 12 Find First One in Bit Field Arbitrary Log base 2 Field offset field widthPDP 10 jffo Jump if Find First One 36 ctz 0 no operationPOWER PowerPC Power ISA cntlz cntlzw cntlzd 13 Count Leading Zeros 32 64 clz Operand widthPower ISA 3 0 and later cnttzw cnttzd 14 Count Trailing Zeros 32 64 ctz Operand widthRISC V B Extension draft clz 15 Count Leading Zeros 32 64 clz Operand widthctz 15 Count Trailing Zeros 32 64 ctz Operand widthSPARC Oracle Architecture 2011 and later lzcnt synonym lzd 16 Leading Zero Count 64 clz 64VAX ffs 17 Find First Set 0 32 ctz Operand width sets zero flagIBM z Architecture flogr 18 Find Leftmost One 64 clz 64vclz 18 Vector Count Leading Zeroes 8 16 32 64 clz Operand widthvctz 18 Vector Count Trailing Zeroes 8 16 32 64 ctz Operand widthOn some Alpha platforms CTLZ and CTTZ are emulated in software Tool and library support EditA number of compiler and library vendors supply compiler intrinsics or library functions to perform find first set and or related operations which are frequently implemented in terms of the hardware instructions above Tool library Name Type Input type s Notes On application to 0POSIX 1 compliant libc4 3BSD libcOS X 10 3 libc 2 19 ffs Library function int Includes glibc POSIX does not supply the complementary log base 2 clz 0FreeBSD 5 3 libcOS X 10 4 libc 19 ffslflsflsl Library function int long fls find last set computes log base 2 1 0FreeBSD 7 1 libc 20 ffsllflsll Library function long long 0GCC 3 4 0 21 22 Clang 5 x 23 24 builtin ffs l ll imax builtin clz l ll imax builtin ctz l ll imax Built in functions unsigned int unsigned long unsigned long long uintmax t GCC documentation considers result undefined clz and ctz on 0 0 ffs Visual Studio 2005 BitScanForward 25 BitScanReverse 26 Compiler intrinsics unsigned long unsigned int64 Separate return value to indicate zero input UndefinedVisual Studio 2008 lzcnt 27 Compiler intrinsic unsigned short unsigned int unsigned int64 Relies on hardware support for the lzcnt instruction introduced in BMI1 or ABM Operand widthVisual Studio 2012 arm clz 28 Compiler intrinsic unsigned int Relies on hardware support for the clz instruction introduced in the ARMv5T architecture and later Intel C Compiler bit scan forward bit scan reverse 29 30 Compiler intrinsics int UndefinedNVIDIA CUDA 31 clz Functions 32 bit 64 bit Compiles to fewer instructions on the GeForce 400 Series 32 ffs 0LLVM llvm ctlz llvm cttz 32 Intrinsic 8 16 32 64 256 LLVM assembly language Operand width if 2nd argument is 0 undefined otherwiseGHC 7 10 base 4 8 in Data Bits citation needed countLeadingZeroscountTrailingZeros Library function FiniteBits b gt b Haskell programming language Operand widthC 20 standard library in header lt bit gt 33 34 bit ceil bit floorbit widthcountl zero countl onecountr zero countr one Library function unsigned char unsigned short unsigned int unsigned long unsigned long longProperties and relations EditIf bits are labeled starting at 1 which is the convention used in this article then count trailing zeros and find first set operations are related by ctz x ffs x 1 except when the input is zero If bits are labeled starting at 0 then count trailing zeros and find first set are exactly equivalent operations Given w bits per word the log2 is easily computed from the clz and vice versa by log2 x w 1 clz x As demonstrated in the example above the find first zero count leading ones and count trailing ones operations can be implemented by negating the input and using find first set count leading zeros and count trailing zeros The reverse is also true On platforms with an efficient log2 operation such as M68000 ctz can be computed by ctz x log2 x amp x where amp denotes bitwise AND and x denotes the two s complement of x The expression x amp x clears all but the least significant 1 bit so that the most and least significant 1 bit are the same On platforms with an efficient count leading zeros operation such as ARM and PowerPC ffs can be computed by ffs x w clz x amp x Conversely on machines without log2 or clz operators clz can be computed using ctz albeit inefficiently clz w ctz 2 log2 x which depends on ctz returning w for the zero input On platforms with an efficient Hamming weight population count operation such as SPARC s POPC 35 36 or Blackfin s ONES 37 there is ctz x popcount x amp x 1 38 39 or ctz x popcount x x ffs x popcount x x 35 clz 32 popcount 2 log2 x 1 where denotes bitwise exclusive OR denotes bitwise OR and denotes bitwise negation The inverse problem given i produce an x such that ctz x i can be computed with a left shift 1 lt lt i Find first set and related operations can be extended to arbitrarily large bit arrays in a straightforward manner by starting at one end and proceeding until a word that is not all zero for ffs ctz clz or not all one for ffz clo cto is encountered A tree data structure that recursively uses bitmaps to track which words are nonzero can accelerate this Software emulation EditMost CPUs dating from the late 1980s onward have bit operators for ffs or equivalent but a few modern ones like some of the ARM Mx series do not In lieu of hardware operators for ffs clz and ctz software can emulate them with shifts integer arithmetic and bitwise operators There are several approaches depending on architecture of the CPU and to a lesser extent the programming language semantics and compiler code generation quality The approaches may be loosely described as linear search binary search search table lookup de Bruijn multiplication floating point conversion exponent extract and bit operator branchless methods There are tradeoffs between execution time and storage space as well as portability and efficiency Software emulations are usually deterministic They return a defined result for all input values in particular the result for an input of all zero bits is usually 0 for ffs and the bit length of the operand for the other operations If one has a hardware clz or equivalent ctz can be efficiently computed with bit operations but the converse is not true clz is not efficient to compute in the absence of a hardware operator 2n Edit The function 2 log2 x round up to the nearest power of two using shifts and bitwise ORs 40 is not efficient to compute as in this 32 bit example and even more inefficient if we have a 64 bit or 128 bit operand function pow2 x if x 0 return invalid invalid is implementation defined not in 0 63 x x 1 for each y in 1 2 4 8 16 x x x gt gt y return x 1 FFS Edit Since ffs ctz 1 POSIX or ffs ctz other implementations the applicable algorithms for ctz may be used with a possible final step of adding 1 to the result and returning 0 instead of the operand length for input of all zero bits CTZ Edit The canonical algorithm is a loop counting zeros starting at the LSB until a 1 bit is encountered function ctz1 x if x 0 return w t 1 r 0 while x amp t 0 t t lt lt 1 r r 1 return r This algorithm executes O n time and operations and is impractical in practice due to a large number of conditional branches A lookup table can eliminate most branches table 0 2n 1 ctz i for i in 0 2n 1 function ctz2 x if x 0 return w r 0 loop if x amp 2n 1 0 return r table x amp 2n 1 x x gt gt n r r n The parameter n is fixed typically 8 and represents a time space tradeoff The loop may also be fully unrolled But as a linear lookup this approach is still O n in the number of bits in the operand A binary search implementation takes a logarithmic number of operations and branches as in this 32 bit version 41 42 This algorithm can be assisted by a table as well replacing the bottom three if statements with a 256 entry lookup table using the first non zero byte encountered as an index function ctz3 x if x 0 return 32 n 0 if x amp 0x0000FFFF 0 n n 16 x x gt gt 16 if x amp 0x000000FF 0 n n 8 x x gt gt 8 if x amp 0x0000000F 0 n n 4 x x gt gt 4 if x amp 0x00000003 0 n n 2 x x gt gt 2 if x amp 0x00000001 0 n n 1 return n If the hardware has a clz operator the most efficient approach to computing ctz is thus function ctz4 x x amp x return w clz x 1 An algorithm for 32 bit ctz uses de Bruijn sequences to construct a minimal perfect hash function that eliminates all branches 43 44 This algorithm assumes that the result of the multiplication is truncated to 32 bit for i from 0 to 31 table 0x077CB531 1 lt lt i gt gt 27 i table 0 31 initialized function ctz5 x return table x amp x 0x077CB531 gt gt 27 The expression x amp x again isolates the least significant 1 bit There are then only 32 possible words which the unsigned multiplication and shift hash to the correct position in the table This algorithm does not handle the zero input CLZ Edit The canonical algorithm examines one bit at a time starting from the MSB until a non zero bit is found as shown in this example It executes in O n time where n is the bit length of the operand and is not a practical algorithm for general use function clz1 x if x 0 return w t 1 lt lt w 1 r 0 while x amp t 0 t t gt gt 1 r r 1 return r An improvement on the previous looping approach examines eight bits at a time then uses a 256 28 entry lookup table for the first non zero byte This approach however is still O n in execution time function clz2 x if x 0 return w t 0xff lt lt w 8 r 0 while x amp t 0 t t gt gt 8 r r 8 return r table x gt gt w 8 r Binary search can reduce execution time to O log2n function clz3 x if x 0 return 32 n 0 if x amp 0xFFFF0000 0 n n 16 x x lt lt 16 if x amp 0xFF000000 0 n n 8 x x lt lt 8 if x amp 0xF0000000 0 n n 4 x x lt lt 4 if x amp 0xC0000000 0 n n 2 x x lt lt 2 if x amp 0x80000000 0 n n 1 return n The fastest portable approaches to simulate clz are a combination of binary search and table lookup an 8 bit table lookup 28 256 1 byte entries can replace the bottom 3 branches in binary search 64 bit operands require an additional branch A larger width lookup can be used but the maximum practical table size is limited by the size of L1 data cache on modern processors which is 32 KB for many Saving a branch is more than offset by the latency of an L1 cache miss An algorithm similar to de Bruijn multiplication for CTZ works for CLZ but rather than isolating the most significant bit it rounds up to the nearest integer of the form 2n 1 using shifts and bitwise ORs 45 table 0 31 0 9 1 10 13 21 2 29 11 14 16 18 22 25 3 30 8 12 20 28 15 17 24 7 19 27 23 6 26 5 4 31 function clz4 x for each y in 1 2 4 8 16 x x x gt gt y return table x 0x07C4ACDD gt gt 27 32 For processors with deep pipelines like Prescott and later Intel processors it may be faster to replace branches by bitwise AND and OR operators even though many more instructions are required to avoid pipeline flushes for mispredicted branches and these types of branches are inherently unpredictable function clz5 x r x gt 0xFFFF lt lt 4 x gt gt r q x gt 0xFF lt lt 3 x gt gt q r q q x gt 0xF lt lt 2 x gt gt q r q q x gt 0x3 lt lt 1 x gt gt q r q r x gt gt 1 return r On platforms that provide hardware conversion of integers to floating point the exponent field can be extracted and subtracted from a constant to compute the count of leading zeros Corrections are needed to account for rounding errors 41 46 Floating point conversion can have substantial latency This method is highly non portable and not usually recommended int x int r union unsigned int u 2 double d t t u LE 0x43300000 LE is 1 for little endian t u LE x t d 4503599627370496 0 r t u LE gt gt 20 0x3FF log2 r CLZApplications EditThe count leading zeros clz operation can be used to efficiently implement normalization which encodes an integer as m 2e where m has its most significant bit in a known position such as the highest position This can in turn be used to implement Newton Raphson division perform integer to floating point conversion in software and other applications 41 47 Count leading zeros clz can be used to compute the 32 bit predicate x y zero if true one if false via the identity clz x y gt gt 5 where gt gt is unsigned right shift 48 It can be used to perform more sophisticated bit operations like finding the first string of n 1 bits 49 The expression clz x y 1 lt lt 16 clz x 1 2 is an effective initial guess for computing the square root of a 32 bit integer using Newton s method 50 CLZ can efficiently implement null suppression a fast data compression technique that encodes an integer as the number of leading zero bytes together with the nonzero bytes 51 It can also efficiently generate exponentially distributed integers by taking the clz of uniformly random integers 41 The log base 2 can be used to anticipate whether a multiplication will overflow since log2 xy log2 x log2 y 52 Count leading zeros and count trailing zeros can be used together to implement Gosper s loop detection algorithm 53 which can find the period of a function of finite range using limited resources 42 The binary GCD algorithm spends many cycles removing trailing zeros this can be replaced by a count trailing zeros ctz followed by a shift A similar loop appears in computations of the hailstone sequence A bit array can be used to implement a priority queue In this context find first set ffs is useful in implementing the pop or pull highest priority element operation efficiently The Linux kernel real time scheduler internally uses sched find first bit for this purpose 54 The count trailing zeros operation gives a simple optimal solution to the Tower of Hanoi problem the disks are numbered from zero and at move k disk number ctz k is moved the minimum possible distance to the right circling back around to the left as needed It can also generate a Gray code by taking an arbitrary word and flipping bit ctz k at step k 42 See also EditBit Manipulation Instruction Sets BMI for Intel and AMD x86 based processors Trailing zero Leading zero Trailing digit Leading digitNotes Edit Using bit operations on other than an unsigned machine word may yield undefined results These four operations also have much less common negated versions find first zero ffz which identifies the index of the least significant zero bit count trailing ones which counts the number of one bits following the least significant zero bit count leading ones which counts the number of one bits preceding the most significant zero bit find the index of the most significant zero bit which is an inverted version of the binary logarithm References Edit Anderson Find the log base 2 of an integer with the MSB N set in O N operations the obvious way a b FFS 3 Linux Programmer s Manual The Linux Kernel Archives Retrieved 2012 01 02 ARM Instruction Reference gt ARM general data processing instructions gt CLZ ARM Developer Suite Assembler Guide ARM Retrieved 2012 01 03 AVR32 Architecture Document PDF CORP072610 ed Atmel Corporation 2011 32000D 04 201 Archived from the original PDF on 2017 10 25 Retrieved 2016 10 22 a b Alpha Architecture Reference Manual PDF Compaq 2002 pp 4 32 4 34 a b Intel 64 and IA 32 Architectures Software Developer Manual Vol 2A Intel pp 3 92 3 97 Order number 325383 AMD64 Architecture Programmer s Manual Volume 3 General Purpose and System Instructions PDF Vol 3 Advanced Micro Devices AMD 2011 pp 204 205 Publication No 24594 AMD64 Architecture Programmer s Manual Volume 3 General Purpose and System Instructions PDF AMD64 Technology Version 3 28 ed Advanced Micro Devices AMD September 2019 2013 Publication No 24594 Archived PDF from the original on 2019 09 30 Retrieved 2014 01 02 Intel Itanium Architecture Software Developer s Manual Volume 3 Intel Itanium Instruction Set Vol 3 Intel 2010 pp 3 38 Archived from the original on 2019 06 26 a b MIPS Architecture For Programmers Volume II A The MIPS32 Instruction Set Revision 3 02 ed MIPS Technologies 2011 pp 101 102 a b MIPS Architecture For Programmers Volume II A The MIPS64 Instruction Set Revision 3 02 ed MIPS Technologies 2011 pp 105 107 122 123 M68000 Family Programmer s Reference Manual Includes CPU32 Instructions PDF revision 1 ed Motorola 1992 pp 4 43 4 45 M68000PRM AD Archived from the original PDF on 2019 12 08 Frey Brad Chapter 3 3 11 Fixed Point Logical Instructions PowerPC Architecture Book Version 2 02 ed IBM p 70 Chapter 3 3 13 Fixed Point Logical Instructions Chapter 3 3 13 1 64 bit Fixed Point Logical Instructions Power ISA Version 3 0B IBM pp 95 98 a b Wolf Clifford 2019 03 22 RISC V B Bit Manipulation Extension for RISC V PDF Github Draft v0 37 ed Retrieved 2020 01 09 Oracle SPARC Architecture 2011 Oracle 2011 VAX Architecture Reference Manual PDF Digital Equipment Corporation DEC 1987 pp 70 71 Archived PDF from the original on 2019 09 29 Retrieved 2020 01 09 a b c Chapter 22 Vector Integer Instructions IBM z Architecture Principles of Operation PDF Eleventh ed IBM March 2015 pp 7 219 22 10 SA22 7832 10 Retrieved 2020 01 10 a b FFS 3 Mac OS X Developer Library Apple Inc 1994 04 19 Retrieved 2012 01 04 FFS 3 FreeBSD Library Functions Manual The FreeBSD Project Retrieved 2012 01 04 Other built in functions provided by GCC Using the GNU Compiler Collection GCC Free Software Foundation Inc Retrieved 2015 11 14 GCC 3 4 0 ChangeLog GCC 3 4 0 Free Software Foundation Inc Retrieved 2015 11 14 Clang Language Extensions chapter Builtin Functions The Clang Team Retrieved 2017 04 09 Clang supports a number of builtin library functions with the same syntax as GCC Source code of Clang LLVM Team University of Illinois at Urbana Champaign Retrieved 2017 04 09 BitScanForward BitScanForward64 Visual Studio 2008 Visual C Compiler Intrinsics Microsoft Retrieved 2018 05 21 BitScanReverse BitScanReverse64 Visual Studio 2008 Visual C Compiler Intrinsics Microsoft Retrieved 2018 05 21 lzcnt16 lzcnt lzcnt64 Visual Studio 2008 Visual C Compiler Intrinsics Microsoft Retrieved 2012 01 03 ARM intrinsics Visual Studio 2012 Visual C Compiler Intrinsics Microsoft Retrieved 2022 05 09 Intel Intrinsics Guide Intel Retrieved 2020 04 03 Intel C Compiler for Linux Intrinsics Reference Intel 2006 p 21 NVIDIA CUDA Programming Guide PDF Version 3 0 ed NVIDIA 2010 p 92 llvm ctlz Intrinsic llvm cttz Intrinsic LLVM Language Reference Manual The LLVM Compiler Infrastructure Retrieved 2016 02 23 Smith Richard 2020 04 01 N4861 Working Draft Standard for Programming Language C PDF ISO IEC pp 1150 1153 Retrieved 2020 05 25 Standard library header lt bit gt cppreference com Retrieved 2020 05 25 a b SPARC International Inc 1992 A 41 Population Count Programming Note PDF The SPARC architecture manual version 8 Version 8 ed Englewood Cliffs New Jersey USA Prentice Hall pp 231 ISBN 978 0 13 825001 0 Warren Jr Henry S 2013 2002 Hacker s Delight 2 ed Addison Wesley Pearson Education Inc ISBN 978 0 321 84268 8 0 321 84268 5 Blackfin Instruction Set Reference Preliminary ed Analog Devices 2001 pp 8 24 Part Number 82 000410 14 Dietz Henry Gordon The Aggregate Magic Algorithms University of Kentucky Archived from the original on 2019 10 31 Isenberg Gerd 2019 11 03 2012 BitScanProtected Chess Programming Wiki CPW Archived from the original on 2020 01 09 Retrieved 2020 01 09 Anderson Round up to the next highest power of 2 a b c d Warren Chapter 5 3 Counting Leading 0 s a b c Warren Chapter 5 4 Counting Trailing 0 s Leiserson Charles E Prokop Harald Randall Keith H 1998 07 07 Using de Bruijn Sequences to Index a 1 in a Computer Word PDF MIT Laboratory for Computer Science Cambridge MA USA Archived PDF from the original on 2020 01 09 Retrieved 2020 01 09 Busch Philip 2009 03 01 2009 02 21 Computing Trailing Zeros HOWTO PDF Archived PDF from the original on 2016 08 01 Retrieved 2020 01 09 Anderson Find the log base 2 of an N bit integer in O lg N operations with multiply and lookup Anderson Find the integer log base 2 of an integer with a 64 bit IEEE float Sloss Andrew N Symes Dominic Wright Chris 2004 ARM system developer s guide designing and optimizing system software 1 ed San Francisco CA USA Morgan Kaufmann pp 212 213 ISBN 978 1 55860 874 0 Warren Chapter 2 11 Comparison Predicates Warren Chapter 6 2 Find First String of 1 Bits of a Given Length Warren Chapter 11 1 Integer Square Root Schlegel Benjamin Gemulla Rainer Lehner Wolfgang June 2010 Fast integer compression using SIMD instructions Proceedings of the Sixth International Workshop on Data Management on New Hardware DaMoN 2010 pp 34 40 CiteSeerX 10 1 1 230 6379 doi 10 1145 1869389 1869394 ISBN 978 1 45030189 3 Warren Chapter 2 12 Overflow Detection Gosper Bill April 1995 1972 02 29 Baker Henry Givens Jr ed Loop detector HAKMEM retyped amp converted ed Cambridge Massachusetts USA Artificial Intelligence Laboratory Massachusetts Institute of Technology MIT AI Memo 239 Item 132 Archived from the original on 2019 10 08 Retrieved 2020 01 09 Aas Josh 2005 02 17 Understanding the Linux 2 6 8 1 CPU Scheduler PDF Silicon Graphics Inc SGI p 19 Archived PDF from the original on 2017 05 19 Retrieved 2020 01 09 Further reading EditWarren Jr Henry S 2013 2002 Hacker s Delight 2 ed Addison Wesley Pearson Education Inc ISBN 978 0 321 84268 8 0 321 84268 5 Anderson Sean Eron 2005 1997 Bit Twiddling Hacks Stanford University Archived from the original on 2020 01 08 Retrieved 2012 01 03 NB Lists several efficient public domain C implementations for count trailing zeros and log base 2 External links EditIntel Intrinsics Guide Chess Programming Wiki BitScan A detailed explanation of a number of implementation methods for ffs called LS1B and log base 2 called MS1B Retrieved from https en wikipedia org w index php title Find first set amp oldid 1114340390, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.