Two's complement and Python indexing

Posted on 07 April 2024

Integers are represented in computers as binary (base 2) numbers. If the integer is unsigned (i.e.) non-negative, then there is a straightforward representation as the sequence of bits (1 or 0) which comprise the binary number. For example, the number 13 is $1101_2$ ($1(2^3) + 1(2^2) + 0(2^1) + 1(2^0)$), or stored in 8 bits, 00001101. In such a representation there is no space to indicate whether the number is negative or positive and if a fixed number of bits is used, then overflow or underflow can occur in arithmetic operations that lead to results larger than can be represented in that fixed number of bits or less that zero. For example, the maximum number that can be represented in 8 bits is $255 = 11111111_2$. Adding one to this leads to the incorrect result $0 = 00000000_2$ since there is no room for the 9th bit required to represent $256=100000000_2$.

This sort of behaviour can be demonstrated with NumPy's uint8 ("unsigned integer, 8-bits") type:

In [x]: import numpy as np
In [x]: arr = np.array([255], dtype=np.uint8)
In [x]: arr + 1
Out[x]: array([0], dtype=uint8)

What if we want to represent signed integers (i.e. negative as well as positive numbers)? Certainly we will have to sacrifice a bit (usually the most significant bit: the leftmost one with the greatest place value) to designate the sign. The simplest approach might seem to just agree that if this bit is 0 the number is positive and if it is 1 then the number is negative. In 8 bits, therefore, the number -13 would be represented as 10001101. This sign–magnitude representation was indeed used in some early computers, but the convention makes arithmetic and comparison of numbers slower and more complicated, and so modern computers use the formulation known as two's complement. Again, the most significant bit indicates the sign (0 for positive numbers, 1 for negative numbers). However, to determine the negative value of an integer in this representation, all of the bits are flipped and one added (ignoring any overflow). For example, to form -13 in an 8-bit system, start with 00001101, invert all the bits to give 11110010 and then add one to give 11110011. This representation has advantages for the hardware implementation of arithmetic and ensures that there is only one representation of zero (no +0 and -0, for the negation of 00000000 leads back to 00000000 after one is added to its flipped version, 11111111, with the overflow ignored). The name "two's complement" arises because the sum of a non-zero integer and its negative in this representation in $N$ bits is equal to $2^{N+1}$ (e.g. 13 + (-13) would be 00001101 + 11110011 = 100000000 = $2^9$).

The two's complement representation of signed integers has the consequence that -n = ~n + 1, where ~ is the bit-inversion unary operator. Therefore, ~n = -n - 1 which means that indexing into Python sequences with ~1 returns the last-but-one (penultimate) element of the sequence, since ~1 = -2:

In [x]: lst = list("ABCDE")
In [x]: lst[~1]
Out[x]: 'D'

This means that one could (but probably shouldn't) pretend that Python indexes start at one by indexing as ~-i instead of i:

In [x]: lst[~-1]
Out[x]: 'A'

In [x]: lst[~-1: ~-4]
Out[x]: ['A', 'B', 'C']