What N Things?

(0 comments)

The mildly controversial geocoding system What3words encodes the geographic coordinates of a location (to a resolution of 3 m) on the Earth's surface into three dictionary words, through some proprietary algorithm. The idea is that human's find it easier to remember and communicate these words than the sequence of digits that makes up the corresponding latitude and longitude. For example, the Victoria Memorial in front of Buckingham Palace in London is located at (51.50187, -0.14063) in decimal latitude, longitude coordinates, but simply using.woods.laws in the language of What3words.

The company that established the system in 2013 has been criticised for being closed-source with opaque licensing terms and an extremely active PR team whose press releases have often been picked up unquestioningly by the media.

Anyway, let's build our own geocoding system in Python.

Latitude is conventionally measured in degrees South and North of the equator (-90° to +90°) and longitude in degrees East and West of the Prime Meridian (-180° to +180°). Let's suppose we aim for a resolution of 0.001° (0.06′); for this we require $N = 180,000 \times 360,000 = 64.8$ billion points. This corresponds to a resolution of roughly 100 m (it varies with latitude because of the Earth's slightly ellipsoid shape, and with longitude because lines of longitude converge from a maximum spacing at the equator to meet at the poles).

If we encode these points as a sequence of $m$ objects (e.g. words) drawn from a list of length $M$, then we must have $M > N^{1/m}$. [For example, What3words apparently achieves its 3 m resolution with $N=5.7 \times 10^{13}$ tiles and requires a list of $N^{1/3} \approx 40,000$ words.]

Our resolution is lower, so using $m=3$ we can get away with a little over 4,000 distinct words. Let's take them from this Wiktionary list of the most frequent 10,000 words in the English language. First, parse this html file to extract just the words from its tables. We don't care for words with fewer than three letters, or for those with numbers, apostrophes, etc. in them, so exclude these from our wordlist. The following code produces the text file wordlist.txt:

from bs4 import BeautifulSoup

wiktionary_file = 'Wiktionary_Frequency lists_PG_2006_04_1-10000 - Wiktionary.html'

html = open(wiktionary_file).read()
soup = BeautifulSoup(html, 'html.parser')

tables = soup.find_all('table')

with open('wordlist.txt', 'w') as fo:
    for table in tables:
        for tr in table.find_all('tr')[1:]:
            tds = tr.find_all('td')
            word = tds[1].find_all('a')[0].text
            if len(word) > 2 and word.isalpha():
                print(word.lower(), file=fo)

Now we can use this word list to produce our encoded geolocation:

import sys
import math

# Settings: the resolution is 10^(-ires) and we use m words.
ires = 1000
ndp = int(math.log10(ires))
m = 3

# Read the word list
words = [line.strip() for line in open('wordlist.txt')]

# Total number of locations
N = 180 * 360 * ires**2
# "Length" of each of the m dimensions
d = round(N**(1/m))
if d > len(words):
    print(f'Warning: I need {d} words, but got {len(words)}.')
    sys.exit(1)


def encode(lat, lng):
    """Encode a latitude and longitude as m words."""

    # Integer, scaled longitude and latitude as positive quantities.
    ilam, iphi = int((lat+90)*ires), int((lng+180)*ires)
    # The one-dimensional index corresponding to ilam, iphi.
    idx = 180 * ires * iphi + ilam

    # Construct the word sequence.
    j = [0]*m
    for i in range(m):
        idx, j[i] = divmod(idx, d)

    return '.'.join(words[j[i]] for i in range(m))

def decode(code):
    """Decode three words in the list code as a latitude and longitude."""

    j = [words.index(word) for word in code]
    # Get the one-dimensional index corresponding to the code.
    idx = 0
    for i in range(m):
        idx += d**i * j[i]
    # Convert to longitude and latitude, accounting for the offset.
    iphi, ilam = divmod(idx, 180 * ires)
    lam, phi = ilam / ires - 90, iphi / ires - 180
    return round(lam, ndp), round(phi, ndp)

try:
    lat, lng = float(sys.argv[1]), float(sys.argv[2])
    print(encode(lat, lng))
except (IndexError, ValueError):
    if len(sys.argv) == m+1:
        code = sys.argv[1:]
    else:
        code = sys.argv[1].split('.')
    print(decode(code))

For example:

$ python get_loc.py 42.360 -71.092         # MIT
event.modest.file

$ python get_loc.py 25.197139 55.274111    # Burj Khalifa
invention.chiefly.shed

$ python get_loc.py -33.957314 18.403108   # Table Mountain
jewels.east.jane

$ python get_loc.py usual.meal.exhibited         # New Zealand Parliament Buildings
(-41.278, 174.777)
Current rating: 5

Comments

Comments are pre-moderated. Please be patient and your comment will appear soon.

There are currently no comments

New Comment

required

required (not published)

optional

required