A DOI (digital object identifier) is a persistent identifier used to uniquely identify an object, usually a document or data set. DOIs are typically presented as a link consisting of a proxy, a prefix and a suffix, for example:
https://doi.org/10.1017/9781108778039
The proxy (here, https://doi.org/) is the location of a server that will resolve the DOI to the correct online location of the resource. The prefix (here, 10.1017) is assigned by a DOI registration agency such as CrossRef or DataCite to an organization to form a namespace that ensures that DOIs are globally unique. The suffix (here, 9781108778039) is chosen by the registrant and can, in principle, be almost anything (here it seems to be an ISBN), but there is an increasing consensus, outlined in a DataCite blog article on "Cool DOIs", about how suffixes should be chosen.
The recommendation in this article was to encode a random integer using the base-32 scheme proposed by Douglas Crockford. DataCite released a tool, cirneco, for generating DOIs in this format, which look like:
https://doi.org/10.61092/drkw-vb9g
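To make the proxy/prefix/suffix structure concrete, here is a minimal sketch that splits a DOI link into its three parts and checks whether the suffix has the xxxx-xxxx shape of the example above. The names split_doi and is_cool_suffix are illustrative only and are not part of cirneco or any DataCite library. The check assumes Crockford's 32-symbol alphabet (digits and letters, omitting I, L, O and U) with no checksum character.

```python
import re

PROXY = "https://doi.org/"
# Crockford base-32 symbols: digits plus letters, omitting I, L, O and U.
COOL_SUFFIX = re.compile(r"[0-9a-hjkmnp-tv-z]{4}-[0-9a-hjkmnp-tv-z]{4}")


def split_doi(link):
    """Split a DOI link into (proxy, prefix, suffix)."""
    if not link.startswith(PROXY):
        raise ValueError(f"expected a link starting with {PROXY}")
    doi = link[len(PROXY):]
    # The prefix ends at the first slash; the suffix may itself contain slashes.
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"no prefix/suffix separator in {doi!r}")
    return PROXY, prefix, suffix


def is_cool_suffix(suffix):
    """Return True if suffix looks like xxxx-xxxx in the Crockford alphabet."""
    # DOIs are case-insensitive, so compare in lower case.
    return COOL_SUFFIX.fullmatch(suffix.lower()) is not None


for link in ("https://doi.org/10.1017/9781108778039",
             "https://doi.org/10.61092/drkw-vb9g"):
    proxy, prefix, suffix = split_doi(link)
    print(prefix, suffix, is_cool_suffix(suffix))
```

The first link (with its ISBN-like suffix) is reported as not matching the pattern, whereas the second one matches.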
However, the cirneco tool is written in Ruby. The code below implements the Cool DOI principles to generate random DOIs in Python. Since it doesn't include a checksum character, all eight suffix characters are random, giving a pool of $32^8 \approx 1.1$ trillion DOIs per prefix to draw from. It depends only on the Python standard library.
An online service implementing this code is also available on this site.
import random


class DOIGenerator:
    """A generator for DOIs conforming to the "Cool DOIs" convention.

    Cool DOIs (https://datacite.org/blog/cool-dois/) have the format:

        https://doi.org/<PREFIX>/xxxx-xxxx

    where <PREFIX> is the namespace assigned to an organization by a
    DOI registration agency (e.g. CrossRef or DataCite) and the suffix
    xxxx-xxxx consists of two blocks of characters chosen from an
    alphabet of symbols consisting of the digits 0-9 and letters A-Z
    excluding I, L, O and U. DOIs are case-insensitive.
    """

    symbols = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
    nsymbols = len(symbols)
    encode_dict = {i: c for i, c in enumerate(symbols)}

    def __init__(self, prefix):
        """Initialize a DOIGenerator.

        The DOI prefix must be provided, but no proxy (e.g. https://doi.org/).
        Any trailing slash character will be stripped.
        """
        # endswith() is also safe if an empty prefix is passed.
        if prefix.endswith("/"):
            prefix = prefix[:-1]
        self.prefix = prefix

    def make_doi(self, include_proxy=False):
        """Return a random DOI, perhaps including the proxy."""
        # Pick a random integer from the pool of 32**8 possible suffixes.
        n = random.randrange(32**8)
        # Encode it in base 32, most significant symbol first.
        suffix = ""
        while n > 0:
            r = n % DOIGenerator.nsymbols
            n //= DOIGenerator.nsymbols
            suffix = DOIGenerator.encode_dict[r] + suffix
        # Left-pad with zeros to eight characters, then split into two blocks.
        suffix = f"{suffix:>08}"
        suffix = suffix[:4] + "-" + suffix[4:]
        doi = f"{self.prefix}/{suffix}".lower()
        if include_proxy:
            return "https://doi.org/" + doi
        return doi

    def make_dois(self, ndois, include_proxy=False):
        """A generator yielding ndois DOIs, perhaps with proxies."""
        for _ in range(ndois):
            yield self.make_doi(include_proxy)


if __name__ == "__main__":
    # Test the DOI generator.
    doi_generator = DOIGenerator("10.61092")
    for doi in doi_generator.make_dois(15, True):
        print(doi)
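Because the suffixes are drawn at random, nothing in the generator itself prevents the same DOI from being produced twice, so candidates should still be checked against those already in use. As a rough guide to how often this might matter, the standard birthday-problem approximation estimates the chance of at least one repeat among k draws from the pool of 32^8 suffixes. The function name collision_probability is illustrative, and the figures are estimates that assume uniformly random suffixes.

```python
import math

POOL_SIZE = 32**8  # number of distinct xxxx-xxxx suffixes


def collision_probability(k, pool_size=POOL_SIZE):
    """Approximate P(at least one repeat) among k uniformly random draws."""
    # Birthday approximation: p ~ 1 - exp(-k(k-1) / (2N)).
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-k * (k - 1) / (2 * pool_size))


for k in (1_000, 100_000, 1_000_000):
    print(f"{k:>9,} DOIs: p = {collision_probability(k):.3g}")
```

Under this approximation a repeat is very unlikely for a thousand DOIs under one prefix, but by a million DOIs the estimated chance of at least one repeat is more than one in three.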