DOI to BibTeX

The Digital Object Identifier (DOI) resolution service at doi.org exposes an API for retrieving the BibTeX markup for a reference given its DOI. The following Python 3 script takes a DOI on the command line and returns the BibTeX. For example,

$ python doi2bib.py 10.1177/1470593113512323

@article{Avis_2013,
    doi = {10.1177/1470593113512323},
    url = {https://doi.org/10.1177%2F1470593113512323},
    year = 2013,
    month = {dec},
    publisher = {{SAGE} Publications},
    volume = {14},
    number = {4},
    pages = {451--475},
    author = {M. Avis and S. Forbes and S. Ferguson},
    title = {The brand personality of rocks: A critical evaluation of a brand personality scale},
    journal = {Marketing Theory}
}

The code is below. Update 23/1/2024: to add line endings after each BibTeX field, you can pass the string returned from doi.org through the bibtexparser package (which must be installed).

import sys
import urllib.request
from urllib.error import HTTPError

import bibtexparser  # third-party package, must be installed

BASE_URL = 'https://doi.org/'

# The DOI is the single required command-line argument.
try:
    doi = sys.argv[1]
except IndexError:
    print('Usage:\n{} <doi>'.format(sys.argv[0]))
    sys.exit(1)

# Ask the resolver for BibTeX rather than the default redirect to the publisher.
url = BASE_URL + doi
req = urllib.request.Request(url)
req.add_header('Accept', 'application/x-bibtex')
try:
    with urllib.request.urlopen(req) as f:
        bibtex = f.read().decode()
except HTTPError as e:
    if e.code == 404:
        print('DOI not found.')
    else:
        print('Service unavailable.')
    sys.exit(1)

# The round-trip through bibtexparser adds line endings.
bibtex = bibtexparser.dumps(bibtexparser.loads(bibtex))
print(bibtex)
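
If installing bibtexparser is not an option, the line endings can also be restored with the standard library alone. The following is only a sketch, not part of the original script: the function name break_bibtex_fields is made up for illustration, and it assumes that field values are delimited by braces (as in the strings doi.org currently returns) rather than by double quotes.

```python
def break_bibtex_fields(entry: str, indent: str = "    ") -> str:
    """Put each top-level field of a one-line BibTeX entry on its own line.

    Assumption: values are brace-delimited, so brace depth alone tells us
    whether a comma separates two fields or belongs to a value.
    """
    out = []
    depth = 0
    i = 0
    n = len(entry)
    while i < n:
        ch = entry[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Closing brace of the entry: drop trailing spaces and
                # put the brace on a line of its own.
                out = ["".join(out).rstrip(), "\n}"]
                i += 1
                continue
        if ch == "," and depth == 1:
            # A comma directly inside the entry separates two fields.
            out.append(",\n" + indent)
            i += 1
            # Skip the whitespace that followed the comma.
            while i < n and entry[i].isspace():
                i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    one_line = ("@article{Avis_2013, volume={14}, "
                "author={Avis, Mark and Forbes, Sarah}, year={2013}, month=dec }")
    print(break_bibtex_fields(one_line))
```

Commas inside a braced value, such as those in the author list, sit at a brace depth of 2 and are therefore left alone.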

Comments

Dr. Prateek Raj Gautam 2 years, 6 months ago

Excellent work.
I face one issue: the year entry in the .bib file is not enclosed in curly brackets {}.

So here is my modification of the received string:


## define function
import urllib.request
from urllib.error import HTTPError

def doi2bib(DOI):
    global DIR
    global bibComment
    BASE_URL = 'http://dx.doi.org/'
    bibString=bibComment
    spacer='\n\n'
    for doi in DOI:
        url = BASE_URL + doi
        req = urllib.request.Request(url)
        req.add_header('Accept', 'application/x-bibtex')
        try:
            with urllib.request.urlopen(req) as f:
                bibtex = f.read().decode()
        except HTTPError as e:
            if e.code == 404:
                print(doi + ' DOI not found.')
            else:
                print('Service unavailable.')
            ## skip this DOI so that no stale or undefined entry is written
            continue

        ## wrap a bare year (year = 2013) in braces (year = {2013})
        testStr='year = '
        start=bibtex.find(testStr)
        if start!=-1:
            split1=start+len(testStr)
            split2=bibtex.find(',',split1)
            ## only add braces when the value is not already braced
            if split2!=-1 and bibtex[split1]!='{':
                year='{'+bibtex[split1:split2].strip()+'}'
                bibtex=bibtex[0:split1] + year + bibtex[split2:]

        bibString=bibString+spacer+bibtex

    with open(DIR + Output + '.bib','w') as F:
        F.write(bibString)
    return bibString


## call function with following values
DIR='./'
Output='BibtexFromDOI'
## bibComment was not defined in the original snippet; an empty header is assumed
bibComment=''
DOI=['10.1109/tii.2019.2908437','10.1049/iet-com.2019.1298']

bib=doi2bib(DOI)
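
The index-based scan above could arguably be written more compactly as a regular-expression substitution. This is only a sketch under assumptions: the helper name brace_year is hypothetical, and it only handles a bare four-digit year, with or without spaces around the equals sign.

```python
import re

# "year", optional spaces, "=", optional spaces, then four bare digits.
# A value that is already braced ("year={2013}") does not match, because
# the character after "=" is "{" rather than a digit.
_BARE_YEAR = re.compile(r"(\byear\s*=\s*)(\d{4})\b", re.IGNORECASE)


def brace_year(bibtex: str) -> str:
    """Wrap a bare numeric year value in curly brackets."""
    return _BARE_YEAR.sub(r"\1{\2}", bibtex)


if __name__ == "__main__":
    print(brace_year("@article{Avis_2013, year = 2013, month = {dec}}"))
```

Already-braced entries pass through unchanged, so the function can be applied to every string returned by the resolver.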

christian 2 years, 6 months ago

Interesting. Is it mandatory to have braces around a purely numeric value for the year, though?

Güray Hatipoğlu 9 months, 3 weeks ago

It only partially retrieves an arXiv paper's metadata, and finishes with the following error:
""
c:\tex>Focus to learn more
'Focus' is not recognized as an internal or external command,
operable program or batch file.
""

For example, try this: 10.48550/arXiv.2304.00728

Karl Svozil 4 months ago

Would it be possible to go back to line breaks in between fields? Thank you!

christian 4 months ago

Hi Karl,
I'm not sure I know what you mean – the BibTeX that gets returned is whatever is returned by the DOI foundation. My own code here has not changed.
Or is it the input DOIs you want separated by line breaks?
Cheers, Christian

AL 1 month, 3 weeks ago

I think that the previous comment was related to the fact that, recently, we get the following (using the example at the top of this page):

@article{Avis_2013, title={The brand personality of rocks: A critical evaluation of a brand personality scale}, volume={14}, ISSN={1741-301X}, url={http://dx.doi.org/10.1177/1470593113512323}, DOI={10.1177/1470593113512323}, number={4}, journal={Marketing Theory}, publisher={SAGE Publications}, author={Avis, Mark and Forbes, Sarah and Ferguson, Shelagh}, year={2013}, month=dec, pages={451–475} }

i.e. the fields are no longer listed one per line (as in the example at the top of this page), but all on a single line. Is that due to external factors?
Thanks for your help

christian 1 month, 3 weeks ago

Oh, I see. I guess something must have changed in the string that doi.org returns. You can get line-endings by sending the output through bibtexparser if you have that installed – I've updated the code above: let me know how you get on.
Cheers, Christian
