The Digital Object Identifier (DOI) resolution service at doi.org exposes an API for retrieving the BibTeX markup for a reference given its DOI. The following Python 3 script takes a DOI on the command line and returns the BibTeX. For example,
$ python doi2bib.py 10.1177/1470593113512323
@article{Avis_2013,
doi = {10.1177/1470593113512323},
url = {https://doi.org/10.1177%2F1470593113512323},
year = 2013,
month = {dec},
publisher = {{SAGE} Publications},
volume = {14},
number = {4},
pages = {451--475},
author = {M. Avis and S. Forbes and S. Ferguson},
title = {The brand personality of rocks: A critical evaluation of a brand personality scale},
journal = {Marketing Theory}
}
The code is below. Update 23/1/2024: to add line endings after each BibTeX field, you can pass the string returned from doi.org through the bibtexparser package (which must be installed separately).
import sys
import urllib.request
import bibtexparser
from urllib.error import HTTPError

BASE_URL = 'http://dx.doi.org/'

try:
    doi = sys.argv[1]
except IndexError:
    print('Usage:\n{} <doi>'.format(sys.argv[0]))
    sys.exit(1)

url = BASE_URL + doi
req = urllib.request.Request(url)
req.add_header('Accept', 'application/x-bibtex')
try:
    with urllib.request.urlopen(req) as f:
        bibtex = f.read().decode()
    # The round-trip through bibtexparser adds line endings.
    bibtex = bibtexparser.loads(bibtex)
    bibtex = bibtexparser.dumps(bibtex)
    print(bibtex)
except HTTPError as e:
    if e.code == 404:
        print('DOI not found.')
    else:
        print('Service unavailable.')
    sys.exit(1)
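If installing bibtexparser is not an option, the line breaks could also be restored with the standard library alone. The following is only a rough sketch and not part of the script above: the function names `split_top_level` and `one_field_per_line` are made up for illustration, and it assumes a single entry whose values are delimited by balanced braces (not double quotes), which is what the one-line output quoted in the comments below looks like.

```python
def split_top_level(body):
    """Split a string on commas that are not nested inside braces."""
    parts, current, depth = [], [], 0
    for ch in body:
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
        if ch == ',' and depth == 0:
            # A comma outside any braces separates two fields.
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current).strip())
    # Drop empty pieces, e.g. from a trailing comma.
    return [p for p in parts if p]


def one_field_per_line(entry):
    """Put each field of a single-line BibTeX entry on its own line.

    Assumption: one entry only, with brace-delimited values.
    """
    entry = entry.strip()
    start = entry.index('{')    # brace that opens the entry
    end = entry.rindex('}')     # brace that closes the entry
    key, *fields = split_top_level(entry[start + 1:end])
    lines = [entry[:start + 1] + key + ',']
    lines += ['  ' + field + ',' for field in fields]
    lines[-1] = lines[-1].rstrip(',')   # no comma after the last item
    lines.append('}')
    return '\n'.join(lines)


sample = ('@article{Avis_2013, volume={14}, '
          'author={Avis, Mark and Forbes, Sarah}, year={2013}, month=dec }')
print(one_field_per_line(sample))
```

Tracking the brace depth is what keeps the commas inside the author list from being treated as field separators. A real parser such as bibtexparser handles many cases this sketch does not (quoted values, escaped braces, several entries in one string), so it remains the safer choice when it is available.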
Comments
Dr. Prateek Raj Gautam 3 years, 3 months ago
Excellent work.
I face one issue: the year entry in the .bib file is not enclosed in curly brackets {},
so here is my modification of the received string:
## define function
import re
import urllib.request
from urllib.error import HTTPError

def doi2bib(DOI):
    global DIR, Output, bibComment
    BASE_URL = 'http://dx.doi.org/'
    bibString = bibComment
    spacer = '\n\n'
    for doi in DOI:
        url = BASE_URL + doi
        req = urllib.request.Request(url)
        req.add_header('Accept', 'application/x-bibtex')
        try:
            with urllib.request.urlopen(req) as f:
                bibtex = f.read().decode()
        except HTTPError as e:
            if e.code == 404:
                print(doi + ': DOI not found.')
            else:
                print('Service unavailable.')
            ## nothing was retrieved for this DOI, so skip it
            continue
        ## wrap a bare four-digit year in braces: year = 2013 -> year = {2013}
        ## (a regular expression replaces the original character-by-character scan;
        ## years that are already in braces are left untouched)
        bibtex = re.sub(r'(year\s*=\s*)(\d{4})', r'\1{\2}', bibtex)
        bibString = bibString + spacer + bibtex
    with open(DIR + Output + '.bib', 'w') as F:
        F.write(bibString)
    return bibString

## call function with following values
DIR = './'
Output = 'BibtexFromDOI'
## bibComment was not defined in the original snippet; an empty header is assumed here
bibComment = ''
DOI = ['10.1109/tii.2019.2908437', '10.1049/iet-com.2019.1298']
bib = doi2bib(DOI)
christian 3 years, 3 months ago
Interesting. Is it mandatory to have braces around a purely numeric value for the year, though?
Güray Hatipoğlu 1 year, 6 months ago
It partially retrieves an arXiv paper's metadata, and finishes with the following error:
""
c:\tex>Focus to learn more
'Focus' is not recognized as an internal or external command,
operable program or batch file.
""
For example, try this: 10.48550/arXiv.2304.00728
Karl Svozil 1 year, 1 month ago
Would it be possible to go back to line breaks in between fields? Thank you!
christian 1 year, 1 month ago
Hi Karl,
I'm not sure I know what you mean – the BibTeX that gets returned is whatever is returned by the DOI foundation. My own code here has not changed.
Or is it the input DOIs you want separated by line breaks?
Cheers, Christian
AL 10 months, 3 weeks ago
I think that the previous comment was related to the fact that, recently, we get the following (using the example at the top of this page):
@article{Avis_2013, title={The brand personality of rocks: A critical evaluation of a brand personality scale}, volume={14}, ISSN={1741-301X}, url={http://dx.doi.org/10.1177/1470593113512323}, DOI={10.1177/1470593113512323}, number={4}, journal={Marketing Theory}, publisher={SAGE Publications}, author={Avis, Mark and Forbes, Sarah and Ferguson, Shelagh}, year={2013}, month=dec, pages={451–475} }
i.e. the fields are no longer listed one per line (as in the example at the top of this page), but all on a single line. Is that due to external factors?
Thanks for your help
christian 10 months, 3 weeks ago
Oh, I see. I guess something must have changed in the string that doi.org returns. You can get line-endings by sending the output through bibtexparser if you have that installed – I've updated the code above: let me know how you get on.
Cheers, Christian