The code below reads in the whole text and loops over every word in every paragraph. Each word is checked against the banned list and, if it is on the list, replaced with a string of asterisks of the same length.
# Lower-case versions of all words will be compared with these banned words:
banned_words = ['c', 'perl', 'fortran']

# Read the whole file into a single string; the with block closes the file.
with open('censor-test.txt', 'r') as fi:
    text = fi.read()

# Loop over each word in each paragraph: split the text by the newline
# character to obtain a list of paragraphs; split each paragraph on
# whitespace to obtain a list of words.
censored_paras = []
for para in text.split('\n'):
    censored_words = []
    # For each paragraph, split into words on whitespace
    for word in para.strip().split():
        compare_word = word.lower()
        # Strip punctuation characters from the word
        for punctuation in ',.:;!?\'"':
            compare_word = compare_word.replace(punctuation, '')
        if compare_word in banned_words:
            # Replace the banned word with asterisks, keeping any punctuation
            word = word.lower().replace(compare_word, '*'*len(compare_word))
        censored_words.append(word)
    censored_paras.append(' '.join(censored_words))
censored_text = '\n'.join(censored_paras)
print(censored_text)
When fed the file censor-test.txt, it produces:
Some alternative programming languages to Python are *, C++, ****, *******
and Java.
Of these, Python is the best. Not Java or ****!
The fastest are *, C++ and *******.
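
To try the same logic without an input file, it can be wrapped in a function that takes the text as a string. The following is a minimal sketch along those lines: the names censor and PUNCTUATION are hypothetical and do not appear in the listing above, and the sample sentence is taken from the last line of the output shown.

```python
# Hypothetical helper, not part of the original listing: the same
# word-by-word censoring, applied to a string instead of a file.
PUNCTUATION = ',.:;!?\'"'

def censor(text, banned_words):
    """Return text with each banned word replaced by asterisks."""
    censored_paras = []
    for para in text.split('\n'):
        censored_words = []
        for word in para.strip().split():
            # Compare in lower case, ignoring punctuation characters
            compare_word = word.lower()
            for punctuation in PUNCTUATION:
                compare_word = compare_word.replace(punctuation, '')
            if compare_word in banned_words:
                # Keep attached punctuation; mask only the banned word
                word = word.lower().replace(compare_word,
                                            '*' * len(compare_word))
            censored_words.append(word)
        censored_paras.append(' '.join(censored_words))
    return '\n'.join(censored_paras)

sample = 'The fastest are C, C++ and Fortran.'
print(censor(sample, ['c', 'perl', 'fortran']))
```

Because the comparison uses the whole whitespace-delimited word with punctuation removed, "C++" becomes "c++", which is not on the banned list, so it is left unchanged even though it begins with a banned word.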