entries/
extract_html.py
html_entries/
msword/
pdf/