BOM and Python

By chimo

TL;DR: Decode data with "utf-8-sig" to handle UTF-8 files that may or may not start with a BOM.

At work, we have a few Python scripts that read a couple of .csv files that are modified and uploaded to a server by different end-users. These users aren't necessarily developers or codec nerds (an amicable term, I assure you) and may not know what a BOM or UTF-8 is, so some files arrive with a BOM and some without. Since the scripts only read those files and never write to them (encoding with "utf-8-sig" would prepend a BOM to the output), we've changed the Python code from:

.read().decode('utf-8')

to:

.read().decode('utf-8-sig')
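To illustrate the difference, here is a minimal sketch rather than our actual scripts. The sample CSV bytes and the `read_rows` helper are made up for the example. It decodes the same content with and without a BOM and parses it with the standard `csv` module:

```python
import codecs
import csv
import io

def read_rows(raw, encoding):
    """Decode raw bytes with the given encoding and parse them as CSV.

    Hypothetical helper for illustration only.
    """
    text = raw.decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))

# Made-up sample data: identical content, saved without and with a UTF-8 BOM.
plain = "name,city\nZoë,Montréal\n".encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain

# Plain 'utf-8' keeps the BOM as a U+FEFF character glued to the first header.
leaky = read_rows(with_bom, "utf-8")
first_header = list(leaky[0].keys())[0]  # '\ufeffname' instead of 'name'

# 'utf-8-sig' drops a leading BOM if there is one and otherwise decodes
# exactly like 'utf-8', so both files produce the same rows.
rows_from_bom = read_rows(with_bom, "utf-8-sig")
rows_from_plain = read_rows(plain, "utf-8-sig")
```

With plain 'utf-8', a lookup such as `row["name"]` fails on the BOM file because the key is really `"\ufeffname"`. With 'utf-8-sig', both inputs decode to the same rows.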

Things seem to be working as intended so far.