BOM and Python

By chimo

TL;DR: Decode data with "utf-8-sig" to handle files that may or may not start with a BOM (byte order mark).

At work, we have a few Python scripts that read a couple of .csv files, which different end-users modify and upload to a server. These users aren't necessarily developers or codec nerds (an amicable term, I assure you) and may not know what a BOM or UTF-8 is. Since the scripts don't write to those files, we've changed the Python code from:

.read().decode('utf-8')

to:

.read().decode('utf-8-sig')

Things seem to be working as intended so far.
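For the curious, here is a small sketch of what the change does. The sample CSV content and the `first_header` helper are made up for illustration and aren't from our actual scripts. With plain "utf-8", a leading BOM survives decoding as the character U+FEFF and ends up glued to the first column name; with "utf-8-sig", it is stripped if present and the data is left alone if not.

```python
import codecs
import csv
import io

# Made-up CSV content, for illustration only.
sample = "name,age\nAda,36\n"

# The same bytes as a user's editor might save them, with and without a BOM.
raw_with_bom = codecs.BOM_UTF8 + sample.encode("utf-8")
raw_without_bom = sample.encode("utf-8")


def first_header(raw, encoding):
    """Hypothetical helper: decode the bytes and return the first column name."""
    text = raw.decode(encoding)
    return next(csv.reader(io.StringIO(text)))[0]


# Plain utf-8 keeps the BOM as U+FEFF at the start of the first header.
plain_header = first_header(raw_with_bom, "utf-8")

# utf-8-sig drops the BOM when it is there and is a no-op when it isn't.
sig_header_bom = first_header(raw_with_bom, "utf-8-sig")
sig_header_no_bom = first_header(raw_without_bom, "utf-8-sig")

print(repr(plain_header), repr(sig_header_bom), repr(sig_header_no_bom))
```

One caveat: encoding with "utf-8-sig" always prepends a BOM, which is why it matters that these scripts only read the files.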
