BOM and Python

By chimo

TL;DR: Decode data with "utf-8-sig" to handle UTF-8 files that may or may not start with a BOM.

At work, we have a few Python scripts that read a couple of .csv files that are modified and uploaded to a server by different end-users. These users aren't necessarily developers or codec nerds (an amicable term, I assure you) and may not know what a BOM or UTF-8 is. Since the scripts only read those files and never write to them (encoding with "utf-8-sig" would add a BOM), we've changed the Python code from:

.read().decode('utf-8')

to:

.read().decode('utf-8-sig')

Things seem to be working as intended so far.
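To illustrate the difference, here's a minimal sketch. The sample CSV bytes and the `decode_text` helper are made up for this example and aren't taken from our actual scripts; the codec names and `codecs.BOM_UTF8` come from the Python standard library.

```python
import codecs


def decode_text(raw: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if there is one."""
    return raw.decode("utf-8-sig")


# Hypothetical file contents: the same CSV header, saved with and without a BOM.
without_bom = "name,city\n".encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom

# Plain "utf-8" keeps the BOM as a U+FEFF character at the start of the text,
# which ends up glued to the first column name.
plain = with_bom.decode("utf-8")

# "utf-8-sig" gives the same text for both inputs.
clean_a = decode_text(with_bom)
clean_b = decode_text(without_bom)
```

With plain "utf-8", the first header would come out as "\ufeffname" instead of "name", which is the kind of thing that quietly breaks column lookups. Going the other way, encoding with "utf-8-sig" always prepends a BOM, so it's only a safe drop-in on the reading side.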
