This is a reason why people should feel safer taking a plane or train, which is my point.
Even if flying gets a bit less safe, there would have to be far, far more plane crashes (at least three orders of magnitude more) for it to become anywhere near as dangerous as driving.
Of course they're pre-2025. It's only February, so there are no full-year stats for 2025 yet.
Flying is still the safest form of transport.
There are 1.17 deaths and 42 injuries per 100 million miles travelled by car in the USA. In comparison, there are only 0.007 injuries per 100 million miles flown in commercial planes in the USA. Even trains are more dangerous at 0.1 injuries per 100 million miles.
You're far, far more likely to be in a car crash on your way to the airport compared to being involved in a plane crash.
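For anyone who wants to check the arithmetic, here's a quick sketch that compares the injury rates quoted above. The three rates are copied from this comment; the variable names are just for illustration.

```python
import math

# Injuries per 100 million miles, as quoted in the comment above.
CAR_INJURIES = 42.0
PLANE_INJURIES = 0.007
TRAIN_INJURIES = 0.1

# How many times higher the car injury rate is than each alternative.
car_vs_plane = CAR_INJURIES / PLANE_INJURIES
car_vs_train = CAR_INJURIES / TRAIN_INJURIES

# log10 of the ratio gives the gap in orders of magnitude.
plane_gap = math.log10(car_vs_plane)

print(f"car vs plane: about {car_vs_plane:,.0f}x ({plane_gap:.1f} orders of magnitude)")
print(f"car vs train: about {car_vs_train:,.0f}x")
```

So on these numbers the gap between driving and commercial flying is a bit over three orders of magnitude.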
Too bad the California high-speed rail project is being threatened by President Musk.
I can never seem to get used to that 10,000ft standard.
The standard is 8,000 feet, not 10,000. Some planes, like the Boeing 787, are pressurized to 6,000ft instead.
TIL EXCEPTION JOIN. I thought the SQL dialect I usually use at work for data warehouse queries (Presto) didn't have anything like this, but it does, and calls it EXCEPT: https://prestodb.io/docs/current/sql/select.html#union-intersect-except-clause
Good to know. Beats my usual approach of using WHERE x NOT IN (SELECT ...). I'm mostly a front end developer so these things are outside my comfort zone sometimes.
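Here's a small sketch of how the two approaches differ. It uses SQLite through Python's sqlite3 module rather than Presto, and the table and column names are made up, so treat it as an illustration of standard SQL behaviour, not of Presto specifically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (x INTEGER);
    CREATE TABLE excluded (x INTEGER);
    INSERT INTO master VALUES (1), (2), (2), (3);
    INSERT INTO excluded VALUES (3);
""")

EXCEPT_SQL = "SELECT x FROM master EXCEPT SELECT x FROM excluded ORDER BY x"
NOT_IN_SQL = "SELECT x FROM master WHERE x NOT IN (SELECT x FROM excluded) ORDER BY x"

# EXCEPT is a set operation, so duplicate rows are collapsed.
except_rows = conn.execute(EXCEPT_SQL).fetchall()
# NOT IN is a row filter, so duplicates in master survive.
not_in_rows = conn.execute(NOT_IN_SQL).fetchall()

# A NULL in the excluded list makes every NOT IN comparison unknown,
# so the filter matches nothing. EXCEPT is unaffected.
conn.execute("INSERT INTO excluded VALUES (NULL)")
except_rows_with_null = conn.execute(EXCEPT_SQL).fetchall()
not_in_rows_with_null = conn.execute(NOT_IN_SQL).fetchall()

print(except_rows, not_in_rows)
print(except_rows_with_null, not_in_rows_with_null)
```

The NULL case is the one that tends to bite with NOT IN, and it's a decent reason to prefer EXCEPT (or NOT EXISTS) when the excluded column is nullable.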
When I worked for my state this is how we had some data. A master table that you then had to join like five or six exception tables to remove the “questionable” entries from the master.
At my workplace, we'd have a data pipeline (think something like Apache Airflow) that pulls the master table once the daily partition lands, joins the exception tables, then produces a "clean" output table which is the one that people would actually query. At a previous employer, we would have used a materialized/indexed view (we used SQL Server for both OLTP and OLAP). Is that not common in government?
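A rough sketch of that "clean table" step, again using SQLite via Python so it can run anywhere. The table names (master, exceptions_a, exceptions_b, clean) and the build_clean_table helper are hypothetical, and a real pipeline would run the equivalent query in the warehouse as a scheduled task.

```python
import sqlite3

def build_clean_table(conn, exception_tables):
    """Rebuild `clean` as the rows of `master` that appear in no exception table."""
    # Anti-join: LEFT JOIN each exception table, keep rows with no match in any.
    joins = " ".join(f"LEFT JOIN {t} ON {t}.id = m.id" for t in exception_tables)
    filters = " AND ".join(f"{t}.id IS NULL" for t in exception_tables)
    where = f"WHERE {filters}" if filters else ""
    conn.execute("DROP TABLE IF EXISTS clean")
    conn.execute(f"CREATE TABLE clean AS SELECT m.* FROM master AS m {joins} {where}")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (id INTEGER, value TEXT);
    CREATE TABLE exceptions_a (id INTEGER);
    CREATE TABLE exceptions_b (id INTEGER);
    INSERT INTO master VALUES (1, 'ok'), (2, 'questionable'), (3, 'questionable'), (4, 'ok');
    INSERT INTO exceptions_a VALUES (2);
    INSERT INTO exceptions_b VALUES (3);
""")

build_clean_table(conn, ["exceptions_a", "exceptions_b"])
clean_rows = conn.execute("SELECT id, value FROM clean ORDER BY id").fetchall()
print(clean_rows)
```

People then query clean and never need to know how many exception tables exist.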
I haven’t looked into paperless-ai yet, but I hope my machine would be beefy enough for this task
You need a GPU with a decent amount of VRAM to get LLMs working well locally. I don't have a new enough GPU to be useful - my server just has the Intel iGPU, and my desktop PC only has a GTX 1080, which is from before Nvidia added Tensor cores for AI.
And that sticker also has the ASN in human readable form?
Yes! They look like this:
So you would then add many documents at once to the feeder, and Paperless will read the QR and also split documents whenever a new code appears? What about documents you don’t want to keep physically? Is there a way to get Paperless to split them automatically as well if you add many to the feeder?
Paperless supports two different splitting methods:
- If it encounters an ASN QR code, it'll split at that point and keep the page with the barcode
- If it encounters a special barcode that's used as a separator sheet, it'll split at that point and delete the page with the barcode. By default it looks for a "Patch T" barcode, and you can get a page with a Patch T barcode from https://www.alliancegroup.co.uk/patch-codes.htm so all you need to do is have a "Patch T" page between each document and it'll split them automatically.
Docs: https://docs.paperless-ngx.com/advanced_usage/#document-splitting
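To make the two rules concrete, here's a toy sketch of the splitting logic as I've described it. This is not Paperless's actual code: the split_pages function is made up, and the "PATCHT" and "ASN" strings are assumptions about what the barcodes decode to.

```python
SEPARATOR = "PATCHT"  # assumed decoded value of a Patch T separator sheet
ASN_PREFIX = "ASN"    # assumed prefix of an ASN QR code, e.g. "ASN00001"

def split_pages(barcodes):
    """Group page indices into documents.

    `barcodes` has one entry per scanned page: the decoded barcode
    string, or None if the page has no barcode.
    """
    documents = []
    current = []
    for index, code in enumerate(barcodes):
        if code == SEPARATOR:
            # Separator sheet: end the current document and drop this page.
            if current:
                documents.append(current)
            current = []
        elif code is not None and code.startswith(ASN_PREFIX):
            # ASN page: start a new document and keep this page in it.
            if current:
                documents.append(current)
            current = [index]
        else:
            current.append(index)
    if current:
        documents.append(current)
    return documents

stack = ["ASN00001", None, "PATCHT", None, None, "ASN00002", None]
print(split_pages(stack))  # [[0, 1], [3, 4], [5, 6]]
```

Page 2 (the separator sheet) disappears from the output, while the two ASN pages stay as the first page of their documents.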
I'm also using paperless-ai to automatically tag and set a title for scanned documents. Very useful. I'd love to run my own AI locally using ollama, but I don't have good enough hardware so for now I'm using Google's Gemini 2.0 Flash. I trust Google's privacy policy far more than OpenAI's, Google Gemini is very cheap, and if you use the paid version they don't retain any of your data nor use it for training.
a VM with torrent client and a killswitched VPN
You can use Docker for the same setup using the --network container:vpn flag to docker run or the network_mode: "container:vpn" option in docker-compose.yml, where vpn is the name of the container to route through. This makes one Docker container use the network of another (the VPN one), so both containers will share the same internal IP address, and you'll have to map any ports on the VPN container rather than the torrent/whatever one. This is just as safe as a killswitched VPN.
Unraid has a nice UI for it when editing a Docker container:
also meant if it ever got virused I could just roll it back
Consider using a file system that has snapshots, like ZFS. Then you can get this same behaviour for your whole system rather than just a VM :)
is it ok to sit on the perpetual license (for a few years at a time), or are the updates really required?
I'm not sure, as the new licensing model is pretty new. I purchased Unraid in 2023, and back then, all licenses included lifetime updates. They switched to a subscription model to make the business more viable long-term and afford to hire more developers, which I definitely understand.
It supports GPU passthrough right
It does. You can pass through any PCIe devices, so for example if you have multiple network cards, you can pass one directly to a VM (it's a bit more efficient compared to using a virtual Ethernet adapter)
ScanSnap iX1600. I bought mine from B&H: https://www.bhphotovideo.com/c/product/1615326-REG/fujitsu_pa03770_b635_scansnap_ix1600_document_scanner.html. There are two scanners that usually get recommended for paperless: this one, and a cheaper (but not as nice) Brother one.
It's a really compact unit - smaller than I thought it'd be! You can put up to 50 sheets in the feeder and it scans them all, on both sides (no need to manually flip the pages). Can scan 40 pages per minute.
I've combined it with ASN (archive serial number) QR code stickers for documents that I need to keep a physical copy of. I'm using Avery 5267 stickers + Avery's online designer site to design and print them. If I need to keep a physical copy of the document, I stick a sticker on the document, scan it, and Paperless automatically detects the QR code and sets the ASN. Then I keep all the physical copies in a binder, ordered by ASN. If I need to locate a physical document, I find it in Paperless, check the ASN, then go to the right document in the binder (easy to find the right place since they're all in order).
There's just a few minor issues with the scanner, but otherwise it's perfect:
- It was a bit expensive, at $400 in the USA.
- You need a Windows or macOS system to do the initial setup. Setting it up is done through a desktop app rather than through the touchscreen on the device.
- Some of the options need a computer connected to the scanner via USB, or signing up to their cloud service. However, it does support scanning to an SMB share without a computer connected, which is all I needed. I have my paperless-ngx "consume" directory shared via Samba. You just need to delete the default scanning profiles and add a network scan (SMB) one.
This depends a lot on whether your employer is good or not. I get 20 days bereavement leave per year for close family (spouse, kids, parents) and 10 days for extended family (grandparents).