When Last We Left Our Heroes …
A few things have changed in the three months(?!?) since the last article.
For reasons, I was forced, er, decided to upgrade to Linux Mint 21.1, which has Python 3.10.6 installed.
Also for reasons, I bought a Windows 10 machine, for which Python 3.11.2 is available to download. So I guess I’m going to get acquainted with the Power Shell. [1]
I made a few upgrades to duplicate-files.rb and its companion program,
Replaced the overbuilt Printer objects with a simple and boring if statement near the end of
Replaced the terrible logic in
Hashes used like sets. Code at the end turns them into lists of files.
Built a Progress class that draws an ASCII progress bar while the program compares each file in its list of duplicates. As of this writing it over-reports progress, because I’m not accurately estimating how many operations it takes to compare files. Still working on it, amidst other things.
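For the Python port, a progress bar along these lines might do. This is only a sketch: the class layout, the `width` parameter, and the method names are my assumptions, not a translation of the Ruby class. It clamps the bar at 100% so that an underestimated total at least never draws past the end.

```python
import sys


class Progress:
    """Hypothetical ASCII progress bar; names and layout are assumptions."""

    def __init__(self, total, width=40, out=sys.stderr):
        self.total = max(total, 1)  # avoid dividing by zero on empty input
        self.width = width
        self.out = out
        self.done = 0

    def render(self):
        # Clamp the fraction so a too-low estimate never overflows the bar.
        fraction = min(self.done / self.total, 1.0)
        filled = int(fraction * self.width)
        bar = "#" * filled + "-" * (self.width - filled)
        return f"[{bar}] {fraction:4.0%}"

    def advance(self, steps=1):
        # Record finished work, then redraw the bar on the same line.
        self.done += steps
        self.out.write("\r" + self.render())
        self.out.flush()
```

Calling `advance()` once per comparison redraws the bar in place on stderr, which keeps it out of any output piped to another program.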
As a result of the previous failure, the program logs performance data, including the estimated and actual number of comparisons, the wall-clock time to run each part of the script, and the calculated map of files with the same size. Perf logs go either to the Temp directory or to a directory the user can specify on the command line. At some point I’ll run some simulations to figure out why the choose function and even factorial grossly underestimate the number of comparisons needed.
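To make the estimate and the perf log concrete, here is a rough Python sketch. It assumes the "choose" estimate means counting distinct pairs within each group of same-sized files, and that a perf log is a small JSON file; the function names, the file name pattern, and the record fields are all hypothetical.

```python
import json
import math
import tempfile
import time
from pathlib import Path


def estimate_comparisons(size_map):
    """Count distinct pairs in each same-size group (n choose 2), summed."""
    return sum(math.comb(len(paths), 2) for paths in size_map.values())


def write_perf_log(record, log_dir=None):
    """Write one perf record as JSON to log_dir, or the temp directory."""
    target = Path(log_dir) if log_dir else Path(tempfile.gettempdir())
    target.mkdir(parents=True, exist_ok=True)
    # Hypothetical file name; the timestamp just keeps runs apart.
    path = target / f"duplicate-files-perf-{int(time.time())}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```

A record might hold the estimate, the actual count, and elapsed seconds per phase, so the two numbers can be compared after each run.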
duplicate-files.rb is much simplified(?) and simpler to use. It takes output from duplicate-files.rb, either in a file or piped in directly, and deletes all but one of the duplicate files listed. And that’s it. (Previous versions also deleted files listed on the command line. I lost more output files that way …)
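The "delete all but one" step is small enough to sketch in Python. This assumes the duplicate report is a JSON list of lists of paths, which may not match the Ruby script's actual output format; `load_groups` and `delete_extra_copies` are hypothetical names. It defaults to a dry run, given my history with lost files.

```python
import json
import os


def load_groups(stream):
    """Read duplicate groups from an open file or stdin (assumed JSON)."""
    return json.load(stream)


def delete_extra_copies(groups, dry_run=True):
    """Keep the first path in each group and remove the rest.

    Returns the paths that were removed, or would be in a dry run.
    """
    removed = []
    for group in groups:
        for path in group[1:]:  # group[0] is the copy that survives
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return removed
```

Passing `sys.stdin` to `load_groups` covers the piped case, and an opened file covers the other.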
I was going to “surprise” my hypothetical readers with that fact once duplicate-files.py was nearly complete, but that day seems ever more distant.
Instead, I’m going to track the changes as I make them, or at least as they become relevant.
All of which look interesting. But so do Antlr, Erlang, Hugo’s template language, Lua, Scheme, and all the projects I listed elsewhere. There just aren’t enough hours.
Meanwhile I put about two dozen posts on this site. Another dozen still remain as drafts, ranging from a title and topic to text almost done but still in need of editing. Plus my usual habit of tweaking things I’ve already published when I find a typo or braino.
“Holy Short Attention Span, Bat– Ooh, Shiny”
So, hypothetical readers, it seems I have too many irons in the fire and not enough heat. I will continue this series, though. [2]
Reproducing a script I’d been hacking on for more than two decades, though, sounds like a lot. Maybe a version 0.1 proof of concept: recurse down directories, collect sets of same-sized files, compare them, report the results in whatever convenient format (JSON, YAML, Python lists, whatever).
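For scale, a version 0.1 along those lines could be roughly this small. It is a sketch under assumptions, not the eventual duplicate-files.py: the function names are made up, symlinks are skipped, and files are compared byte for byte with the standard library's `filecmp` rather than whatever the Ruby script does.

```python
import filecmp
import os
from collections import defaultdict


def files_by_size(root):
    """Walk root recursively and map each file size to the paths having it."""
    sizes = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                sizes[os.path.getsize(path)].append(path)
    return sizes


def find_duplicates(root):
    """Return lists of paths whose contents are byte-for-byte identical."""
    duplicates = []
    for paths in files_by_size(root).values():
        groups = []  # each group holds files with the same content
        for path in sorted(paths):
            for group in groups:
                # Compare against one representative of the group.
                if filecmp.cmp(group[0], path, shallow=False):
                    group.append(path)
                    break
            else:
                groups.append([path])
        duplicates.extend(g for g in groups if len(g) > 1)
    return duplicates
```

The result is a plain list of lists, so `json.dumps(find_duplicates("."), indent=2)` is enough for the "whatever convenient format" part.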
After that I may bail and pursue another Python project, or some smaller explorations.
Same Bat-Time, Same Bat-Channel
So, hopefully in something less than three months, I’ll finally have a “Relearning Python #1” ready.
[1] I install Cygwin on every Windows I “own”, even ones at work. What can I say? I love a command line. So Power Shell, which accepts at least a few Linux and DOS commands, may finally, if belatedly, wean me off that habit. Or not. We’ll see. Anyway, I’m not sure native Python will run under Cygwin.
[2] But that’s what Harlan Ellison said about The Last Dangerous Visions. Too soon? (Too late?)