Filter and Weasel Words

I learned new terms today: filter words and weasel words. In short:

  • Filter words get in the way of prose and distance the reader from the action. They make it easy for the writer to tell the reader what’s going on instead of showing them. “The dog seemed agitated waiting for her owner,” instead of “The dog paced in front of the bay windows waiting for her owner.” “The pretty girl looked uninterested in the guy asking for her number,” instead of “The pretty girl ignored the guy asking for her number.”
  • Weasel words leave text feeling ambiguous. “He might be the hero’s brother.” “The body may have been stolen.” “The dog could have eaten the roast.” These sentences don’t help tell the story because the weasel words (“might,” “may,” “could”) render the text meaningless. The opposite reads just as plausibly: “He might be unrelated to the hero.” “The body may be right where we left it.” “The dog could have ignored the roast and slept.”

Removing these cleans up the prose and makes it more interesting. A post on Scribophile suggested writing a macro in Word to detect these words and flag them but, since I don’t use Word, that doesn’t help me. Instead I implemented the same functionality in sh.

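#!/bin/sh
# Print "<file> <word>" for every listed word that appears in each input file.

DIRNAME=/usr/bin/dirname
ECHO=/bin/echo
GREP=/bin/grep

wordLists="filter weasel"

if [ ${#} -gt 0 ]; then
    # The word lists live in the same directory as this script.
    source=$(${DIRNAME} "${0}")

    for file in "${@}"; do
        for list in ${wordLists}; do
            # Read one entry per line so phrases such as "a bit" stay intact.
            while IFS= read -r word; do
                [ -n "${word}" ] || continue
                # -q: no output, -i: ignore case; \< and \> match word boundaries.
                if ${GREP} -q -i "\<${word}\>" "${file}"; then
                    ${ECHO} "${file} ${word}"
                fi
            done < "${source}/${list}-words.txt"
        done
    done
else
    ${ECHO} "Usage: ${0} <filename1> [filename2 ...]"
fi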

Bear in mind this is a quick and dirty attempt (it’s not as flexible as I’d like) but the basic functionality is there.
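One possible way to get more detail out of the same word lists is to hand a whole list to a single grep call and print the matching lines with their line numbers. This is only a sketch: flag_lines is a hypothetical helper name, the word list is assumed to contain no blank lines, and -w is a common grep extension (GNU and BSD) rather than part of POSIX.

```shell
#!/bin/sh
# Sketch: show every line that contains a listed word, prefixed by its line number.
# flag_lines is a hypothetical name used for illustration only.

flag_lines() {
    list=$1   # word list, one word or phrase per line (assumed: no blank lines)
    file=$2   # text file to check
    # -n: print line numbers      -i: ignore case
    # -w: match whole words only  -F: treat entries as fixed strings
    # -f: read the patterns from the word list file
    grep -n -i -w -F -f "$list" "$file"
}
```

Called as flag_lines filter-words.txt A-00prologue.tex, this prints each offending line once, however many listed words it contains, so it complements the per-word report rather than replacing it.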

$ sh ../filter.sh A-00prologue.tex | head -n 5
A-00prologue.tex a bit
A-00prologue.tex could
A-00prologue.tex decided
A-00prologue.tex knew
A-00prologue.tex looked
$ sh ../filter.sh A-00prologue.tex | tail -n 5
A-00prologue.tex this
A-00prologue.tex that
A-00prologue.tex thought
A-00prologue.tex up
A-00prologue.tex wondered
$ sh ../filter.sh A-00prologue.tex | wc -l
48

I haven’t gone through the full output yet, but this is a good sign. I already have other scripts for things like listing the active files in my project, so with a bit (see, filler word!) more scripting to tie the pieces together, this can be integrated into nightly jobs that analyze my work.
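The glue for such a nightly job could be as small as the sketch below, which runs a checker over every file name read from standard input. The names check_files and list-active-files are hypothetical; the latter stands in for whatever script produces the project's file list, one name per line.

```shell
#!/bin/sh
# Sketch: run a checker command over every file named on standard input.
# check_files is a hypothetical helper name used for illustration only.

check_files() {
    checker=$1   # command to run on each file, e.g. "sh ../filter.sh"
    while IFS= read -r file; do
        # Skip blank lines and names that do not refer to an existing file.
        [ -n "$file" ] && [ -f "$file" ] || continue
        # The checker is split on whitespace on purpose so it can carry arguments;
        # its stdin is redirected so it cannot swallow the rest of the file list.
        $checker "$file" < /dev/null
    done
}
```

A nightly run might then look like list-active-files | check_files "sh ../filter.sh" > nightly-report.txt, with the report file name also being an assumption.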

If you’re interested, check out the script, the filter word list, and the weasel word list (the word lists are blatantly stolen from a Scribophile post). I haven’t tested it anywhere but my own system, but there’s nothing fancy in the script, so it should work on any Unixy system (probably OS X, likely Cygwin).

Update

I spent some more time and fixed a few of the rough edges. This required rewriting filter.sh to use Perl, but it comes with quite a few advantages:

  • Word lists are only read once (they’re stored in a hash). This should provide faster performance when processing multiple input files (my expected use case once I integrate this script into my nightly jobs).
  • The output now includes line and column offsets.
  • The same word will be flagged each time it shows up on a line, not just once per file [e.g., A-00prologue.tex was (9:178), A-00prologue.tex was (9:277), A-00prologue.tex was (9:361)].
  • The previous two points should make it easy to add white-lists for automatic detection as work is edited. My current plan is to commit the script’s output and use diff to compare “current” output with “committed” output.
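The per-occurrence output described above can be approximated without Perl. The sketch below is not the rewritten script; it is a minimal illustration using awk from sh, with flag_positions as a hypothetical helper name. It assumes a non-empty word list, 1-based columns, and that a "whole word" is one not touching another ASCII letter.

```shell
#!/bin/sh
# Sketch: report each occurrence of each listed word as "file word (line:col)".
# flag_positions is a hypothetical name; columns are assumed to be 1-based.

flag_positions() {
    list=$1   # word list, one word or phrase per line (assumed non-empty)
    file=$2   # text file to check
    awk -v name="$file" '
        # First input file: load the word list once into an array.
        NR == FNR { if ($0 != "") words[tolower($0)] = 1; next }
        {
            line = tolower($0)
            for (w in words) {
                start = 1
                # Walk along the line, locating each occurrence in turn.
                while ((pos = index(substr(line, start), w)) > 0) {
                    col = start + pos - 1
                    before = (col > 1) ? substr(line, col - 1, 1) : " "
                    after = substr(line, col + length(w), 1)
                    # Report only matches that are not part of a longer word.
                    if (before !~ /[a-z]/ && after !~ /[a-z]/)
                        print name, w, "(" FNR ":" col ")"
                    start = col + 1
                }
            }
        }
    ' "$list" "$file"
}
```

For the diff workflow, the output could be sorted and compared against a committed copy, for example flag_positions filter-words.txt A-00prologue.tex | sort > current.txt followed by diff committed.txt current.txt, where both report file names are placeholders. Sorting matters because awk does not guarantee the order in which it visits the words.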