Filter and Weasel Words

I learned new terms today: filter words and weasel words. In short:

  • Filter words get in the way of prose and distance the reader from the action. They make it easy for the writer to tell the reader what’s going on instead of showing them. “The dog seemed agitated waiting for her owner,” instead of “The dog paced in front of the bay windows waiting for her owner.” “The pretty girl looked uninterested in the guy asking for her number,” instead of “The pretty girl ignored the guy asking for her number.”
  • Weasel words leave text feeling ambiguous. “He might be the hero’s brother.” “The body may have been stolen.” “The dog could have eaten the roast.” These sentences don’t help tell the story because the weasel words (might, may, could) render the text meaningless. “He might be unrelated to the hero.” “The body may be right where we left it.” “The dog could have ignored the roast and slept.”

Removing these cleans up the prose and makes it more interesting. A post on Scribophile suggested writing a macro in Word to detect these words and flag them, but since I don’t use Word, that doesn’t help me. Instead I implemented the same functionality in sh.

#!/bin/sh

CAT=/bin/cat
DIRNAME=/usr/bin/dirname
ECHO=/bin/echo
GREP=/bin/grep

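# Word lists live next to the script as <name>-words.txt (filter-words.txt, weasel-words.txt).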
wordLists="filter weasel"

if [ ${#} -gt 0 ]; then
    source=$(${DIRNAME} "${0}")

    for file in "${@}"; do
        for list in ${wordLists}; do
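            # Split the word lists on newlines only, so multi-word phrases
            # like "a bit" are treated as single entries.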
            oldIFS=${IFS}
            IFS='
'
            words=$(${CAT} "${source}/${list}-words.txt")
            for word in ${words}; do
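                # -q reports the match via exit status only, -i ignores case,
                # and \<...\> anchors the pattern to whole words.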
                ${GREP} -q -i "\<${word}\>" "${file}"
                if [ ${?} = 0 ]; then
                    ${ECHO} "${file} ${word}"
                fi
            done
            IFS=${oldIFS}
        done
    done
else
    ${ECHO} "Usage: ${0} <filename1> [filename2 ...]"
fi

Bear in mind this is a quick and dirty attempt (it’s not as flexible as I’d like), but the basic functionality is there.

$ sh ../filter.sh A-00prologue.tex | head -n 5
A-00prologue.tex a bit
A-00prologue.tex could
A-00prologue.tex decided
A-00prologue.tex knew
A-00prologue.tex looked
$ sh ../filter.sh A-00prologue.tex | tail -n 5
A-00prologue.tex this
A-00prologue.tex that
A-00prologue.tex thought
A-00prologue.tex up
A-00prologue.tex wondered
$ sh ../filter.sh A-00prologue.tex | wc -l
48

I haven’t gone through the full output yet but this is a good sign. I already have other scripts to do things like get the list of active files in my project, so with a bit (see, filler word!) more scripting to feed the pieces together, this can be integrated into nightly jobs to analyze my work.
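
To give a rough idea of what that integration might look like, here is a minimal sketch of a nightly wrapper. The active-files.sh name, the bin/ layout, and the report path are placeholders rather than the actual scripts and directories.

#!/bin/sh
# Hypothetical nightly wrapper: run filter.sh over the project's active files
# and keep a dated report. active-files.sh stands in for the existing script
# that lists the files currently in the project.

PROJECT="${HOME}/writing/project"            # assumed project location
REPORT="${PROJECT}/reports/$(date +%Y-%m-%d).txt"

mkdir -p "${PROJECT}/reports"

# Assumes active-files.sh prints one filename per line, with no spaces.
for file in $(sh "${PROJECT}/bin/active-files.sh"); do
    sh "${PROJECT}/bin/filter.sh" "${file}"
done > "${REPORT}"

echo "$(wc -l < "${REPORT}") flagged words; see ${REPORT}"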

If you’re interested, check out the script, filter word list, and weasel word list (the word lists are blatantly stolen from a Scribophile post). I haven’t tested anywhere but my system, but there’s nothing fancy in the script so it should work on any Unixy system (probably OS X, likely Cygwin).

Update

I spent some more time and fixed a few of the rough edges. This required rewriting filter.sh to use Perl, but it comes with quite a few advantages:

  • Word lists are only read once (they’re stored in a hash). This should provide faster performance when processing multiple input files (my expected use case once I integrate this script into my nightly jobs).
  • The output now includes line and column offsets.
  • The same word will be flagged each time it shows up on a line, not just once per file [e.g., A-00prologue.tex was (9:178), A-00prologue.tex was (9:277), A-00prologue.tex was (9:361)].
  • The previous two points should make it easy to add white-lists for automatic detection as work is edited. My current plan is to commit the script’s output and use diff to compare “current” output with “committed” output; a rough sketch of that comparison follows this list.
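
Here is a minimal sketch of that diff-based comparison, assuming the output is kept in a git repository under the name flagged-words.txt (both the file name and the use of git are assumptions layered on the plan above):

#!/bin/sh
# Sketch of the planned check: regenerate the report, then show only the
# lines that are new compared with the committed copy.
# flagged-words.txt and the use of git are placeholders.

./filter.sh *.tex > flagged-words.txt.new

# Lines prefixed with ">" exist only in the fresh output, i.e. newly flagged words.
diff flagged-words.txt flagged-words.txt.new | grep '^>' || echo "No new flagged words."

# Once the remaining hits have been reviewed, promote the fresh output and commit it:
# mv flagged-words.txt.new flagged-words.txt
# git commit -m "Update flagged-word output" flagged-words.txt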
