Filter and Weasel Words

I learned new terms today: filter words and weasel words. In short:

  • Filter words get in the way of prose and distance the reader from the action. They make it easy for the writer to tell the reader what’s going on instead of showing them. “The dog seemed agitated waiting for her owner,” instead of “The dog paced in front of the bay windows waiting for her owner.” “The pretty girl looked uninterested in the guy asking for her number,” instead of “The pretty girl ignored the guy asking for her number.”
  • Weasel words leave text feeling ambiguous. “He might be the hero’s brother.” “The body may have been stolen.” “The dog could have eaten the roast.” These sentences don’t help tell the story because the weasel words (might, may, could) render them meaningless: each one is just as true when flipped around. “He might be unrelated to the hero.” “The body may be right where we left it.” “The dog could have ignored the roast and slept.”

Removing these words cleans up the prose and makes it more interesting. A post on Scribophile suggested writing a macro in Word to detect and flag them, but since I don’t use Word, that doesn’t help me. Instead, I implemented the same functionality in sh.

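#!/bin/sh
# Report which filter and weasel words appear in each input file.
# Word lists live next to this script as <list>-words.txt, one word or phrase per line.

wordLists="filter weasel"

if [ "$#" -eq 0 ]; then
    echo "Usage: $0 <filename1> [filename2 ...]" >&2
    exit 1
fi

scriptDir=$(dirname "$0")

for file in "$@"; do
    for list in $wordLists; do
        # Read one entry per line so phrases such as "a bit" stay intact;
        # the || test also picks up a final line with no trailing newline.
        while IFS= read -r word || [ -n "$word" ]; do
            [ -n "$word" ] || continue    # skip blank lines
            # -q: quiet, -i: ignore case, -w: whole words only, -e: word may start with "-"
            if grep -q -i -w -e "$word" "$file"; then
                echo "$file $word"
            fi
        done < "$scriptDir/$list-words.txt"
    done
done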

Bear in mind this is a quick-and-dirty attempt (it’s not as flexible as I’d like), but the basic functionality is there.

$ sh ../filter.sh A-00prologue.tex | head -n 5
A-00prologue.tex a bit
A-00prologue.tex could
A-00prologue.tex decided
A-00prologue.tex knew
A-00prologue.tex looked
$ sh ../filter.sh A-00prologue.tex | tail -n 5
A-00prologue.tex this
A-00prologue.tex that
A-00prologue.tex thought
A-00prologue.tex up
A-00prologue.tex wondered
$ sh ../filter.sh A-00prologue.tex | wc -l
48

I haven’t gone through the full output yet, but this is a good sign. I already have other scripts that do things like get the list of active files in my project, so with a bit (see, filter word!) more scripting to tie the pieces together, this can be integrated into nightly jobs that analyze my work.
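The glue for a nightly run could be as small as the sketch below. It assumes the file-listing script prints one file name per line; `list-active-files.sh` and `nightly_report` are hypothetical names used only for illustration, since the actual scripts aren’t shown here.

```shell
#!/bin/sh
# Sketch only: list-active-files.sh and nightly_report are hypothetical names.

# nightly_report CHECKER REPORT
# Reads one file name per line on stdin, runs CHECKER (e.g. filter.sh) on each
# file, and writes all of CHECKER's output to REPORT, replacing any earlier run.
nightly_report() {
    checker=$1
    report=$2
    : > "$report" || return 1          # create or truncate the report
    while IFS= read -r f || [ -n "$f" ]; do
        [ -n "$f" ] || continue        # ignore blank lines in the file list
        # </dev/null keeps CHECKER from swallowing the rest of the file list
        sh "$checker" "$f" < /dev/null >> "$report"
    done
}

# Example (hypothetical paths):
#   ./list-active-files.sh | nightly_report ../filter.sh filter-report.txt
```

Reading names line by line, rather than through `xargs`, keeps file names containing spaces intact.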

If you’re interested, check out the script, filter word list, and weasel word lists (the word lists are blatantly stolen from a Scribophile post). I haven’t tested it anywhere but my system, but there’s nothing fancy in the script, so it should work on any Unixy system (probably OS X, likely Cygwin).

Update

I spent some more time and fixed a few of the rough edges. This required rewriting filter.sh in Perl, but the rewrite comes with quite a few advantages:

  • Word lists are read only once (they’re stored in a hash). This should give faster performance when processing multiple input files (my expected use case once I integrate this script into my nightly jobs).
  • The output now includes line and column offsets.
  • Every occurrence of a word is flagged, including repeats on the same line, rather than just once per file [e.g., A-00prologue.tex was (9:178), A-00prologue.tex was (9:277), A-00prologue.tex was (9:361)].
  • The previous two points should make it easy to whitelist known hits and automatically detect new ones as the work is edited. My current plan is to commit the script’s output and use diff to compare “current” output with “committed” output.
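The Perl version isn’t shown here, but the same ideas can be sketched in plain sh with awk standing in for Perl (a swapped-in technique, not the actual rewrite). awk’s associative arrays play the role of the hash, so each word list is loaded once per run, and every whole-word match is reported with its line and 1-based column. The function names `scan_words` and `new_flags`, and the assumption that both word-list files sit in one directory, are hypothetical.

```shell
#!/bin/sh
# Sketch only: an awk stand-in for the Perl rewrite, with hypothetical names.

# scan_words LISTDIR FILE...
# Loads LISTDIR/filter-words.txt and LISTDIR/weasel-words.txt once into an awk
# array, then prints "FILE WORD (LINE:COLUMN)" for every case-insensitive,
# whole-word occurrence. COLUMN is 1-based; depending on the awk and locale it
# may count bytes rather than characters. Order between different words on the
# same line follows awk's unspecified "for (w in words)" order.
scan_words() {
    listDir=$1
    shift
    awk '
        phase != "scan" {                 # still reading the word lists
            w = tolower($0)
            sub(/[ \t\r]+$/, "", w)       # drop trailing blanks and CRs
            if (w != "") words[w] = 1
            next
        }
        {
            line = tolower($0)
            for (w in words) {
                start = 1
                while ((i = index(substr(line, start), w)) > 0) {
                    col = start + i - 1
                    before = (col > 1) ? substr(line, col - 1, 1) : ""
                    after = substr(line, col + length(w), 1)
                    # whole-word check: no letter, digit or underscore on either side
                    if (before !~ /[a-z0-9_]/ && after !~ /[a-z0-9_]/)
                        printf "%s %s (%d:%d)\n", FILENAME, w, FNR, col
                    start = col + 1
                }
            }
        }
    ' "$listDir/filter-words.txt" "$listDir/weasel-words.txt" phase=scan "$@"
}

# new_flags COMMITTED CURRENT
# Prints the lines that diff reports as added in CURRENT relative to COMMITTED.
# Sort both files the same way first so the comparison is stable.
new_flags() {
    diff "$1" "$2" | sed -n 's/^> //p'
}
```

A possible run, with `flags.txt` as a hypothetical committed report: `scan_words .. A-00prologue.tex | LC_ALL=C sort > current.txt`, then `new_flags flags.txt current.txt`. The `phase=scan` operand is a standard awk command-line assignment that takes effect once the word lists have been read. Because the offsets shift whenever text earlier on a line (or earlier in the file) changes, some previously accepted hits will reappear in the diff after edits.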

One thought on “Filter and Weasel Words”

  1. LibreOffice extension, just sayin’!

    Thanks for the tips; this is interesting stuff to consider. I’m sure the util would flag my draft stuff six ways to Sunday.
