Writing Like a Geek

A few people have asked me for more details on my writing process, so I figured the easiest thing is to document it here. This is about the literal process of how I put together my projects.

Editor

[Screenshot: Kate in action, with chapter 2 open]

My main editor these days is Kate. Kate is a plain text editor, but since it’s part of the KDE project it comes with a huge number of slick features, including vi mode (vi is an unbelievably powerful editor once you make it past the learning curve). I use LaTeX markup for everything (more on that later).

Project Structure

Each project gets its own folder (the current project is named “hammer” due to an old working title). Each chapter is split across a couple of files, but how many depends on my mood when I worked on that particular piece (you can see in the above screenshot that the chapter is split into four files: one for each of the three scenes and a fourth to put them together). Files are named with a one-letter prefix based on the act (A, B, or C), a number (the chapter in that act), and a short name describing the point of that file. Each act has its own file that puts the chapters in order (named part-A.tex, part-B.tex, and part-C.tex).
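
As a concrete example (the scene names here are invented for illustration), one chapter might look like this on disk:

A-01ambush.tex      first scene of act A, chapter 1
A-01chase.tex       second scene
A-01aftermath.tex   third scene
A-01.tex            stitches the three scenes together
part-A.tex          puts act A’s chapters in order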

Generating Output

LaTeX is a pain to work with directly (it pollutes the working directory and has no dependency management), but CMake solves both of these problems. CMake doesn’t support LaTeX out of the box, but I hacked together a module to manage LaTeX projects.

# Manuscript-formatted output: one target per act, plus the whole novel
create_target(wolf-A "${CMAKE_CURRENT_SOURCE_DIR}" "${tex_files_A}" FALSE)
create_target(wolf-B "${CMAKE_CURRENT_SOURCE_DIR}" "${tex_files_B}" FALSE)
create_target(wolf-C "${CMAKE_CURRENT_SOURCE_DIR}" "${tex_files_C}" FALSE)
create_target(wolf   "${CMAKE_CURRENT_SOURCE_DIR}" "${tex_files}"   TRUE)

# Typeset output: publisher format, Lulu's 6" x 9" pages, Scribophile chapters
create_target(wolf-pub   "${CMAKE_CURRENT_SOURCE_DIR}" "${tex_files}" FALSE)
create_target(wolf-lulu  "${CMAKE_CURRENT_SOURCE_DIR}" "${tex_files}" FALSE)
create_target(wolf-scrib "${CMAKE_CURRENT_SOURCE_DIR}" "${tex_files}" FALSE)

The first four lines let me produce manuscript-formatted output for either individual acts or the whole novel; the bottom three produce typeset content (wolf-pub), typeset content for a 6″ x 9″ page per the requirements on Lulu (wolf-lulu), and individual chapters for posting to Scribophile (wolf-scrib). The targets I use most are the first four, mostly because manuscript formats build much faster than non-manuscript formats.
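
The module is too long to reproduce here, but a stripped-down sketch gives the flavor of what a create_target like mine needs to do (hypothetical code; the real module also handles the multiple output formats and that trailing TRUE/FALSE “full” post-processing flag):

function(create_target name source_dir tex_files full)
    # ${full} selects the extra post-processing pass (elided here)
    # Build in a scratch directory so LaTeX's .aux/.log/.toc litter
    # never touches the source tree
    set(out_dir "${CMAKE_CURRENT_BINARY_DIR}/${name}")
    file(MAKE_DIRECTORY "${out_dir}")
    # Rebuild the pdf only when one of the .tex dependencies changes
    add_custom_command(OUTPUT "${out_dir}/${name}.pdf"
                       COMMAND pdflatex -output-directory "${out_dir}"
                               -jobname "${name}" "${source_dir}/${name}.tex"
                       DEPENDS ${tex_files}
                       WORKING_DIRECTORY "${source_dir}")
    add_custom_target(${name} DEPENDS "${out_dir}/${name}.pdf")
endfunction()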

Using LaTeX means I get lots of slick features when I generate the typeset output, including kerning, hyphenation, ligatures, old-style numerals, and micro-typography (which adjusts things like the spacing within words, and lets characters overflow slightly into the margins, to produce whitespace that looks uniform to the human eye). Both typeset and manuscript forms use LaTeX features like the csquotes package (which automatically manages quote matching across languages, including nested quotes).
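
Enabling most of that is just a matter of loading the right packages. A minimal preamble sketch (the package names are real; old-style numerals usually come from an osf-style option on your font package):

% Micro-typography: character protrusion, font expansion, and friends
\usepackage{microtype}
% Smart quotes: \enquote{} nests and matches quotes per language
\usepackage{csquotes}

With csquotes loaded, \enquote{She said, \enquote{run!}} comes out as “She said, ‘run!’” with no manual quote matching.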

Each target can be built as pdf, html, or odt. I can also generate “full” versions, which include some post-processing on the output, but that’s only needed for the full manuscript (it’s what generates the Special Characters listing).
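
For reference, pdf comes straight from LaTeX, but html and odt need a separate converter; the tex4ht tools can produce both if you want to experiment by hand (chapter name made up):

htlatex A-02chapter.tex          # html output
mk4ht oolatex A-02chapter.tex    # odt (OpenDocument) output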

You can compare chapter 2 of The Howl of the Wolf in both manuscript and typeset formats.

Tools

I use aspell for spellchecking and integrate it with CMake using a custom command.

add_custom_target(spellcheck)
foreach(file ${spellcheck_files})
    # -t treats the input as TeX; -p points aspell at a per-project dictionary
    add_custom_command(TARGET spellcheck POST_BUILD
                       COMMAND ${ASPELL} -t -p ${CMAKE_SOURCE_DIR}/aspell_dict check ${file})
endforeach()

If you don’t speak geek, this runs the entire project through aspell using a custom dictionary isolated to that project. My current plan is to delete the dictionary and run a clean spellcheck before publishing (just in case I added a word to it by mistake), but for now I can tolerate a few spelling errors.
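
With that target wired in, checking the whole project is one command (assuming the Makefile generator):

$ make spellcheck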

I have a few scripts to help with analysis and file management. The most obvious example is word counts (splitting chapters/acts across lots of files makes this hard).

$ sh ../chapwc.sh
A-00  3015+1+0 (1/0/0/0) Total
A-01  3688+0+0 (1/0/0/0) Total
A-02  2920+0+0 (1/0/0/0) Total
A-03  2471+0+0 (1/0/0/0) Total
A-04  2412+0+0 (1/0/0/0) Total
A-05  2718+0+0 (1/0/0/0) Total
A-06  1401+0+0 (1/0/0/0) Total
A-07  1117+0+0 (1/0/0/0) Total
...
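
I won’t bore you with chapwc.sh itself, but if you want to roll your own, texcount’s brief mode produces output like the listing above; a minimal sketch:

#!/bin/sh
# Group scene files by chapter prefix (e.g., "A-01") and total each group
for prefix in $(ls [ABC]-[0-9][0-9]*.tex | cut -c 1-4 | sort -u); do
    printf '%s  ' "${prefix}"
    # -brief: one-line summary; -total: sum across the chapter's files
    texcount -brief -total "${prefix}"*.tex
done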

All changes are tracked using git.

Posting

Before posting anything to Scribophile I run it through Pro Writing Aid to help with grammar and style. If you haven’t tried Pro Writing Aid, give it a shot; the site is amazing (it found the exact same sentence being used twice in the same chapter; not similar, the exact same). I usually work through one report at a time, updating my raw files and generating new output as I go to make sure I don’t miss anything.

After going through Pro Writing Aid I check for troublesome words. When I’m done cleaning those up I do one last check through spellcheck and Pro Writing Aid to make sure I didn’t introduce new problems.

Once the sanity check is done I spit out the chapter in html (easier than dealing with copy/pasting from a pdf) and copy/paste it into Scribophile.

Filter and Weasel Words

I learned new terms today: filter words and weasel words. In short:

  • Filter words get in the way of prose and distance the reader from the action. They make it easy for the writer to tell the reader what’s going on instead of showing them. “The dog seemed agitated waiting for her owner,” instead of “The dog paced in front of the bay windows waiting for her owner.” “The pretty girl looked uninterested in the guy asking for her number,” instead of “The pretty girl ignored the guy asking for her number.”
  • Weasel words leave text feeling ambiguous. “He might be the hero’s brother.” “The body may have been stolen.” “The dog could have eaten the roast.” These sentences don’t help tell the story because the weasel words (“might,” “may,” and “could”) render the text meaningless; the same hedges support the exact opposite: “He might be unrelated to the hero.” “The body may be right where we left it.” “The dog could have ignored the roast and slept.”

Removing these cleans up the prose and makes it more interesting. A post on Scribophile suggested writing a Word macro to detect and flag these words, but since I don’t use Word that doesn’t help me. Instead I implemented the same functionality in sh.

#!/bin/sh

DIRNAME=/usr/bin/dirname
ECHO=/bin/echo
GREP=/bin/grep

wordLists="filter weasel"

if [ ${#} -gt 0 ]; then
    # The word lists (filter-words.txt, weasel-words.txt) live next to this script
    source=$(${DIRNAME} "${0}")

    for file in "${@}"; do
        for list in ${wordLists}; do
            # One word or phrase per line in each list file
            while IFS= read -r word; do
                # -w matches whole words only; -i ignores case
                if ${GREP} -q -i -w "${word}" "${file}"; then
                    ${ECHO} "${file} ${word}"
                fi
            done < "${source}/${list}-words.txt"
        done
    done
else
    ${ECHO} "Usage: ${0} <filename1> [filename2 ...]"
fi

Bear in mind this is a quick and dirty attempt (it’s not as flexible as I’d like) but the basic functionality is there.

$ sh ../filter.sh A-00prologue.tex | head -n 5
A-00prologue.tex a bit
A-00prologue.tex could
A-00prologue.tex decided
A-00prologue.tex knew
A-00prologue.tex looked
$ sh ../filter.sh A-00prologue.tex | tail -n 5
A-00prologue.tex this
A-00prologue.tex that
A-00prologue.tex thought
A-00prologue.tex up
A-00prologue.tex wondered
$ sh ../filter.sh A-00prologue.tex | wc -l
48

I haven’t gone through the full output yet but this is a good sign. I already have other scripts to do things like get the list of active files in my project, so with a bit (see, filter word!) more scripting to feed the pieces together this can be integrated into nightly jobs to analyze my work.

If you’re interested, check out the script, the filter word list, and the weasel word list (the word lists are blatantly stolen from a Scribophile post). I haven’t tested anywhere but my system, but there’s nothing fancy in the script so it should work on any Unixy system (probably OS X, likely Cygwin).

Update

I spent some more time and fixed a few of the rough edges. This required rewriting filter.sh in perl, but it comes with quite a few advantages (a sketch of the approach follows the list):

  • Word lists are only read once (they’re stored in a hash). This should provide faster performance when processing multiple input files (my expected use case once I integrate this script into my nightly jobs).
  • The output now includes line and column offsets.
  • The same word will be flagged each time it shows up on a line, not just once per file [e.g., A-00prologue.tex was (9:178), A-00prologue.tex was (9:277), A-00prologue.tex was (9:361)].
  • The previous two points should make it easy to add white-lists as the work is edited, so known-good uses aren’t re-flagged. My current plan is to commit the script’s output and use diff to compare “current” output against “committed” output.
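
The updated script has more plumbing than this, but a minimal perl sketch of the approach (word lists assumed to sit in the current directory) looks like:

#!/usr/bin/perl
use strict;
use warnings;

# Read every word list exactly once, compiling each entry into a pattern
my %patterns;
for my $list (qw(filter weasel)) {
    open(my $fh, '<', "$list-words.txt") or die "Cannot open $list-words.txt: $!";
    while (my $word = <$fh>) {
        chomp $word;
        $patterns{$word} = qr/\b\Q$word\E\b/i if length $word;
    }
    close($fh);
}

# Report every match with its line and column offset
for my $file (@ARGV) {
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        for my $word (sort keys %patterns) {
            # /g resumes after each hit, so repeats on one line all get flagged
            while ($line =~ /$patterns{$word}/g) {
                printf "%s %s (%d:%d)\n", $file, $word, $., $-[0] + 1;
            }
        }
    }
    close($fh);
}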

Normalization

[Note: Anna said it was fine to mention her and her work in this post]

I read a work a few days ago on Scribophile (“In Her Dreams” by Anna White) and one thing that struck me was the use of “fuck.” I’m not opposed to swearing, in fact I could probably make a sailor blush, but I do think swearing should be used in moderation.

Let’s take Deadwood as an example. The show averages 1.56 instances of “fuck” per minute (per Wikipedia). If you haven’t seen the show here’s a sample of its typical dialogue:

Al Swearengen: It’s not the fucking hour. It’s not the fucking vantage of the chair. It’s you, that’s changed the level of your suction somehow. That’s the fucking sum and substance of it.
Dolly: Maybe if I get on my knees?
Al Swearengen: You’re the cocksucker. Change the fucking angle.

Now Anna’s work isn’t nearly this gratuitous. And the creators of Deadwood intentionally gave their show the most foul language they could, since with accurate language from the period “[the characters] all wind up sounding like Yosemite Sam.” The problem, however, is that when you have this much foul language in every conversation it loses the punch of hearing characters swear.

The reason I bring up Anna’s work (which, for the record, I genuinely enjoyed) is because its use of “fuck” is great. Each use is perfectly fine and sounds like something I’d actually say, but the last one is fantastic. My feedback suggested removing the other uses of the word so the last one would stand out. Every prior use dilutes the impact that last “fuck” has, and that impact is the highlight of her piece.

This applies to more than just language though. The reason Gus killing Victor in Breaking Bad was so shocking is that Gus hadn’t done anything like it in the season-plus since we met him. If Gus had brutally murdered people from the moment we met him it’d be just another body. The reason “I did it for me” stood out so much in the finale is that before then Walt always said he was doing everything, from the meth to the murders, for the family.

This works in politics too. Opencarry.org promotes gun owners openly carrying their firearms (as opposed to carrying concealed) specifically because they want to normalize firearms in everyday life. If you see a guy with a gun at the grocery store, guns become as mundane and boring as seeing somebody with a cell phone.

The more you use something, anything, the more it’s normalized. When writing the killer scene that you want to stick with a reader, you need to make it unique not just in terms of content but in terms of how you present it. The verbiage, the pacing, the scenery, the dialogue, and everything else about the scene matters.

So by all means, use “fuck” as much as you want in your work. Just remember that when you actually need that “holy fuck” moment, you may have blown the chance to really shock your reader.

Doing Diversity Right

Let’s talk about one of my absolute favorite shows: BoJack Horseman. If you haven’t watched it, pull it up on Netflix and watch it. All of it. Seriously, this post is going to be filled with spoilers. I’ll wait.

I bring up BoJack Horseman because it’s one of the shining examples when it comes to diversity. Now that you’ve watched all thirty-six episodes you’re aware that the show has a diverse cast of characters. Most of the main characters are animals (Todd and Diane being the only main characters who aren’t), but we also have gay characters (Karen and Tanisha’s wedding in season 3), a Vietnamese character (Diane), a black character (Corduroy), and even an asexual character (Todd).

In most shows these would be major defining characteristics, but in BoJack none of these surface-level things matter. For example, fans speculated for three seasons over Todd’s sexual orientation before the show confirmed he was asexual. However, the only time the show ever explicitly touched on Todd’s sexuality was a throwaway line in the first episode when BoJack says he thought Todd’s parents kicked him out for being gay (as opposed to the actual reason, kicking him out for being a loser). Characters don’t question Todd or try to see who he has sex with, and Todd doesn’t spend time playing the victim because of his orientation; nobody in the show really cares. Instead, the show gives Todd actual character development (mostly as a victim of BoJack’s selfishness, but he’s had significant arcs in all three seasons) and he gets the sole use of “fuck” in season 3 (since you’ve seen the show you know each “fuck” is a big deal). Even when the show finally reveals Todd is asexual it comes out without any flair or dramatic music (okay, there’s dramatic music, but that’s because it’s the end of the season and there’s a lot going on).

Not only does that style make Todd more interesting as a character, it’s a way to actually show diversity in a respectful way. If every Todd story was about his sexuality it’d get boring, but instead he’s a fan favorite. His rock opera was awesome, and who didn’t laugh when Cabracadabra started accepting male customers (the entire purpose of the company was a ride-sharing app that protects women from pervs)?

It’s not just Todd though. The only reference to Diane’s Vietnamese ancestry is BoJack’s inability to pronounce “Nguyen.” Half the characters are animals, and Princess Carolyn being a pink cat is the least interesting thing about her. Mr. Peanutbutter starts as a stereotypical Golden Labrador, but he reveals deep depression and jealousy as we get to know him better (seriously, watch the premiere episode of Hollywoo Stars and Celebrities: What Do They Know? Do They Know Things?? Let’s Find Out! and tell me he’s not a deep character). Every character matters, even throwaway ones like The Closer.

The characters aren’t defined by the obvious stereotypes and tropes that proliferate through today’s shows/books/movies, where the creators make sure the main cast includes every diversity group they can think of.

Now if you’re writing a story about a plantation in South Carolina set in 1800, it’s obviously fine to focus on the fact that racism is rampant and show that being black is a big deal. Even so, make sure the characters have actual arcs and aren’t just tokens to hit a checkbox in diversity bingo. Diversity bingo not only cheapens your work, it cheapens the very people you’re trying to represent.

Version Control for Fun and Profit

Keeping track of old versions of your work is one of those things you don’t appreciate until you have it. Most people keep old versions with a mishmash of file names (novel.doc, novel-backup.doc, novel-backup2.doc, novel-backup-september-2015.doc, …) or with a service like Google Drive or Dropbox. These solutions work, but to say they’re lacking is an understatement. Enter git.

Git is a piece of software used to track the history of source code. While it’s fine-tuned for this job, it’s still phenomenal at tracking changes in projects that are structured similarly to software. If your project has the following properties, git makes an excellent choice:

  • work is stored in plain text (i.e., non-binary files)
  • files are small to medium-sized
  • files are human-readable

Since I use LaTeX all these conditions are met (most word processors have an option to store contents in flat files, which gets you most of the way there).

So how useful is git? Think about the backups spread across different services and folders with God-only-knows what names. Count how many you have and compare it to the history of my project.

$ git log --oneline | wc -l
177

177 versions at the time of this post, all of which are time-stamped with metadata about what actually changed. The last two commits look like this:

commit 42bf73d45dfe7fea13996ef9e3b2cdd0e122dc0a
Author: K.P. Wayne
Date:   Sun Jan 29 20:02:52 2017 -0700

    Polish up first part of chapter 1 using Pro Writing Aid

commit 297b676edc97643f901be6f6d0f846546534f2d5
Author: K.P. Wayne
Date:   Sun Jan 29 18:54:26 2017 -0700

    Work in feedback from Scribophile

Not only is backing up offsite trivial, but each backup contains the entire history of the project. As of right now my entire repository, meaning the full history of my project, is a whopping 652 kilobytes, less than half the space of a floppy disk.

You can also mark specific versions with symbolic names (the git term for this is a tag). This makes it trivial to see what changed between two different versions.
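
Creating one of those tags is a one-liner; the tag used in the diff below could have been created like this (the message is just illustrative):

git tag -a scrib-bk1-a01 -m "Act 1, chapter 1 as posted to Scribophile"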

git diff --color-words scrib-bk1-a01..master bk1/A-01mine.tex

In fairness, git has a learning curve, and last time I checked most GUI tools ranged from awful to bad (coworkers assure me this has changed). A simpler option is Mercurial, and Joel Spolsky put together a simple tutorial for anybody unfamiliar with version control.

It’s not for everybody, and I haven’t tested git with popular tools like Scrivener, but if it fits in your workflow give it a go.


Special Characters

Section 11.12 of The Chicago Manual of Style’s sixteenth edition recommends including a list of special characters at the end of any manuscript (a special character generally being anything not found on a standard keyboard). Because I’m lazy I want something to do the work for me so I don’t have to track what characters I’m using through revisions. Let’s make LaTeX track the special characters we use.

Since the glossaries package supports multiple glossaries, we can use a dedicated one just to track our special characters. Update the preamble with something like this:

\newglossary[spg]{special}{sps}{spo}{Special Characters}

Now we can add any special characters specifically to this glossary with a few extra fields filled out:

\newglossaryentry{e-acute}{
  name = {\'{e}},
  description = {e with acute [U+00E9]},
  type = special,
  sort = eacute
}

Notice the description and type fields? These provide information about the symbol (in this case, é), including its Unicode representation, and they tell LaTeX to put it in the new glossary we created in the last step. These fields are critical to get the behavior we want, so no skipping steps.

Now we can start putting our \gls{e-acute} tags wherever we want but that’s pretty hacky. Instead let’s add another glossary entry that does the work for us:

\newglossaryentry{cafe}{
  name = {caf\gls{e-acute}},
  description = { }
}

Now we can write like normal and wherever we put \gls{cafe} we get a nicely formatted “café.” The last step is to actually print the special characters we’re using at the end of our document:

\printglossary[type=special]
[Screenshot: the generated special character listing]

Not only can we avoid manually keeping track of what characters we use while editing, we even include page numbers where our special characters appear.
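
One wrinkle worth noting: the glossaries package leans on an external indexing pass, so producing the listing takes a couple of rounds (file name hypothetical; this is exactly the kind of post-processing my “full” targets exist for):

pdflatex wolf.tex      # first pass records each \gls usage
makeglossaries wolf    # builds the glossary files, including our special one
pdflatex wolf.tex      # second pass typesets the listing with page numbers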

Naming Characters (and Places, Groups, Gods…)

I’m awful with names. Actually, that undersells how bad I am. I’m the kind of person who likes things to be precise and correct from the beginning (engineering hat), so I don’t even like having placeholders and calling my characters Bob, Janet, and Tony. I’ve tried, really, but I keep fidgeting and will spend hours trying to come up with the perfect name. Plus, even if I somehow move on, find/replace can only do so much. If I screw up and talk about how Bbo and Tony are trying to one-up each other to take Janet on a date, we all know what’s going to happen.

The solution: placeholders. Yeah, even though I hate them they’re still the best option. Let’s look at a practical example.

\newglossaryentry{first-guy}{
  name = {Bob},
  description = { }
}
\newglossaryentry{second-guy}{
  name = {Tony},
  description = { }
}
\newglossaryentry{girl}{
  name = {Janet},
  description = { }
}
\gls{first-guy} and \gls{second-guy} both have a crush
on \gls{girl}.  I'd tell you who gets her in the end but
I haven't actually thought that far ahead.

That’s from a stupid LaTeX file I just wrote that spits out the following: “Bob and Tony both have a crush on Janet. I’d tell you who gets her in the end but I haven’t actually thought that far ahead.” Notice how their names only appear in one place each, where I define the glossary entries? This means I can come in later, change Bob to Brad, and after I process the file again I end up with: “Brad and Tony both have a crush on Janet. I’d tell you who gets her in the end but I haven’t actually thought that far ahead.”

Now I can pick names that are good enough (mostly ones I blatantly pilfer from video games) and I don’t have to worry about find/replace letting me down when I go through later with better names. It’s already helped me once: I realized there were two businesses with “Irving” in their name (courtesy of Lloyd Irving from Tales of Symphonia), so all I had to do was update one entry in my glossary files and the name duplication issue went away.