Writing Like a Geek

A few people have asked me for more details on my writing process so I figured the easiest thing is to document it here. This is about the literal process of how I put together my projects.

Editor

[Screenshot: chapter 2 open in Kate]

My main editor these days is Kate. Kate is a plain text editor but since it’s part of the KDE project it comes with a huge number of slick features, including vi mode (vi is an unbelievably powerful editor once you make it past the learning curve). I use LaTeX markup for everything (more on that later).

Project Structure

Each project gets its own folder (current project is named “hammer” due to an old working title). Each chapter is split into a couple of files, but how many depends on my mood when I worked on that particular piece (you can see in the above screenshot that the chapter is split into four files: one for each of the three scenes and a fourth to put them together). Files are named with a one-letter prefix based on the act (A, B, or C), a number (the chapter in that act), and a short name describing the point of that file. Each act has its own file that puts the chapters in order (named part-A.tex, part-B.tex, and part-C.tex).

Generating Output

LaTeX is a pain to work with direct­ly (it pol­lutes the work­ing direc­to­ry and has no depen­den­cy man­age­ment) but CMake solves both of these prob­lems. CMake doesn’t sup­port LaTeX out of the box, but I hacked togeth­er a mod­ule to man­age LaTeX projects.

create_target(wolf-A "${CMAKE_CURRENT_SOURCE_DIR}" "${tex_files_A}" FALSE)
create_target(wolf-B "${CMAKE_CURRENT_SOURCE_DIR}" "${tex_files_B}" FALSE)
create_target(wolf-C "${CMAKE_CURRENT_SOURCE_DIR}" "${tex_files_C}" FALSE)
create_target(wolf   "${CMAKE_CURRENT_SOURCE_DIR}" "${tex_files}"   TRUE)

create_target(wolf-pub   "${CMAKE_CURRENT_SOURCE_DIR}" "${tex_files}" FALSE)
create_target(wolf-lulu  "${CMAKE_CURRENT_SOURCE_DIR}" "${tex_files}" FALSE)
create_target(wolf-scrib "${CMAKE_CURRENT_SOURCE_DIR}" "${tex_files}" FALSE)

The first four lines let me produce manuscript-formatted output for either the whole novel or individual acts. The bottom three produce typeset content, typeset content for a 6″ x 9″ page (per the requirements on Lulu), and individual chapters (for posting to Scribophile). The main targets I use are the first four, mostly because manuscript formats build much faster than non-manuscript formats.

Using LaTeX means I get lots of slick fea­tures when I gen­er­ate the type­set out­put, includ­ing kern­ing, hyphen­ation, lig­a­tures, old-style numer­als, and micro-typog­ra­phy (this adjusts things like the spac­ing with­in words and over­flow­ing into the mar­gins to get white­space that looks uni­form to the human eye). Both type­set and man­u­script forms lever­age LaTeX fea­tures like the csquotes pack­age (auto­mat­i­cal­ly man­ages quote match­ing across lan­guages, includ­ing nest­ed quotes).

Each tar­get can be built in pdf, html, or odt. I can also gen­er­ate “full” ver­sions which include some post-pro­cess­ing on the out­put, but that’s only required for the full man­u­script (required to gen­er­ate Spe­cial Char­ac­ters).

You can com­pare chap­ter 2 of The Howl of the Wolf in both man­u­script and type­set for­mats.

Tools

I use aspell for spellchecking and integrate it with CMake using a custom command.

add_custom_target(spellcheck)
foreach   (file ${spellcheck_files})
    add_custom_command(TARGET spellcheck
                       COMMAND ${ASPELL} -t -p ${CMAKE_SOURCE_DIR}/aspell_dict check ${file})
endforeach(file)

If you don’t speak geek, this runs the entire project through aspell using a cus­tom dic­tio­nary iso­lat­ed to that project. My cur­rent plan is to delete the dic­tio­nary and run a clean spellcheck before pub­lish­ing (just in case I acci­den­tal­ly added a word by mis­take), but for now I can tol­er­ate a few spelling errors.

I have a few scripts to help with analy­sis and file man­age­ment. The most obvi­ous exam­ple is word counts (split­ting chapters/acts across lots of files makes this hard).

sh ../chapwc.sh 
A-00  3015+1+0 (1/0/0/0) Total
A-01  3688+0+0 (1/0/0/0) Total
A-02  2920+0+0 (1/0/0/0) Total
A-03  2471+0+0 (1/0/0/0) Total
A-04  2412+0+0 (1/0/0/0) Total
A-05  2718+0+0 (1/0/0/0) Total
A-06  1401+0+0 (1/0/0/0) Total
A-07  1117+0+0 (1/0/0/0) Total
...
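
The script itself isn't shown here, so the following is only a rough sketch of the idea, not the actual chapwc.sh. It assumes the chapter id is the first four characters of each file name (as in A-00prologue.tex) and uses plain wc -w, which counts LaTeX commands as words, so the totals are approximate and the output format differs from the listing above. The function name chapter_word_counts is made up for illustration.

```shell
#!/bin/sh
# Sketch: per-chapter word counts for files named like A-02scene.tex.
# Assumption: the chapter id is the first four characters of the file
# name (act letter, dash, two-digit chapter number).

chapter_word_counts() {
    dir=${1:-.}
    for f in "${dir}"/[A-C]-[0-9][0-9]*.tex; do
        [ -f "${f}" ] || continue            # the glob matched nothing
        name=${f##*/}                         # strip the directory part
        chapter=$(printf '%s' "${name}" | cut -c1-4)
        words=$(wc -w < "${f}")               # approximate: markup counts too
        printf '%s %s\n' "${chapter}" "${words}"
    done | awk '{ total[$1] += $2 }
                END { for (c in total) printf "%s  %d\n", c, total[c] }' | sort
}

# Run against the directory given on the command line (default: here)
chapter_word_counts "$@"
```

Summing in awk keeps the shell loop simple: each file contributes one "chapter count" line, and awk adds up the lines that share a chapter id.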

All changes are tracked using git.

Posting

Before posting anything to Scribophile I run it through Pro Writing Aid to help with grammar and style. If you haven’t tried Pro Writing Aid give it a shot; the site is amazing (it found the exact same sentence being used twice in the same chapter; not similar, the exact same). I usually work through one report at a time, updating my raw files and generating new output as I go to make sure I don’t miss anything.

After going through Pro Writ­ing Aid I check for trou­ble­some words. When I’m done clean­ing those up I do one last check through spellcheck and Pro Writ­ing Aid to make sure I didn’t intro­duce new prob­lems.

Once the sanity check is done I spit out the chapter in html (easier than dealing with copy/pasting through a pdf) and copy/paste it into Scribophile.

Filter and Weasel Words

I learned new terms today: fil­ter words and weasel words. In short:

  • Filter words get in the way of prose and distance the reader from the action. They make it easy for the writer to tell the reader what’s going on instead of showing them. “The dog seemed agitated waiting for her owner,” instead of “The dog paced in front of the bay windows waiting for her owner.” “The pretty girl looked uninterested in the guy asking for her number,” instead of “The pretty girl ignored the guy asking for her number.”
  • Weasel words leave text feeling ambiguous. “He might be the hero’s brother.” “The body may have been stolen.” “The dog could have eaten the roast.” These sentences don’t help tell the story because the weasel words (might, may, could) render the text meaningless. “He might be unrelated to the hero.” “The body may be right where we left it.” “The dog could have ignored the roast and slept.”

Remov­ing these cleans up the prose and makes it more inter­est­ing. A post on Scri­bophile sug­gest­ed writ­ing a macro in Word to detect these words and flag them but, since I don’t use Word, that doesn’t help me. Instead I imple­ment­ed the same func­tion­al­i­ty in sh.

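#!/bin/sh
# Flag filter and weasel words: prints "<file> <word>" for every listed
# word that appears at least once in an input file.

DIRNAME=/usr/bin/dirname
ECHO=/bin/echo
GREP=/bin/grep

wordLists="filter weasel"

if [ ${#} -gt 0 ]; then
    # Word lists live next to the script as <list>-words.txt
    source=$(${DIRNAME} "${0}")

    for file in "${@}"; do
        for list in ${wordLists}; do
            # One entry per line; read keeps phrases like "a bit" whole
            # without changing IFS for the rest of the script
            while IFS= read -r word; do
                [ -n "${word}" ] || continue
                # -q: only the exit status matters; \< \> match whole words
                if ${GREP} -q -i "\<${word}\>" "${file}"; then
                    ${ECHO} "${file} ${word}"
                fi
            done < "${source}/${list}-words.txt"
        done
    done
else
    ${ECHO} "Usage: ${0} <filename1> [filename2 ...]"
fi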

Bear in mind this is a quick and dirty attempt (it’s not as flex­i­ble as I’d like) but the basic func­tion­al­i­ty is there.

$ sh ../filter.sh A-00prologue.tex | head -n 5
A-00prologue.tex a bit
A-00prologue.tex could
A-00prologue.tex decided
A-00prologue.tex knew
A-00prologue.tex looked
$ sh ../filter.sh A-00prologue.tex | tail -n 5
A-00prologue.tex this
A-00prologue.tex that
A-00prologue.tex thought
A-00prologue.tex up
A-00prologue.tex wondered
$ sh ../filter.sh A-00prologue.tex | wc -l
48

I haven’t gone through the full out­put yet but this is a good sign. I already have oth­er scripts to do things like get the list of active files in my project so a bit (see, filler word!) more script­ing to feed pieces togeth­er and this can be inte­grat­ed into night­ly jobs to ana­lyze my work.

If you’re interested check out the script, filter word list, and weasel word list (word lists are blatantly stolen from a Scribophile post). I haven’t tested it anywhere but my system, but there’s nothing fancy in the script so it should work on any Unixy system (probably OS X, likely Cygwin).

Update

I spent some more time and fixed a few of the rough edges. This required rewriting filter.sh to use Perl, but it comes with quite a few advantages:

  • Word lists are only read once (they’re stored in a hash). This should pro­vide faster per­for­mance when pro­cess­ing mul­ti­ple input files (my expect­ed use case once I inte­grate this script into my night­ly jobs).
  • The out­put now includes line and col­umn off­sets.
  • The same word will be flagged each time it shows up on a line, not just once per file [e.g., A-00prologue.tex was (9:178), A-00prologue.tex was (9:277), A-00prologue.tex was (9:361)].
  • The pre­vi­ous two points should make it easy to add white-lists for auto­mat­ic detec­tion as work is edit­ed. My cur­rent plan is to com­mit the script’s out­put and use diff to com­pare “cur­rent” out­put with “com­mit­ted” out­put.
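
The Perl version isn't reproduced here, so as a rough illustration of the same ideas, here is a sketch in sh and awk rather than Perl. It reads a word list once into an awk array and reports every occurrence with a line and column. The function name flag_words, the 1-based columns, and the exact output layout are assumptions on my part, and it treats list entries as plain text, not patterns.

```shell
#!/bin/sh
# Sketch: report every occurrence of a listed word as
#   <file> <word> (<line>:<column>)
# Usage: flag_words <word-list-file> <file> [file ...]

flag_words() {
    list=$1
    shift
    awk -v list="${list}" '
        BEGIN {
            # Read the word list once and keep it in an array
            while ((getline w < list) > 0)
                if (w != "") words[tolower(w)] = 1
            close(list)
        }
        {
            line = tolower($0)
            for (w in words) {
                start = 1
                # Walk the line, finding one occurrence of w at a time
                while ((pos = index(substr(line, start), w)) > 0) {
                    col = start + pos - 1
                    before = (col > 1) ? substr(line, col - 1, 1) : " "
                    after = substr(line, col + length(w), 1)
                    # Whole words only: skip "up" inside "supposed"
                    if (before !~ /[a-z0-9_]/ && after !~ /[a-z0-9_]/)
                        printf "%s %s (%d:%d)\n", FILENAME, w, FNR, col
                    start = col + 1
                }
            }
        }
    ' "$@"
}

if [ "$#" -ge 2 ]; then
    flag_words "$@"
fi
```

awk doesn't guarantee the order of a for-in loop, so if the output is going to be committed and compared with diff it is probably worth piping it through sort first to keep it stable between runs.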

Normalization

[Note: Anna said it was fine to men­tion her and her work in this post]

I read a work a few days ago on Scri­bophile (“In Her Dreams” by Anna White) and one thing that struck me was the use of “fuck.” I’m not opposed to swear­ing, in fact I could prob­a­bly make a sailor blush, but I do think swear­ing should be used in mod­er­a­tion.

Let’s take Dead­wood as an exam­ple. The show aver­ages 1.56 instances of “fuck” per minute (per Wikipedia). If you haven’t seen the show here’s a sam­ple of its typ­i­cal dia­logue:

Al Swearengen: It’s not the fucking hour. It’s not the fucking vantage of the chair. It’s you, that’s changed the level of your suction somehow. That’s the fucking sum and substance of it.
Dol­ly: Maybe if I get on my knees?
Al Swearen­gen: You’re the cock­suck­er. Change the fuck­ing angle.

Now Anna’s work isn’t nearly this gratuitous. And the creators of Deadwood intentionally gave their show the most foul language they could since, with accurate language from the period, “[the characters] all wind up sounding like Yosemite Sam.” The problem, however, is that when you have this much foul language in every conversation it loses the punch of hearing characters swear.

The reason I bring up Anna’s work (which, for the record, I genuinely enjoyed) is because its use of “fuck” is great. Each use is perfectly fine and sounds like things I’d actually say, but the last one is fantastic. My feedback suggested removing the other uses of the word so that the last one would stand out. Every prior use dilutes the impact that last “fuck” has and that impact is the highlight of her piece.

This applies to more than just language though. The reason Gus killing Victor in Breaking Bad was so shocking is because Gus hadn’t done anything like that since we met him over a season ago. If Gus had brutally murdered people from the moment we met him it’d be just another body. The reason “I did it for me” stood out so much in the finale is because before that Walt always said he was doing everything, from the meth to the murders, for the family.

This works in pol­i­tics too. Opencarry.org pro­motes gun own­ers open­ly car­ry­ing their firearms (as opposed to car­ry­ing con­cealed) specif­i­cal­ly because they want to nor­mal­ize firearms in every­day life. If you see a guy with a gun at the gro­cery store, guns become as mun­dane and bor­ing as see­ing some­body with a cell phone.

The more you use some­thing–any­thing–the more it’s nor­mal­ized. When writ­ing the killer scene that you want to stick with a read­er, you need to make it unique not just in terms of con­tent, but in terms of how you present it. The ver­biage, the pac­ing, the scenery, the dia­logue, and every­thing else about those scenes mat­ters.

So by all means, use “fuck” as much as you want in your work. Just remem­ber when you actu­al­ly need that “holy fuck” moment you may have blown the chance to real­ly shock your read­er.

Doing Diversity Right

Let’s talk about one of my absolute favorite shows: BoJack Horse­man. If you haven’t watched it pull it up on Net­flix and watch it. All of it. Seri­ous­ly, this post is going to be filled with spoil­ers. I’ll wait.

I bring up BoJack Horse­man because it’s one of the shin­ing exam­ples when it comes to diver­si­ty. Now that you’ve watched all thir­ty-six episodes you’re aware that the show has a diverse cast of char­ac­ters. Some of the main char­ac­ters are ani­mals (Todd and Diane being the only main char­ac­ters who aren’t ani­mals) but we also have gay char­ac­ters (Karen and Tanisha’s wed­ding in sea­son 3), Viet­namese (Diane), black (Cor­duroy), and even an asex­u­al (Todd).

In most shows these would be major defining characteristics of these characters but in BoJack none of these surface-level things matter. For example, fans speculated for three seasons over Todd’s sexual orientation before the show confirmed he was asexual. However, the only time the show ever explicitly touched on Todd’s sexuality was a throwaway line in the first episode when BoJack says he thought Todd’s parents kicked him out for being gay (as opposed to the actual reason, kicking him out for being a loser). Characters don’t question Todd or try to see who he has sex with and Todd doesn’t spend time playing the victim because of his orientation; nobody in the show really cares. Instead, the show gives Todd actual character development (mostly as a victim of BoJack’s selfishness but he’s had significant arcs in all three seasons) and he gets the sole use of “fuck” in season 3 (since you’ve seen the show you know each “fuck” is a big deal). Even when they finally reveal Todd’s asexual it comes out without any flair or dramatic music (okay, there’s dramatic music but that’s because it’s the end of the season and there’s a lot going on).

Not only does that style make Todd more inter­est­ing as a char­ac­ter, it’s a way to actu­al­ly show diver­si­ty in a respect­ful way. If every Todd sto­ry was about his sex­u­al­i­ty it’d get bor­ing but instead he’s a fan favorite. His rock opera was awe­some and who didn’t laugh when Cabra­cadabra start­ed accept­ing male cus­tomers (the entire pur­pose of the com­pa­ny was a ride-shar­ing app that pro­tects women from per­vs).

It’s not just Todd though. The only reference to Diane’s Vietnamese ancestry is BoJack’s inability to pronounce “Nguyen.” Half the characters are animals and Princess Carolyn being a pink cat is the least interesting thing about her. Mr. Peanutbutter starts as a stereotypical Golden Labrador but he reveals deep depression and jealousy as we get to know him better (seriously, watch the premiere episode of Hollywoo Stars and Celebrities: What Do They Know? Do They Know Things?? Let’s Find Out! and tell me he’s not a deep character). Every character matters, even throwaway ones like The Closer.

The char­ac­ters aren’t defined by obvi­ous stereo­types and tropes that pro­lif­er­ate through shows/books/movies that come out today where they make sure the main cast includes every diver­si­ty group the cre­ators can think of.

Now if you’re writ­ing a sto­ry about a plan­ta­tion in South Car­oli­na set in 1800 it’s obvi­ous­ly fine to focus on the fact racism is ram­pant and show being black is a big deal. Even still, make sure the char­ac­ters have actu­al arcs and aren’t just tokens to hit a check­box in diver­si­ty bin­go. Diver­si­ty bin­go not only cheap­ens your work, it cheap­ens the very peo­ple you’re try­ing to rep­re­sent.

Version Control for Fun and Profit

Keep­ing track of old ver­sions of your work is one of those things you don’t appre­ci­ate until you have it. Most peo­ple keep old ver­sions with a mish­mash of file names (novel.doc, novel-backup.doc, novel-backup2.doc, novel-backup-september-2015.doc, …) or using a ser­vice like Google Dri­ve or Drop­box. These solu­tions work but to say they’re lack­ing is an under­state­ment. Enter git.

Git is a piece of software used to track the history of source code. While it’s fine-tuned for that job, it’s still phenomenal at tracking changes in projects that are structured similarly to software. If your project has the following properties git makes an excellent choice:

  • work is stored in plain text (i.e., non-bina­ry files)
  • files are small to medi­um-sized
  • files are human-read­able

Since I use LaTeX all these con­di­tions are met (most word proces­sors have an option to store con­tents in flat files which gets you most of the way here).

So how use­ful is git? Think about the back­ups spread across dif­fer­ent ser­vices and fold­ers with God-only-knows what names. Count how many you have and com­pare it to the his­to­ry of my project.

$ git log --oneline | wc -l
177

177 ver­sions at the time of this post, all of which are time-stamped with meta­da­ta about what actu­al­ly changed. The last two com­mits look like this:

commit 42bf73d45dfe7fea13996ef9e3b2cdd0e122dc0a
Author: K.P. Wayne
Date:   Sun Jan 29 20:02:52 2017 -0700

    Polish up first part of chapter 1 using Pro Writing Aid

commit 297b676edc97643f901be6f6d0f846546534f2d5
Author: K.P. Wayne
Date:   Sun Jan 29 18:54:26 2017 -0700

    Work in feedback from Scribophile

Not only is back­ing up off­site triv­ial, but each back­up con­tains the entire his­to­ry of the project. As of right now my entire repos­i­to­ry, mean­ing the full his­to­ry of my project, is a whop­ping 652 kilo­bytes; less than half the space of a flop­py disk.

You can also mark spe­cif­ic ver­sions with sym­bol­ic names (the git term for this is a tag). This makes it triv­ial to see what changed between two dif­fer­ent ver­sions.

git diff --color-words scrib-bk1-a01..master bk1/A-01mine.tex
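
For that comparison to work the tag has to exist first. As a minimal sketch of the workflow, assuming you tag whatever commit is checked out at the moment you post a chapter, the two steps might be wrapped like this. The function names tag_posted and changes_since are made up for illustration; the git commands underneath are the standard ones.

```shell
#!/bin/sh
# Sketch: mark the version that was posted, then see what changed since.
# Function names are illustrative; git tag and git diff are standard.

# Tag the current commit so this exact version can be found again.
# Usage: tag_posted scrib-bk1-a01
tag_posted() {
    git tag "$1"
}

# Word-level diff of one file between a tag and the current commit.
# Usage: changes_since scrib-bk1-a01 bk1/A-01mine.tex
changes_since() {
    git diff --color-words "$1"..HEAD -- "$2"
}
```

Word-level diffs suit prose much better than the default line-level ones, since a paragraph is usually one very long line and a single changed word would otherwise flag the whole thing.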

In fairness git has a learning curve and last time I checked most GUI tools ranged from awful to bad (coworkers assure me this has changed). A simpler option is Mercurial, and Joel Spolsky put together a simple tutorial for anybody unfamiliar with version control.

It’s not for everybody and I haven’t tested git with popular tools like Scrivener but if it fits in your workflow give it a go.

Special Characters

Section 11.12 of The Chicago Manual of Style’s sixteenth edition recommends including a list of special characters at the end of any manuscript (a special character generally being anything not found on a standard keyboard). Because I’m lazy I want something to do the work for me so I don’t have to track what characters I’m using through revisions. Let’s make LaTeX track the special characters we use.

Since the glossaries package supports multiple glossaries we can use a special one just to track our special characters. Update the preamble with something like this:

\newglossary[spg]{special}{sps}{spo}{Special Characters}

Now we can add any spe­cial char­ac­ters specif­i­cal­ly to this glos­sary with a few extra fields filled out:

\newglossaryentry{e-acute}{
  name = \'{e},
  description = {e with acute [U+00E9]},
  type = special,
  sort = eacute
}

Notice the description and type fields? These provide information about the symbol (in this case, é) including its Unicode representation and they tell LaTeX to put it in the new glossary we created in the last step. These fields are critical to get the behavior we want so no skipping steps.

Now we can start putting our \gls{e-acute} tags wher­ev­er we want but that’s pret­ty hacky. Instead let’s add anoth­er glos­sary entry that does the work for us:

\newglossaryentry{cafe}{
  name = {caf\gls{e-acute}},
  description = { }
}

Now we can write like nor­mal and wher­ev­er we put \gls{cafe} we get a nice­ly for­mat­ted “café.” The last step is to actu­al­ly print the spe­cial char­ac­ters we’re using at the end of our doc­u­ment:

\printglossary[type=special]
[Screenshot: special character listing]

Not only can we avoid man­u­al­ly keep­ing track of what char­ac­ters we use while edit­ing, we even include page num­bers where our spe­cial char­ac­ters appear.

Naming Characters (and Places, Groups, Gods…)

I’m awful with names. Actu­al­ly that under­sells how bad I am. I’m the kind of per­son who likes things to be pre­cise and cor­rect from the begin­ning (engi­neer­ing hat) so I don’t even like hav­ing place­hold­ers and call­ing my char­ac­ters Bob, Janet, and Tony. I’ve tried, real­ly, but I keep fid­get­ing and will spend hours try­ing to come up with the per­fect name. Plus even if I some­how move on find/replace can only do so much. If I screw up and talk about how Bbo and Tony are try­ing to one-up each oth­er to take Janet on a date we all know what’s going to hap­pen.

The solu­tion: place­hold­ers. Yeah, even though I hate them they’re still the best option. Let’s look at a prac­ti­cal exam­ple.

\newglossaryentry{first-guy}{
  name = {Bob},
  description = { }
}
\newglossaryentry{second-guy}{
  name = {Tony},
  description = { }
}
\newglossaryentry{girl}{
  name = {Janet},
  description = { }
}
\gls{first-guy} and \gls{second-guy} both have a crush
on \gls{girl}.  I'd tell you who gets her in the end but
 I haven't actually thought that far ahead.

That’s from a stu­pid LaTeX file I just wrote that spits out the fol­low­ing: “Bob and Tony both have a crush on Janet. I’d tell you who gets her in the end but I haven’t actu­al­ly thought that far ahead.” Notice how their names only appear in one place each, where I define the glos­sary entries? This means I can come in lat­er, change Bob to Brad, and after I process the file again I end up with: “Brad and Tony both have a crush on Janet. I’d tell you who gets her in the end but I haven’t actu­al­ly thought that far ahead.”

Now I can pick names that are good enough (most­ly ones I bla­tant­ly pil­fer from video games) and I don’t have to wor­ry about find/replace let­ting me down when I go through lat­er with bet­ter names. It’s already helped me once when I real­ized there were two busi­ness­es with “Irv­ing” in their name (cour­tesy of Lloyd Irv­ing from Tales of Sym­pho­nia) so all I had to do was update one entry in my glos­sary files and my name dupli­ca­tion issue went away.