Tim Finin | UMBC, Baltimore MD | finin@umbc.edu | http://umbc.edu/~finin/


Plain Text is Better Text

Tim Finin
January 2004

In email, plain text is better text. In The Elements of Style, William Strunk advised writers to "Omit needless words." Well, I say, "Omit needless bytes."

I frequently get email from colleagues and students that consists essentially of an attached Microsoft Word document, perhaps with an informative "please see the attached memo". Once I start the application and open the file, it typically turns out to be a couple of paragraphs of plain text. How much better it would have been if the sender had simply included the text in the message.

Sending a Microsoft Word document when a simple plain text message will do is a bad idea, and bad for many reasons.

First, it's a waste of resources. A recent one-page memo of a few simple paragraphs arrived as a message of 34,000 bytes. The same memo in plain text needed less than 2000 bytes. Not a big deal, you say, but what if the memo is sent to the 500 faculty on my campus (as many are), or worse, to 12,000 UMBC students? Why should our messages be more than ten times bigger than they need to be? And I've seen many examples where the difference between an attached Word document and the equivalent plain text was a factor of 100! The resources wasted include network bandwidth and file storage. Each message gets stored several times -- on the mail server, on any backup systems it uses, and on the workstation you use to read it. Even when the message is deleted, it lives on, as copies of it get written to backup tapes where they are carefully stored and tended. Maybe for years. Those 12,000 copies of a 34,000-byte message may outlive the sender! Plain text sent in a message is just about as compact as possible.
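A quick back-of-the-envelope calculation makes the waste concrete. This is only a hypothetical Python sketch: it uses the figures given above (34,000 bytes for the Word memo, 2,000 bytes as an upper bound for the plain text, 500 faculty and 12,000 students) and counts just one stored copy per recipient, so mail-server and backup copies would multiply both totals.

```python
# Back-of-the-envelope storage cost of one broadcast memo.
# Assumption: one stored copy per recipient; server and backup copies
# would multiply every total below.
WORD_BYTES = 34_000   # the one-page memo sent as a Word attachment
PLAIN_BYTES = 2_000   # upper bound for the same memo as plain text
RECIPIENTS = {"faculty": 500, "students": 12_000}


def total_bytes(message_bytes: int, recipients: int) -> int:
    """Bytes needed to store one copy of the message per recipient."""
    return message_bytes * recipients


for group, n in RECIPIENTS.items():
    word = total_bytes(WORD_BYTES, n)
    plain = total_bytes(PLAIN_BYTES, n)
    print(f"{group}: {word:,} bytes as Word vs {plain:,} bytes as plain text "
          f"({word / plain:.0f}x the plain-text size)")
```

For the student list this comes to 408,000,000 bytes for the Word version against at most 24,000,000 bytes of plain text, before a single backup is made.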

Second, it's dangerous. Attached files are inherently risky for several reasons. You never know who the real sender of a message is -- just because the mail seems to come from a trusted colleague doesn't mean it did; it may be spoofed, or it may have been sent automatically from her computer by a virus. Although you *thought* the attachment was safe to open, maybe you didn't look closely enough and it was actually an executable file. Complex attachments like Word documents or Excel spreadsheets can also carry macro viruses of their own. The sender may be infected without knowing it and can pass the virus on by sending you an infected file. Plain text sent in a message is safe.

Third, attached files in formats like Word, PowerPoint, PDF, PostScript and Excel often contain lots of meta-data (data about data) that is private, such as the names of the people who wrote or edited the document, information about the computers, networks and printers involved in producing it, text deleted from the document earlier, etc. This data is normally invisible to the user but can be easily exposed and extracted by simple programs. Tony Blair's government found this out the hard way in the "dodgy dossier" affair surrounding the Iraq war. Simon Byers of AT&T Research recently studied the scope of this problem (http://www.user-agent.org/word_docs.pdf) by randomly sampling 100,000 Word documents available on the Internet, and found hidden text in all of them, some of it quite sensitive.
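To show how little effort such extraction takes, here is a minimal, hypothetical Python sketch in the spirit of the Unix strings utility. It does not parse the Word format at all; it just reports runs of printable characters stored either as single bytes or as UTF-16 little-endian text, and Word files often hold text in one of those two forms. The six-character threshold and the function names are illustrative assumptions. If author names or deleted text are still stored in a file, they show up in its output along with everything else.

```python
import re

# strings(1)-style scanner: it knows nothing about any document format,
# it only pulls printable text out of raw bytes.
MIN_LEN = 6  # hypothetical threshold; shorter runs are mostly noise

# Runs of printable ASCII bytes.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# Runs of UTF-16LE text: each printable byte followed by a zero byte.
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def printable_strings(data: bytes) -> list[str]:
    """Return the printable ASCII and UTF-16LE runs found in data."""
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]
    return found


def scan_file(path: str) -> list[str]:
    """Read a file in binary mode and return its printable strings."""
    with open(path, "rb") as f:
        return printable_strings(f.read())
```

Calling `scan_file("memo.doc")` on a hypothetical attachment would list every readable string in it, whether or not Word displays that text.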

Fourth, it limits accessibility. The receiver must have the appropriate application to handle the attachment. Microsoft Word, for example, is an expensive proprietary software system. Although widespread, it's far from universal. Many UMBC faculty and students use Unix and can't read Word attachments. You can't read one on a PDA or cell phone either. Plain text can be handled by any system that can process email.

Fifth, attached files make more work for the receiver. Launching the application to open an attachment typically takes several seconds, perhaps ten or twenty on an older, underpowered machine. While ten seconds isn't very long, try adding ten seconds to each of the dozens or hundreds of messages you read in a day. Plain text in the body of the message is immediate.

Of course, there are times when it makes sense to send attached documents. Maybe the layout and fonts are critically important. Maybe there are drawings or images. Maybe you want the recipient to be able to edit the source of the document or manipulate it in some way that its format allows. However, if all you want is for the receiver to read a few paragraphs of text, send the text.

To be completed: Why do people do this? Editors with useful features like spelling correction and automatic justification; familiarity, ...

Just using PDF is not always the answer. I recently received a message that contained the text of a brief letter (just under 1000 characters, or bytes) along with a PDF of the same letter on letterhead. I was amazed at the size of the attached PDF -- over 8,000,000 bytes! This can happen when a hardcopy letter is scanned at high resolution. Not only does this represent an 8,000-fold bloating of the original message, but opening the PDF version severely strained my aging home computer.

Most of the times someone has sent me an attached Microsoft Word document, it was a bad idea, and bad for many reasons.

23 January, 2004 20:12