[Web4lib] white space in web pages

Keith D. Engwall kengwall at catawba.edu
Mon Jul 24 15:26:03 EDT 2006


For those who don't know (everyone probably does, but just in case), the choice between ASCII and binary transfer mode only matters when you are transferring between platforms with different line-ending conventions, most commonly Windows and UNIX.  Both platforms store text as ASCII, but they end lines differently.  Windows text files use two characters (Carriage Return + Line Feed, or CR+LF) to end a line of text, and UNIX text files use only the Line Feed (LF) character.

Transferring in ASCII mode removes the CR from any CR+LF when transferring from Windows to UNIX, and adds a CR to any LF when transferring the other way.
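
As a rough illustration, the net effect of ASCII mode can be modelled as a pair of byte substitutions.  This is a simplified Python sketch of the behavior described above, not the way any particular FTP client implements it; the function names are made up for the example.

```python
def ascii_windows_to_unix(data: bytes) -> bytes:
    """Drop the CR from every CR+LF pair (Windows -> UNIX)."""
    return data.replace(b"\r\n", b"\n")


def ascii_unix_to_windows(data: bytes) -> bytes:
    """Put a CR in front of every LF (UNIX -> Windows).

    Assumes the input uses bare LF line endings, as a UNIX text file would.
    """
    return data.replace(b"\n", b"\r\n")


windows_text = b"<html>\r\n<body>\r\n</body>\r\n</html>\r\n"
unix_text = ascii_windows_to_unix(windows_text)

print(unix_text)
print(ascii_unix_to_windows(unix_text) == windows_text)
```

For a genuine text file the round trip is lossless, which is why ASCII mode is safe for HTML.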

Transferring text files from Windows to UNIX in binary causes an extra CR (usually displayed as ^M) to occur at the end of every line.  Transferring text files in binary the other way causes the CR to be missing, so that in text applications (like Notepad), all the lines run together.
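
If you want to check which convention a file ended up with after a transfer, counting the line endings is usually enough.  The helper below is a hypothetical Python sketch (the function name and sample data are mine), and it assumes you read the file in binary so that nothing translates the bytes on the way in.

```python
def line_ending_counts(data: bytes) -> dict:
    """Count CR+LF pairs, bare LFs, and bare CRs in a block of bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,                          # Windows-style endings
        "lf_only": data.count(b"\n") - crlf,   # UNIX-style endings
        "cr_only": data.count(b"\r") - crlf,   # stray CRs not followed by LF
    }


# In practice: data = open("index.html", "rb").read()
sample = b"line one\r\nline two\nline three\r\n"
counts = line_ending_counts(sample)
print(counts)
```

A file with a mix of "crlf" and "lf_only" endings has probably been edited on both platforms or transferred in the wrong mode at some point.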

Transferring binary files in ASCII is much worse, because a binary file is very likely to contain random occurrences of bytes that just happen to "look like" LFs or CR+LFs.  Therefore, when transferring binary files in ASCII mode from Windows to UNIX, the CR is removed from any pair of bytes that looks like a CR+LF, and when transferring the other way, a CR is added in front of any byte that looks like an LF.  This corrupts the binary file and renders it unusable (whether an image, Word document, PDF, etc.).
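
The PNG format makes a handy demonstration, because its fixed 8-byte file signature contains both a CR+LF pair and a lone LF.  The Python sketch below applies the same simplified substitutions that ASCII mode performs (the function names are illustrative only) and shows that the signature no longer matches afterwards.

```python
# The fixed 8-byte signature at the start of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def ascii_windows_to_unix(data: bytes) -> bytes:
    """Simplified model of ASCII mode, Windows -> UNIX."""
    return data.replace(b"\r\n", b"\n")


def ascii_unix_to_windows(data: bytes) -> bytes:
    """Simplified model of ASCII mode, UNIX -> Windows."""
    return data.replace(b"\n", b"\r\n")


to_unix = ascii_windows_to_unix(PNG_SIGNATURE)
to_windows = ascii_unix_to_windows(PNG_SIGNATURE)

# One byte is lost in one direction, two bytes are gained in the other.
print(len(PNG_SIGNATURE), len(to_unix), len(to_windows))
print(to_unix == PNG_SIGNATURE, to_windows == PNG_SIGNATURE)
```

Either way the header is damaged, and the same thing happens to every matching byte sequence in the rest of the file.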

It is never OK to transfer binary files in ASCII.  Whether or not it is OK to transfer ASCII files in binary depends on what you can tolerate.  On one hand, browsers generally render a page the same way whichever end-of-line convention the HTML file uses, so for web pages the problem is relatively benign.

On the other hand, you're still dealing with a corrupt file.  The corruption is generally benign, but it is still corruption, and there's always the possibility that somewhere along the line, something trying to read the data in that file is not going to handle the wrong end-of-line characters properly, at which point unexpected behavior will ensue.  Also, if you use checksums to validate either individual files or packaged sets of files (tar, tar.gz, zip, etc.), the content is going to be different, so the checksum will be different, and there's no way for you to know whether the mismatch comes from this or from some other, less benign kind of corruption.
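
To make the checksum point concrete, here is a small Python sketch using the standard hashlib module.  The sample content and helper names are invented for the example; the second helper normalizes line endings before hashing, which only makes sense for text files.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Checksum of the bytes exactly as stored."""
    return hashlib.sha256(data).hexdigest()


def sha256_hex_normalized(data: bytes) -> str:
    """Checksum after converting CR+LF to LF (text files only)."""
    return sha256_hex(data.replace(b"\r\n", b"\n"))


windows_copy = b"<p>Hello</p>\r\n<p>World</p>\r\n"
unix_copy = b"<p>Hello</p>\n<p>World</p>\n"

same_raw = sha256_hex(windows_copy) == sha256_hex(unix_copy)
same_normalized = (
    sha256_hex_normalized(windows_copy) == sha256_hex_normalized(unix_copy)
)
print(same_raw, same_normalized)
```

The raw checksums differ even though the two copies hold the same text, so a plain checksum comparison cannot tell a line-ending difference from real damage.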

Personally, I prefer to play it safe and not tolerate any kind of corruption.  Most modern transfer software can be configured to automatically select the appropriate transfer mode.  Whether or not that setting is turned on by default varies from software to software.
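
Automatic selection usually comes down to a list of file extensions that are treated as text.  The Python sketch below shows the idea using the standard ftplib module; the extension list, the choose_mode and upload names, and the fall-back to binary for unknown types are all my own assumptions, not the behavior of any specific FTP program.

```python
import os
from ftplib import FTP

# Assumed list of extensions to treat as text; adjust to taste.
TEXT_EXTENSIONS = {".html", ".htm", ".css", ".js", ".txt", ".xml"}


def choose_mode(filename: str) -> str:
    """Return "ascii" for known text extensions, otherwise "binary"."""
    ext = os.path.splitext(filename)[1].lower()
    return "ascii" if ext in TEXT_EXTENSIONS else "binary"


def upload(ftp: FTP, local_path: str, remote_name: str) -> None:
    """Upload one file over an already logged-in FTP connection."""
    with open(local_path, "rb") as fp:
        if choose_mode(local_path) == "ascii":
            ftp.storlines("STOR " + remote_name, fp)   # ASCII transfer
        else:
            ftp.storbinary("STOR " + remote_name, fp)  # binary transfer


for name in ("index.html", "logo.png", "README"):
    print(name, choose_mode(name))
```

Defaulting to binary for anything unrecognized is the safer choice, since a text file sent in binary is only mildly affected while a binary file sent in ASCII is ruined.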

As for Dreamweaver, the Adobe Knowledgebase has an entry about transfer mode in Dreamweaver:

http://www.adobe.com/cfusion/knowledgebase/index.cfm?id=tn_13546

I hope you are able to get to the bottom of your problem.

Keith Engwall
Head of Library Systems and Technology
Catawba College Library
kengwall at catawba.edu
http://www.lib.catawba.edu

-----Original Message-----
From: web4lib-bounces at webjunction.org [mailto:web4lib-bounces at webjunction.org] 
Sent: Monday, July 24, 2006 12:52 PM
To: web4lib at webjunction.org
Subject: [Web4lib] white space in web pages

This post doesn't seem to have shown up during the technical difficulties on the list last week.

Has it become an acceptable practice to ftp html docs in binary instead of ASCII?

Several of us perform upkeep on our Library website, and we have different ways of doing things, owing to different platforms, etc.

When html files have been uploaded in binary instead of ASCII, end-of-line and white space breaks are inserted into the code.
Sometimes it can even break a script.
In newer versions of ftp programs, it seems that the default is binary, presumably because pictures will not be readable if uploaded in ASCII, whereas binary rarely kills text.
My coworkers are set up to upload via DreamWeaver, which doesn't work from my computer. Does DW upload only in binary?
It can be very cumbersome to deal with the code with big gaps in it. Only a few lines can be seen at a time on a screen.

I'm set to digest for this list, so I will look for replies tonight.
Thank you,

Nancy E. Sosna
Associate Reference Librarian
Lake Forest College
Lake Forest, IL
sosna at lakeforest.edu

_______________________________________________
Web4lib mailing list
Web4lib at webjunction.org
http://lists.webjunction.org/web4lib/


