Corrupted Word file fixes - was: Converted Word file grows enormously

Fred Ridder docudoc at hotmail.com
Fri Jul 27 05:39:18 PDT 2007


I have no major issues with anything Diane has said in her very informative
posting. But my suggestion is to take a fresh look at your basic strategy
for converting from Word to FrameMaker. Directly converting a monolithic
Word document to FrameMaker has never made much sense to me
because it ignores the differences in the document paradigms of the
two tools and takes little or no advantage of FrameMaker's strengths.

When I converted nearly 15K pages of Word docs to FrameMaker a few
years back during our group's migration, my objective was to leave as
much of the "Wordliness" behind as possible and do everything I could to
make the converted docs into well-designed FrameMaker documents. I did a lot
of prep work on the Word docs (taking advantage of Word's excellent
macro capabilities). Among the most fundamental things I did were
these:

I applied a conversion template that removed all the headers and footers
(since these were already designed into my FrameMaker templates in a
standardized way that uses FrameMaker's running header/footer system variables
and a standardized set of user variables like book title, product name,
document type, company name), removed the TOC and index (since
FrameMaker handles these differently), removed all autonumbering
and bullets (since FrameMaker does it in a different way that works
reliably), and renamed all the Word styles to match the new style
names in our new FrameMaker templates. Admittedly, this last item
was only practical because all of our Word docs used one of three
well-controlled Word templates and basically stuck to the templates.
If you're dealing with lots of locally formatted Normal paragraphs or
a lot of ad hoc styles, you've got a problem.
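For anyone who wants to script the style-renaming step outside Word, here is a minimal Python sketch of the mapping logic only. It assumes the document has already been exported to a list of (style name, text) pairs; that data model, the style names in STYLE_MAP, and the rename_styles helper are all hypothetical, not what was used in this migration (which relied on Word macros). It also makes the last point concrete: any paragraph whose style is missing from the map falls through unchanged and is reported for manual cleanup.

```python
# Hypothetical map from Word style names to FrameMaker paragraph tags.
# The names are illustrative only.
STYLE_MAP = {
    "Heading 1": "Heading1",
    "Heading 2": "Heading2",
    "Body Text": "Body",
    "List Bullet": "Bulleted",
}


def rename_styles(paragraphs, style_map):
    """Return (renamed, unmapped) for a list of (style, text) pairs.

    Paragraphs whose Word style has no entry in style_map keep their
    original style name and are listed in `unmapped`, because those
    (e.g. locally formatted Normal paragraphs or ad hoc styles) need
    manual attention.
    """
    renamed = []
    unmapped = set()
    for style, text in paragraphs:
        new_style = style_map.get(style)
        if new_style is None:
            unmapped.add(style)
            renamed.append((style, text))
        else:
            renamed.append((new_style, text))
    return renamed, sorted(unmapped)


if __name__ == "__main__":
    doc = [
        ("Heading 1", "Installing the Product"),
        ("Body Text", "This chapter describes installation."),
        ("Normal", "A locally formatted paragraph."),
    ]
    renamed, unmapped = rename_styles(doc, STYLE_MAP)
    print(renamed)
    print("Unmapped styles:", unmapped)
```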

I removed all graphics from the Word document, because we wanted
all graphics in our FrameMaker documents to be referenced rather than
pasted. We had always avoided linked objects in our Word docs, so I
didn't have to deal with them explicitly, but I would have removed them
just as I removed the graphics.

I split the monolithic document into a series of separate, smaller files
that corresponded to the level of modularity we were going to use in
FrameMaker (generally chapter level, but down to function level for some
of our API reference documents, since not all functions were supported
on all operating systems and we needed the capability to include or exclude
individual functions at the book level). Since I typically did this by
copying and pasting into a clean template, this step had the additional
benefit of getting rid of the end-of-section and end-of-document characters
that can hide so much bad juju in Word documents.
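The splitting rule can be sketched the same way. This hypothetical helper, again working on a list of (style name, text) pairs, starts a new chunk at every paragraph in a chosen break style. Using "Heading 1" for chapter-level files is an assumption; a per-function split for API reference material would pass whatever style marks each function heading.

```python
def split_into_chapters(paragraphs, break_style="Heading 1"):
    """Split (style, text) pairs into chunks, starting a new chunk at each
    paragraph whose style equals break_style.

    Paragraphs that come before the first break_style paragraph (front
    matter) form their own leading chunk.
    """
    chunks = []
    current = []
    for style, text in paragraphs:
        # A break-style paragraph closes the chunk collected so far.
        if style == break_style and current:
            chunks.append(current)
            current = []
        current.append((style, text))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    doc = [
        ("Title", "User Guide"),
        ("Heading 1", "Overview"),
        ("Body Text", "Some overview text."),
        ("Heading 1", "Installation"),
        ("Body Text", "Some installation text."),
    ]
    for number, chunk in enumerate(split_into_chapters(doc), start=1):
        print(number, chunk[0][1], len(chunk))
```

Each chunk would then be pasted into its own copy of the clean template, as described above.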

Rather than opening the Word .doc or .rtf files directly in FrameMaker,
I found that I got much cleaner results by opening an empty copy of
our FrameMaker template and using File>Import>File (by copying) to
pull in the Word content with as little Word formatting garbage as
possible.

Post-conversion cleanup involved a few further steps, including:
-re-building cross-references using the FrameMaker mechanism
-re-inserting the graphics
-cleaning up tables (mostly done with Rick Quatro's TableCleaner
   plug-in)
-deleting spurious/irrelevant imported markers (e.g. cross-reference
   markers beginning with "_TOC...") using IXgen. These don't cause
   any noticeable problem if left in, but we wanted *clean* files.
-building the new FrameMaker book using pre-built template files for
   the generated files
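The marker cleanup in the list above was done with IXgen inside FrameMaker. As an illustration of the filtering rule only, this sketch drops Cross-Ref markers whose text starts with the _TOC prefix. The (marker type, marker text) data model and the drop_toc_markers name are hypothetical, and the case-insensitive comparison is an assumption (Word typically names these bookmarks _Toc followed by digits).

```python
def drop_toc_markers(markers, marker_type="Cross-Ref", prefix="_toc"):
    """Return the markers that are not spurious Word TOC markers.

    A marker is dropped when its type equals marker_type and its text
    starts with prefix, compared case-insensitively. All other markers
    are kept in their original order.
    """
    prefix = prefix.lower()
    kept = []
    for kind, text in markers:
        if kind == marker_type and text.lower().startswith(prefix):
            continue  # leftover Word TOC bookmark: skip it
        kept.append((kind, text))
    return kept


if __name__ == "__main__":
    imported = [
        ("Cross-Ref", "_Toc123456789"),
        ("Cross-Ref", "Installing the Product"),
        ("Index", "installation"),
    ]
    print(drop_toc_markers(imported))
```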

Following this conversion process, we wound up with FrameMaker
books that worked first time, every time.

My opinions only; I don't speak for Intel.
Fred Ridder
Intel
Parsippany, NJ (formerly)



>From: Diane Gaskill <dgcaller at earthlink.net>
>Reply-To: Diane Gaskill <dgcaller at earthlink.net>
>To: O'Laoghaire Micheal <Micheal.OLaoghaire at comverse.com>, Art Campbell <art.campbell at gmail.com>
>CC: framers at lists.frameusers.com
>Subject: Corrupted Word file fixes - was: Converted Word file grows enormously
>Date: Fri, 27 Jul 2007 02:28:34 -0400 (EDT)
>
>Michael,
>
>I have seen this MANY times.  We are converting to FM at my company now
>(finally - thank God) but we have many large (400- to 800-page) Word docs
>that contain lots of embedded drawings, screenshots, and even photos.
>Documents like this are easily corrupted because Word has some really bad
>memory bugs, not to mention the notorious autonumbering bug - I mean
>auto-self-renumbering bug.
>
>Most of the corruption in a Word doc is contained in the last paragraph
>mark (that's where all the metadata, such as file descriptors, is stored).
>But corruption can also be contained in section breaks.
>
>There are a couple of ways to fix the problem.  First, the easy way, 
>although this might not fix it.
>
>1. Launch Word but do not open any files.
>2. Using Explorer, locate the file you are having trouble with and note the 
>file size.  Write it down.
>3.  SINGLE click the file to highlight it.  Do NOT double click the file 
>and open it.
>4.  With the file highlighted, in Word, select File -> Open.  The Open File 
>dialog box is displayed.
>5.  In the lower right corner of the dialog box there is an Open button.
>To the right of the button is a drop-down arrow.  Make sure your file is
>selected in the dialog box, then click the arrow and select Open and Repair.
>Word will open the file, analyze it, and fix a lot of the corruption.
>6.  Save the file and then note the file size.  If there is a difference 
>from the original file size, you might have a clean file. If not, go to the 
>next procedure.
>
>Personal note:  You gotta know that Gates & Co. KNOW that Word is a pile of
>you-know-what.  How many other applications do you know that have an Open
>and Repair button?  Sheesh.
>
>The hard way
>Well, it's not really hard, just time consuming.
>
>1.  Launch Word but don't open any files.
>2.  Select Tools > Options > File Locations.  Note the path to User 
>Templates.
>3.  Exit Word.  Shut it down completely.
>4.  Go to the folder whose path you noted in step 2 and delete
>Normal.dot.  That's right, delete it.  Or, if you have modified it (that's
>a big no-no), just move it to another directory.
>5.  Launch Word again. When Word does not find Normal.dot, it will build a 
>nice, clean, new one with no corruptions at all.
>[If you are fast, you probably know where I am going with this.]
>6.  Now, create a brand new doc in Word.  It will automatically use the 
>nice, clean, new Normal template.  Leave this file open, but do not save 
>it.
>7.  Now open your corrupted doc.  See the bugs crawling around on the 
>screen.  (ok, ok, I just threw that in for fun).
>8.  Turn on the display of formatting marks (the Show/Hide paragraph-mark
>button) so that you can see the paragraph marks.
>9.  Copy everything in your file EXCEPT the last paragraph mark.
>10.  Paste that into the clean new Word doc you already have open.
>11.  Save under a new name.  Don't overwrite the corrupted file.
>12.  Note the file size.
>
>If the above procedure doesn't reduce the file size a lot, do this:
>
>1.  Open another new, clean doc.
>2.  Open your original, corrupted file again.  In your corrupted file,
>delete ALL of the section breaks.  The headers and footers will not work
>any more because the metadata for them is in the section breaks.  You will
>have to create them all again later.  This could take a while, depending on 
>the size of your doc.
>3.  Copy everything in your file EXCEPT the last paragraph mark.
>4.  Paste that into the clean new Word doc you already have open.
>5.  Save under a new name.  Don't overwrite the corrupted file.
>6.  Note the file size.
>7.  Attach your original template, re-create the section breaks (and their
>headers and footers), and you should have a clean, uncorrupted file.
>
>For more information go to the Word MVP website http://word.mvps.org/.
>Also check out this page on the site:
>http://word.mvps.org/FAQs/AppErrors/CorruptDoc.htm
>The title of the page is:
>How can I recover a corrupt document or template – and why did it become 
>corrupt?
>
>Hope this helps.
>
>Diane Gaskill
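Both of Diane's procedures come down to comparing the file size before and after the fix. A small Python helper can do that comparison instead of writing the numbers down; the size_report name, the script name in the usage comment, and the percentage output are illustrative choices, not part of either procedure. A noticeable drop may mean the corruption is gone, while no change means it is time to try the next procedure.

```python
import os
import sys


def size_report(original_path, repaired_path):
    """Return (original_bytes, repaired_bytes, percent_change) for two files."""
    before = os.path.getsize(original_path)
    after = os.path.getsize(repaired_path)
    if before == 0:
        raise ValueError("original file is empty; cannot compute a percentage")
    change = (after - before) / before * 100.0
    return before, after, change


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python size_report.py original.doc repaired.doc
    before, after, change = size_report(sys.argv[1], sys.argv[2])
    print(f"{before:,} -> {after:,} bytes ({change:+.1f}%)")
```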
