OT: Cannot extract text from PDF

Shuttleworth, Roger Roger_Shuttleworth at tvworks.com
Tue Mar 10 10:51:12 PDT 2009


Wow, that was worth a try! However...

I reprinted the PDF to the Adobe PDF printer. No problems. The file displays OK.

I tried Save As RTF from the redistilled version and got an informative message:

"Acrobat was able to make this document accessible but found the following oddities:

Some font(s) missing information needed to determine the characters that correspond to the symbols (glyphs) in the font. [90 of 90 glyphs (Apple Chancery)]"

[I wonder what "accessible" means in this context? I'm none too familiar with Accessibility settings, but when I tried a Full Check it said, "All of the text in this document lacks a language specification." But perhaps I'm barking up the wrong tree here.]

Apple Chancery is indeed an embedded subset in the original PDF.
The resultant RTF is rather interesting but of no use to me. It consists of all caps, and a sample appears below:
___'YYUIOGZK_SKSHKXY_YNGRR_HK_SKSHKXY_UL_ZNK_V[HROI_UX_;=5_LGI[RZ___]NU_NG\K_ GT_OTZKXKYZ_OT_GZZKTJOTM_')+_SKKZOTMY_GTJ_VXUMXGSY_

Saving as text produces similar all-cap text.
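
The sample above may not be random noise. Each letter or symbol in it appears to sit exactly 26 code points below a readable ASCII character ("SKSHKXY" plus 26 gives "members"), which would fit a subset font whose glyph codes were written out without a character map. The sketch below tests that idea on the quoted sample only. The names OFFSET and decode are made up for illustration, the constant offset is an assumption inferred from this one line, and the handling of "_" is a guess: it seems to stand in for spaces and other characters that fell outside the printable range, although "_" plus 26 is "y", so "facult" in the output is probably "faculty" followed by punctuation.

```python
# Sample copied verbatim from the RTF output quoted above.
SAMPLE = r"___'YYUIOGZK_SKSHKXY_YNGRR_HK_SKSHKXY_UL_ZNK_V[HROI_UX_;=5_LGI[RZ___]NU_NG\K_ GT_OTZKXKYZ_OT_GZZKTJOTM_')+_SKKZOTMY_GTJ_VXUMXGSY_"

# Assumption: a constant offset, inferred from this one sample only.
OFFSET = 26


def decode(garbled: str, offset: int = OFFSET) -> str:
    """Shift every character up by `offset` code points.

    Underscores and spaces are treated as word breaks. This is a guess:
    an underscore may also hide a "y" or a punctuation mark.
    """
    chars = []
    for ch in garbled:
        if ch in "_ ":
            chars.append(" ")
        else:
            chars.append(chr(ord(ch) + offset))
    # Collapse runs of spaces left behind by consecutive underscores.
    return " ".join("".join(chars).split())


if __name__ == "__main__":
    print(decode(SAMPLE))
```

If the same offset holds for the rest of the text output, running the saved text file through a function like this might recover most of the wording, with the underscores still needing a manual pass.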

It's beginning to look as though I'll have to retype the doc...the original source doc is lost (not by me, I might add!).

Roger



-----Original Message-----
From: knowhowpro at gmail.com [mailto:knowhowpro at gmail.com] On Behalf Of Peter Gold
Sent: March 10, 2009 1:08 PM
To: Shuttleworth, Roger
Cc: Art Campbell; framers at lists.frameusers.com
Subject: Re: OT: Cannot extract text from PDF

Have you tried:

* Copy/Paste
* Printing to PDF from Acrobat Pro, then trying to extract text by Save As?

HTH

Regards,

Peter Gold
KnowHow ProServices

On Tue, Mar 10, 2009 at 11:54 AM, Shuttleworth, Roger
<Roger_Shuttleworth at tvworks.com> wrote:
> Thanks for your help.
>
> I can save other PDFs without a problem.
> My Acrobat version is Acrobat Pro 7.1.0.
> The Application was AppleWorks. The PDF Producer is Mac OSX 10.3.9 Quartz PdfContext according to the Document Properties window. There seems to be
> nothing else interesting in the metadata, and no security applied.
>
> Roger
>
> -----Original Message-----
> From: knowhowpro at gmail.com [mailto:knowhowpro at gmail.com] On Behalf Of Peter Gold
> Sent: March 10, 2009 12:47 PM
> To: Art Campbell
> Cc: Shuttleworth, Roger; framers at lists.frameusers.com
> Subject: Re: OT: Cannot extract text from PDF
>
>>> I have a PDF that was created using Mac OSX 10.3.9. It displays fine on my Windows XP SP3 machine, but I cannot extract the text and create a Word
>>> doc. When I try Save As, I get nothing produced except an error:
>>>
>>>
>>>
>>> Bad PDF; could not read page structure. <Bad PDF; error in processing fonts: cannot find CMAP resource file> [33]
>
> If the PDF was made using Mac's Preview application, this could be the problem;
> check document info for Creator.
>
> If you get the same error when trying to Save As with all documents,
> the Acrobat installation may be corrupted.
>