OT: Cannot extract text from PDF
Shuttleworth, Roger
Roger_Shuttleworth at tvworks.com
Tue Mar 10 10:51:12 PDT 2009
Wow, that was worth a try! However...
I reprinted the PDF to the Adobe PDF printer. No problems. The file displays OK.
I tried Save As RTF from the redistilled version and got an informative message:
"Acrobat was able to make this document accessible but found the following oddities:
Some font(s) missing information needed to determine the characters that correspond to the symbols (glyphs) in the font. [90 of 90 glyphs (Apple Chancery)]"
[I wonder what "accessible" means in this context? I'm none too familiar with Accessibility settings, but when I tried a Full Check it said, "All of the text in this document lacks a language specification." But perhaps I'm barking up the wrong tree here.]
Apple Chancery is indeed an embedded subset in the original PDF.
The resultant RTF is rather interesting but of no use to me. It consists of all caps, and a sample appears below:
___'YYUIOGZK_SKSHKXY_YNGRR_HK_SKSHKXY_UL_ZNK_V[HROI_UX_;=5_LGI[RZ___]NU_NG\K_ GT_OTZKXKYZ_OT_GZZKTJOTM_')+_SKKZOTMY_GTJ_VXUMXGSY_
Saving as text produces similar all-cap text.
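[The sample above looks consistent with a constant offset rather than random damage: adding 26 to each character code turns "SKSHKXY" into "members" and "V[HROI" into "public". The Python sketch below assumes that offset holds for the whole file, which is inferred from this one sample only. The function name decode_shifted is hypothetical, and underscores are left untouched because they are ambiguous here (they may stand for a space, punctuation, or the letter "y").]

```python
def decode_shifted(text: str, offset: int = 26) -> str:
    """Undo a constant code-point shift in text extracted from the PDF.

    Assumption: every glyph was written `offset` code points too low.
    The value 26 is inferred from the one sample in this message.
    """
    decoded = []
    for ch in text:
        if ch in "_ \n":
            # Ambiguous or already-plain characters are kept as-is.
            decoded.append(ch)
        else:
            decoded.append(chr(ord(ch) + offset))
    return "".join(decoded)


sample = "SKSHKXY_UL_ZNK_V[HROI"
print(decode_shifted(sample))  # prints: members_of_the_public
```

[If the offset does hold, the Save As text output could be run through something like this instead of being retyped, with the underscores cleaned up by hand afterwards.]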
It's beginning to look as though I'll have to retype the doc...the original source doc is lost (not by me, I might add!).
Roger
-----Original Message-----
From: knowhowpro at gmail.com [mailto:knowhowpro at gmail.com] On Behalf Of Peter Gold
Sent: March 10, 2009 1:08 PM
To: Shuttleworth, Roger
Cc: Art Campbell; framers at lists.frameusers.com
Subject: Re: OT: Cannot extract text from PDF
Have you tried:
* Copy/Paste
* Printing to PDF from Acrobat Pro, then trying to extract text by Save As?
HTH
Regards,
Peter Gold
KnowHow ProServices
On Tue, Mar 10, 2009 at 11:54 AM, Shuttleworth, Roger
<Roger_Shuttleworth at tvworks.com> wrote:
> Thanks for your help.
>
> I can save other PDFs without a problem.
> My Acrobat version is Acrobat Pro 7.1.0.
> The Application was AppleWorks. The PDF Producer is Mac OSX 10.3.9 Quartz PdfContext according to the Document Properties window. There seems to be nothing else interesting in the metadata, and no security applied.
>
> Roger
>
> -----Original Message-----
> From: knowhowpro at gmail.com [mailto:knowhowpro at gmail.com] On Behalf Of Peter Gold
> Sent: March 10, 2009 12:47 PM
> To: Art Campbell
> Cc: Shuttleworth, Roger; framers at lists.frameusers.com
> Subject: Re: OT: Cannot extract text from PDF
>
>>> I have a PDF that was created using Mac OSX 10.3.9. It displays fine on my Windows XP SP3 machine, but I cannot extract the text and create a Word doc. When I try Save As, I get nothing produced except an error:
>>>
>>> Bad PDF; could not read page structure. <Bad PDF; error in processing fonts: cannot find CMAP resource file> [33]
>
> If the PDF was made using Mac's Preview application, this could be the problem;
> check document info for Creator.
>
> If you get the same error when trying to Save As with all documents,
> the Acrobat installation may be corrupted.
>