OT: Cannot extract text from PDF

Tue Mar 10 10:59:16 PDT 2009

Well, at least you've got it down to a font problem.

If you don't have access to a Mac that may have the missing fonts, you
may want to try a third-party tool, such as:
http://www.pdftodocconverterpro.com/ which at least gives you a free trial.

But if you can't find a Mac and the converters don't work, you
probably need to start typing.
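
That said, the garbled sample Roger quotes below may not be random. Going
only by that one line, each character looks like it sits 26 code points
below the original ("SKSHKXY" would be "members"), which would fit
Acrobat's complaint that the font lacks the information needed to map
glyphs to characters. Here is a minimal Python sketch of undoing that
shift. The offset of 26, the decode() name, and the treatment of "_" as a
word gap are all assumptions drawn from the sample, not anything Acrobat
reports.

```python
# Sample copied from the quoted RTF output; raw string because it contains a backslash.
SAMPLE = r"___'YYUIOGZK_SKSHKXY_YNGRR_HK_SKSHKXY_UL_ZNK_V[HROI_UX_;=5_LGI[RZ___]NU_NG\K_ GT_OTZKXKYZ_OT_GZZKTJOTM_')+_SKKZOTMY_GTJ_VXUMXGSY_"

# Assumption: inferred from this one sample only; another PDF may differ.
OFFSET = 26


def decode(garbled, offset=OFFSET, gap="_"):
    """Shift every character up by `offset` code points.

    Underscores and literal spaces are treated as word gaps (an assumption).
    Runs of gaps are collapsed to a single space.
    """
    chars = []
    for ch in garbled:
        if ch == gap or ch == " ":
            chars.append(" ")
        else:
            chars.append(chr(ord(ch) + offset))
    return " ".join("".join(chars).split())


if __name__ == "__main__":
    print(decode(SAMPLE))
```

Under that assumption a real "y" would also come out as "_", so it cannot
be told apart from a gap: "faculty" comes back as "facult", and
punctuation is probably lost the same way. The result would still need
proofreading, but that may beat retyping the whole document.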

Art

Art Campbell
               art.campbell at gmail.com
  "... In my opinion, there's nothing in this world beats a '52
Vincent and a redheaded girl." -- Richard Thompson
                                                      No disclaimers apply.
                                                               DoD 358

On Tue, Mar 10, 2009 at 1:51 PM, Shuttleworth, Roger
<Roger_Shuttleworth at tvworks.com> wrote:
> Wow, that was worth a try! However...
>
> I reprinted the PDF to the Adobe PDF printer. No problems. The file displays OK.
>
> I tried Save As RTF from the redistilled version and got an informative message:
>
> "Acrobat was able to make this document accessible but found the following oddities:
>
> Some font(s) missing information needed to determine the characters that correspond to the symbols (glyphs) in the font. [90 of 90 glyphs (Apple Chancery)]"
>
> [I wonder what "accessible" means in this context? I'm none too familiar with Accessibility settings, but when I tried a Full Check it said, "All of the text in this document lacks a language specification." But perhaps I'm barking up the wrong tree here.]
>
> Apple Chancery is indeed an embedded subset in the original PDF.
> The resultant RTF is rather interesting but of no use to me. It consists of all caps, and a sample appears below:
> ___'YYUIOGZK_SKSHKXY_YNGRR_HK_SKSHKXY_UL_ZNK_V[HROI_UX_;=5_LGI[RZ___]NU_NG\K_ GT_OTZKXKYZ_OT_GZZKTJOTM_')+_SKKZOTMY_GTJ_VXUMXGSY_
>
> Saving as text produces similar all-cap text.
>
> It's beginning to look as though I'll have to retype the doc...the original source doc is lost (not by me, I might add!).
>
> Roger
>
>
>
> -----Original Message-----
> From: knowhowpro at gmail.com [mailto:knowhowpro at gmail.com] On Behalf Of Peter Gold
> Sent: March 10, 2009 1:08 PM
> To: Shuttleworth, Roger
> Cc: Art Campbell; framers at lists.frameusers.com
> Subject: Re: OT: Cannot extract text from PDF
>
> Have you tried:
>
> * Copy/Paste
> * Printing to PDF from Acrobat Pro, then trying to extract text by Save As?
>
> HTH
>
> Regards,
>
> Peter Gold
> KnowHow ProServices
>
> On Tue, Mar 10, 2009 at 11:54 AM, Shuttleworth, Roger
> <Roger_Shuttleworth at tvworks.com> wrote:
>> Thanks for your help.
>>
>> I can save other PDFs without a problem.
>> My Acrobat version is Acrobat Pro 7.1.0.
>> The Application was AppleWorks. The PDF Producer is Mac OSX 10.3.9 Quartz PdfContext according to the Document Properties window. There seems to be nothing else interesting in the metadata, and no security applied.
>>
>> Roger
>>
>> -----Original Message-----
>> From: knowhowpro at gmail.com [mailto:knowhowpro at gmail.com] On Behalf Of Peter Gold
>> Sent: March 10, 2009 12:47 PM
>> To: Art Campbell
>> Cc: Shuttleworth, Roger; framers at lists.frameusers.com
>> Subject: Re: OT: Cannot extract text from PDF
>>
>>>> I have a PDF that was created using Mac OSX 10.3.9. It displays fine on my Windows XP SP3 machine, but I cannot extract the text and create a Word doc. When I try Save As, I get nothing produced except an error:
>>>>
>>>> Bad PDF; could not read page structure. <Bad PDF; error in processing fonts: cannot find CMAP resource file> [33]
>>
>> If the PDF was made using Mac's Preview application, this could be the problem;
>> check document info for Creator.
>>
>> If you get the same error when trying to Save As with all documents,
>> the Acrobat installation may be corrupted.
>>
>