OT: Cannot extract text from PDF

Pinkham, Jim Jim.Pinkham at voith.com
Tue Mar 10 11:18:14 PDT 2009


Any possibility of printing, scanning, and OCRing (with Adobe Paper Capture or another tool) the document? Not an elegant solution, perhaps, but quite possibly faster than retyping, if this approach is feasible for you.
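
One more thing worth a quick try before retyping: the garbled sample Roger quotes below looks consistent with every character code having been shifted down by a fixed 26, which is why lowercase text comes out as capitals. The Python sketch below is only a guess based on that one sample. The offset of 26, the name `decode`, and the treatment of every "_" as a space are assumptions of mine, not anything Acrobat documents. Because a real "y" would also land on "_" under this shift, "faculty" comes back as "facult", and punctuation is probably lost the same way.

```python
# Sample copied from the quoted RTF output, with the line wrap inside
# "]NU" rejoined. A raw string is used because the sample has a backslash.
SAMPLE = (
    r"___'YYUIOGZK_SKSHKXY_YNGRR_HK_SKSHKXY_UL_ZNK_V[HROI_UX_;=5_LGI[RZ___]N"
    r"U_NG\K_ GT_OTZKXKYZ_OT_GZZKTJOTM_')+_SKKZOTMY_GTJ_VXUMXGSY_"
)

# Assumption inferred from the sample only: each code is 26 too low.
OFFSET = 26


def decode(garbled: str, offset: int = OFFSET) -> str:
    """Shift every character up by `offset`; treat '_' and ' ' as spaces."""
    chars = []
    for ch in garbled:
        if ch in "_ ":
            # Ambiguous: could be a space, a "y", or lost punctuation.
            chars.append(" ")
        else:
            chars.append(chr(ord(ch) + offset))
    # Collapse the runs of spaces left behind by consecutive underscores.
    return " ".join("".join(chars).split())


if __name__ == "__main__":
    print(decode(SAMPLE))
```

If the whole text export follows the same pattern, running it through something like this and then fixing the missing "y"s and punctuation by hand may still beat retyping. If the shift differs from page to page or font to font, it will not help.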

-----Original Message-----
From: framers-bounces at lists.frameusers.com [mailto:framers-bounces at lists.frameusers.com] On Behalf Of Art Campbell
Sent: Tuesday, March 10, 2009 12:59 PM
To: Shuttleworth, Roger
Cc: framers at lists.frameusers.com
Subject: Re: OT: Cannot extract text from PDF

Well, at least you've got it down to a font problem.

If you don't have access to a Mac that may have the missing fonts, you may want to try a third-party tool, such as:
http://www.pdftodocconverterpro.com/ which at least gives you a free trial.

But if you can't find a Mac and the converters don't work, you probably need to start typing.

Art

Art Campbell
               art.campbell at gmail.com
  "... In my opinion, there's nothing in this world beats a '52 Vincent and a redheaded girl." -- Richard Thompson
                                                      No disclaimers apply.
                                                               DoD 358



On Tue, Mar 10, 2009 at 1:51 PM, Shuttleworth, Roger <Roger_Shuttleworth at tvworks.com> wrote:
> Wow, that was worth a try! However...
>
> I reprinted the PDF to the Adobe PDF printer. No problems. The file displays OK.
>
> I tried Save As RTF from the redistilled version and got an informative message:
>
> "Acrobat was able to make this document accessible but found the following oddities:
>
> Some font(s) missing information needed to determine the characters 
> that correspond to the symbols (glyphs) in the font. [90 of 90 glyphs (Apple Chancery)]"
>
> [I wonder what "accessible" means in this context? I'm none too 
> familiar with Accessibility settings, but when I tried a Full Check it 
> said, "All of the text in this document lacks a language 
> specification." But perhaps I'm barking up the wrong tree here.]
>
> Apple Chancery is indeed an embedded subset in the original PDF.
> The resultant RTF is rather interesting but of no use to me. It consists of all caps, and a sample appears below:
> ___'YYUIOGZK_SKSHKXY_YNGRR_HK_SKSHKXY_UL_ZNK_V[HROI_UX_;=5_LGI[RZ___]N
> U_NG\K_ GT_OTZKXKYZ_OT_GZZKTJOTM_')+_SKKZOTMY_GTJ_VXUMXGSY_
>
> Saving as text produces similar all-cap text.
>
> It's beginning to look as though I'll have to retype the doc...the original source doc is lost (not by me, I might add!).
>
> Roger
>
>
>
> -----Original Message-----
> From: knowhowpro at gmail.com [mailto:knowhowpro at gmail.com] On Behalf Of 
> Peter Gold
> Sent: March 10, 2009 1:08 PM
> To: Shuttleworth, Roger
> Cc: Art Campbell; framers at lists.frameusers.com
> Subject: Re: OT: Cannot extract text from PDF
>
> Have you tried:
>
> * Copy/Paste
> * Printing to PDF from Acrobat Pro, then trying to extract text by Save As?
>
> HTH
>
> Regards,
>
> Peter Gold
> KnowHow ProServices
>
> On Tue, Mar 10, 2009 at 11:54 AM, Shuttleworth, Roger 
> <Roger_Shuttleworth at tvworks.com> wrote:
>> Thanks for your help.
>>
>> I can save other PDFs without a problem.
>> My Acrobat version is Acrobat Pro 7.1.0.
>> The Application was AppleWorks. The PDF Producer is Mac OSX 10.3.9 
>> Quartz PdfContext according to the Document Properties window. There seems to be nothing else interesting in the metadata, and no security applied.
>>
>> Roger
>>
>> -----Original Message-----
>> From: knowhowpro at gmail.com [mailto:knowhowpro at gmail.com] On Behalf Of 
>> Peter Gold
>> Sent: March 10, 2009 12:47 PM
>> To: Art Campbell
>> Cc: Shuttleworth, Roger; framers at lists.frameusers.com
>> Subject: Re: OT: Cannot extract text from PDF
>>
>>>> I have a PDF that was created using Mac OSX 10.3.9. It displays 
>>>> fine on my Windows XP SP3 machine, but I cannot extract the text 
>>>> and create a Word doc. When I try Save As, I get nothing produced
>>>> except an error:
>>>>
>>>>
>>>>
>>>> Bad PDF; could not read page structure. <Bad PDF; error in 
>>>> processing fonts: cannot find CMAP resource file> [33]
>>
>> If the PDF was made using Mac's Preview application, this could be 
>> the problem; check document info for Creator.
>>
>> If you get the same error when trying to Save As with all documents, 
>> the Acrobat installation may be corrupted.
>>
>