[Framers] OT: XML from PDF

Roger Shuttleworth shuttie27 at gmail.com
Wed Jul 1 02:32:33 PDT 2020


Hello All.

I know that Acrobat Pro recently added the ability to export XML. Does 
anyone on the list have any experience of generating XML this way, from 
PDF? Does it use a predefined schema, or can you customise the XML by 
specifying your own schema? And is the output XML useable in the sense 
of being ingested by other programs (e.g. FrameMaker) and reused elsewhere?

A quick look at some exported XML (generated by someone else) suggests 
that the structure is very flat and that paragraphs and headings just 
get exported as <P/> elements with no attributes except xml:lang. 
Furthermore, character formats are ignored altogether.
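
For anyone who wants to check an export of their own, here is a minimal Python sketch (standard library only) that counts element names, collects attribute names and measures nesting depth. The sample document is hypothetical and only mirrors the structure described above. The root element name TaggedPDF-doc is my assumption about what Acrobat emits, and summarise and SAMPLE are illustrative names, not part of any Acrobat API.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical export mirroring the description: headings and body text are
# all <P> with only xml:lang, and inline formatting has been dropped.
# The root name "TaggedPDF-doc" is an assumption.
SAMPLE = """<TaggedPDF-doc>
  <P xml:lang="en-GB">Introduction</P>
  <P xml:lang="en-GB">This paragraph had a bold word in Word, now plain text.</P>
  <P xml:lang="en-GB">Background</P>
</TaggedPDF-doc>"""


def summarise(xml_text: str) -> tuple[Counter, set[str], int]:
    """Count element names, collect attribute names, and find maximum nesting depth."""
    root = ET.fromstring(xml_text)
    tags: Counter = Counter()
    attr_names: set[str] = set()
    max_depth = 0

    def walk(el: ET.Element, depth: int) -> None:
        nonlocal max_depth
        max_depth = max(max_depth, depth)
        for child in el:
            tags[child.tag] += 1
            attr_names.update(child.attrib)
            walk(child, depth + 1)

    walk(root, 0)  # the root itself is depth 0 and is not counted
    return tags, attr_names, max_depth


tags, attrs, depth = summarise(SAMPLE)
print(tags)   # Counter({'P': 3})
print(attrs)  # {'{http://www.w3.org/XML/1998/namespace}lang'}
print(depth)  # 1
```

A depth of 1 with a single element type and only xml:lang as an attribute is exactly the "flat" result: nothing in the markup distinguishes a heading from a body paragraph.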

It looks to me as though it is pretty useless as far as adaptability is 
concerned.

Background: I have a number of PDFs created in Word and would like to be 
able to extract XML using my own schema, without taking a very long route.

Thanks for any guidance received!

Roger Shuttleworth, Lincoln UK


