[Framers] OT: XML from PDF
Roger Shuttleworth
shuttie27 at gmail.com
Wed Jul 1 02:32:33 PDT 2020
Hello All.
I know that Acrobat Pro recently added the ability to export XML. Does
anyone on the list have any experience of generating XML from PDF this
way? Does it use a predefined schema, or can you customise the XML by
specifying your own schema? And is the output XML usable, in the sense
that other programs (e.g. FrameMaker) can ingest it and reuse it elsewhere?
A quick look at some exported XML (generated by someone else) suggests
that the structure is very flat and that paragraphs and headings just
get exported as <P/> elements with no attributes except xml:lang.
Furthermore, character formats are ignored altogether.
It looks to me as though it is pretty useless as far as adaptability is
concerned.
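For anyone who wants to check an export of their own, here is a minimal
Python sketch of the kind of inspection I mean. It counts element names
and attribute names and reports the deepest nesting level. The sample
document and the summarise() helper are made up for illustration; the
sample only imitates the flat structure described above and is not real
Acrobat output.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample, invented for illustration; not real Acrobat output.
SAMPLE = """<Document>
  <P xml:lang="en-GB">Chapter One</P>
  <P xml:lang="en-GB">Some body text.</P>
  <P xml:lang="en-GB">More body text.</P>
</Document>"""


def summarise(xml_text):
    """Return (tag counts, attribute-name counts, maximum nesting depth)."""
    root = ET.fromstring(xml_text)
    tags, attrs = Counter(), Counter()
    max_depth = 0
    stack = [(root, 1)]  # walk the tree iteratively, tracking depth
    while stack:
        elem, depth = stack.pop()
        tags[elem.tag] += 1
        attrs.update(elem.attrib.keys())
        max_depth = max(max_depth, depth)
        stack.extend((child, depth + 1) for child in elem)
    return tags, attrs, max_depth


tags, attrs, depth = summarise(SAMPLE)
print("Elements:  ", dict(tags))
print("Attributes:", dict(attrs))
print("Max depth: ", depth)
```

Note that ElementTree reports xml:lang under its namespace-qualified
name, {http://www.w3.org/XML/1998/namespace}lang. A shallow maximum
depth and a single dominant element name would confirm the flatness.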
Background: I have a number of PDFs created in Word and would like to be
able to extract XML using my own schema, without taking a very long route.
Thanks for any guidance received!
Roger Shuttleworth, Lincoln UK