How to retrieve an image as a stream from a PDF using Hummus.js?
I’m trying to extract a PNG image's stream/buffer from a PDF. I can get the ePDFObjectName object, but any attempt to actually get the underlying reference stream results in a segfault.
Effectively, I turned one of the author's tests into a class, took out the logger part, and bolted on a method called findComponentInPDFDictionary.
My class is instantiated with 2 arguments:
- componentType = the name of the PDFName value I’m looking for (in this case “Image”)
- callback = a supplied callback on which I was hoping to be able to retrieve the stream.
Here's an abbreviated example:
module.exports = function findPDFPage (componentType, callback) {
  let _this = this
  ...
  function findComponentInPDFDictionary (aDictionary, inReader) {
    const dict = aDictionary.toJSObject()
    Object.getOwnPropertyNames(dict).forEach(
      function (element, index, array) {
        if (componentType !== 'all' && callback &&
            dict[element].value &&
            typeof dict[element].value === 'string' &&
            dict[element].value.search(componentType) > -1) {
          if (aDictionary.exists(element)) {
            // DO SOMETHING WITH THE REQUESTED COMPONENT TYPE HERE
            // E.G. componentType = 'Image'
            // vvvvvvvvvvvvvvvvvvvvvvvvvvv
            callback(
              inReader.queryDictionaryObject(aDictionary, element),
              aDictionary, inReader)
          }
        } else _this.iterateObjectTypes(dict[element], inReader)
      }
    )
  }
  ...
  this.iterateObjectTypes = function (inObject, inReader) {
    // ... logic that recursively iterates through different PDF objects
    switch (inObject.getType()) {
      case hummus.ePDFObjectDictionary:
        findComponentInPDFDictionary(inObject.toPDFDictionary(), inReader)
        break
      case ...
    }
  }
  ...
}
I’ve tried converting the value returned by inReader.queryDictionaryObject(aDictionary, element) to a bunch of different types, and it segfaults every time.
The Hummus.js documentation on parsing is a little vague as to whether it supports access to image streams, but I would presume the image still exists, so: is it possible to get that stream? And if so, how?
I think I'm one level too deep in the PDF. As in: the object that I've found in the dictionary that contains the metadata value subtype: PDFName: 'Image' might actually be the complete dictionary metadata object for the image. I'm going to investigate whether the parent of this dictionary's "IndirectObjectReference" points me to the actual memory stream of the PNG. If that fails then I'm completely out of ideas (I've written to the author too, of course, so we'll see what they say). – Jay Edwards
Aug 11 at 10:18
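The hunch in the comment above can be sketched roughly as follows: instead of converting the Name value 'Image' itself, look for stream objects whose dictionary has Subtype 'Image' and read from those. This is a minimal sketch, not a confirmed fix for the segfault. It assumes the parsing calls as I understand them from the Hummus.js parsing documentation (getObjectsCount, parseNewObject, toPDFStream, getDictionary, startReadingFromStream, and a byte reader with notEnded/read). The helper names collectImageStreams and readDecodedStream are hypothetical, and the module and reader are passed in as parameters so the assumptions stay visible.

```javascript
// Sketch only. `hummus` is the required module and `reader` is assumed to be
// the object returned by hummus.createReader(path); both are passed in.
function collectImageStreams (hummus, reader) {
  const images = []
  // Assumption: object IDs run from 1 up to getObjectsCount() - 1.
  for (let id = 1; id < reader.getObjectsCount(); id++) {
    const obj = reader.parseNewObject(id)
    // Only convert objects that really are streams; skip everything else.
    if (!obj || obj.getType() !== hummus.ePDFObjectStream) continue
    const stream = obj.toPDFStream()
    const dict = stream.getDictionary()
    if (!dict.exists('Subtype')) continue
    // queryDictionaryObject also follows an indirect reference, if there is one.
    const subtype = reader.queryDictionaryObject(dict, 'Subtype')
    if (subtype.getType() !== hummus.ePDFObjectName) continue
    if (subtype.value !== 'Image') continue
    images.push({ id, dict, data: readDecodedStream(reader, stream) })
  }
  return images
}

// Hypothetical helper: drains the stream's byte reader into a single Buffer.
function readDecodedStream (reader, stream) {
  const byteReader = reader.startReadingFromStream(stream)
  const chunks = []
  while (byteReader.notEnded()) {
    // read(n) is assumed to return an array of byte values.
    chunks.push(Buffer.from(byteReader.read(10000)))
  }
  return Buffer.concat(chunks)
}
```

With the real library this would be called as collectImageStreams(hummus, hummus.createReader(path)). The bytes that come back are whatever the stream holds, which for an embedded PNG is typically the decoded pixel samples rather than a ready-made .png file, so entries such as Width, Height, ColorSpace and BitsPerComponent in dict would likely be needed to rebuild an image.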