How to retrieve an image as a stream from a PDF using Hummus.js?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP



How to retrieve an image as a stream from a PDF using Hummus.js?



I’m trying to extract a PNG image's stream/buffer from a PDF. I can get the ePDFObjectName object, but any attempt to actually get the underlying reference stream results in a segfault.



Effectively, I made one of the author's tests into a class, took out the logger part and bolted on a method called findComponentInPDFDictionary.


findComponentInPDFDictionary



My class is instantiated with 2 arguments:
- componentType = the name of the PDFName value I’m looking for (in this case “Image”)
- callback = a supplied callback on which I was hoping to be able to retrieve the stream.



here's an abbreviated example:


module.exports = function findPDFPage (componentType, callback) {
let _this = this

...

function findComponentInPDFDictionary (aDictionary, inReader)
const dict = aDictionary.toJSObject()
Object.getOwnPropertyNames(dict).forEach(
function (element, index, array)
if (componentType !== 'all' && callback &&
dict[element].value &&
typeof dict[element].value === 'string' &&
dict[element].value.search(componentType) > -1)
if (aDictionary.exists(element))
// DO SOMETHING WITH THE REQUESTED COMPONENT TYPE HERE
// E.G. componentType = 'Image'
// vvvvvvvvvvvvvvvvvvvvvvvvvvv
callback(
inReader.queryDictionaryObject(aDictionary, element),
aDictionary, inReader)

else _this.iterateObjectTypes(dict[element], inReader)
)


...

this.iterateObjectTypes = function (inObject, inReader)
... logic that recursively iterates through different PDF objects
switch (inObject.getType())
case hummus.ePDFObjectDictionary:
findComponentInPDFDictionary(inObject.toPDFDictionary(), inReader)
break

case ...

...





I’ve tried converting the value from inReader.queryDictionaryObject(aDictionary, element) to a bunch of different types and it just segfaults on me every time!



the Hummus.js documentation on parsing is a little vague as to whether it supports access to image streams; but I would presume the image still exists so



Please, Is it possible to get that stream? And if so, how?





I thhhhink I'm one level too deep on the PDF. As in: the object that I've found in the dictionary that contains the metadata value subtype: PDFName: 'Image' might actually be the complete dictionary metadata object for the image. I'm going to investigate if the parent of this dictionary's "IndirectObjectReference" points me to the actual memory stream of the PNG. If that fails then I'm completely out of ideas (i've written the author too, of course, so we'll see what they say).
– Jay Edwards
Aug 11 at 10:18



subtype: PDFName: 'Image'









By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard