Why do we need yet another PostScript Interpreter?

(github repository)

Print-to-PDF print drivers are ubiquitous in the Windows world, and part of the reason may be that Microsoft ships pscript5.dll with Windows, with which you can create a PostScript/PDF print driver in an afternoon.

However, all of those print drivers simply create PDF files, well, duh !? I guess that's obvious :)

But what's NOT obvious is that these drivers actually create a PostScript language file (.ps), turn that into a PDF file, and then throw away the .ps file.

SO... what if the driver didn't throw away the .ps file?

One format for all (and in the darkness bind them ?)

Consider for a moment that you could have one and only one file format available to you, with every single file on your system or network converted TO that format automatically. No "File - Save As", no special software of differing flavors that you would have to launch against each unique file type. Just a simple print, and voila, you've got a representation of anything whatsoever in a file format that is consistent across every type of file.

Maybe that format (*.ps) isn't important to you yet, more on that later, but for now, think about what you could do if you had some need to analyze any type of file, and you could front-end that analysis by simply printing the file, i.e., not having to manually convert to some common format.

A configurable PostScript print driver - download it here.

My CursiVision system, built for electronic signature capture and document management, contains a PostScript print driver that I am happy to provide the installer for. Download its installer and run it.

This driver does not throw away the intermediate PostScript file. In addition, the driver has registry settings that you can use to specify what to "launch" against these retained PostScript files.

To set this up, run regedt32 at a DOS prompt, navigate to the HKEY_LOCAL_MACHINE\Software\EnVisioNate\CursiVision key and set the "Print Driver Target" value to the name of an executable that you would like to run. Include the full path of the executable if it's not on the path. You do not need to quote the name, even if it has spaces in it.

Note that the print driver will pass the name of the .ps file created by the print processor as the first argument to that executable. If your process needs more arguments, you can specify those in the "Print Driver Arguments" setting.

The print driver will also place the resulting .ps files in the "Printed Documents Directory" location specified in the registry. You can specify any location to which the logged-in user has write access. If you leave this blank, the files will be placed in your temp directory.

You should set these registry settings for the WOW6432Node in the registry too.
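If you would rather script these settings than type them into the registry editor, one option is to generate a .reg file and import it. The following is a minimal sketch, not part of CursiVision: the helper names (escapeRegString, buildRegFile) are hypothetical, and it assumes the driver reads these settings as plain string (REG_SZ) values. The key path and value names are the ones described above, written once for the native key and once for its WOW6432Node counterpart.

```cpp
#include <string>
#include <utility>
#include <vector>

// Escape a value the way .reg string data requires: backslashes and
// double quotes are each preceded by a backslash.
std::string escapeRegString(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '\\' || c == '"') out += '\\';
        out += c;
    }
    return out;
}

// Value name / value data pairs, e.g. {"Print Driver Target", "notepad"}.
using Settings = std::vector<std::pair<std::string, std::string>>;

// Build the text of a .reg file that writes the same string values under
// both the native key and the WOW6432Node key.
// Assumption: the driver expects REG_SZ (string) values.
std::string buildRegFile(const Settings& settings) {
    const std::string keys[] = {
        "HKEY_LOCAL_MACHINE\\SOFTWARE\\EnVisioNate\\CursiVision",
        "HKEY_LOCAL_MACHINE\\SOFTWARE\\WOW6432Node\\EnVisioNate\\CursiVision",
    };
    std::string text = "Windows Registry Editor Version 5.00\n";
    for (const std::string& key : keys) {
        text += "\n[" + key + "]\n";
        for (const auto& setting : settings) {
            text += "\"" + escapeRegString(setting.first) + "\"=\"" +
                    escapeRegString(setting.second) + "\"\n";
        }
    }
    return text;
}
```

For example, passing the pairs {"Print Driver Target", "C:\\Tools\\HostPostScript.exe"} and {"Print Driver Arguments", "parse"} (the path is made up for illustration) yields text you can save as a .reg file and import from an elevated prompt, since writing under HKEY_LOCAL_MACHINE requires administrator rights.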

To get started, consider building the PostScript repository artifacts, which will produce the "HostPostScript" executable. If you specify that in the above registry entries, you can then print ANY document and render it to your monitor. In fact, I've added an argument to HostPostScript called "parse", which tells it to parse the PostScript file as soon as it is opened. Put that in the "Print Driver Arguments" setting to enable it.

Another easy way to see how this works: specify "notepad" as the "Print Driver Target" - you'll see raw PostScript language code generated from any document you print.
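Once notepad has shown you what the files look like, the next step is a target of your own. Here is a small sketch of one, assuming only what is described above: the driver passes the .ps file name as the first argument. It counts "%%Page:" comments, the marker that the Document Structuring Conventions use to start each page. Whether a given driver's output carries those comments is an assumption you should check against a real file, and the names countDscPages, runTarget and pagecount are made up for this example.

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Count lines that begin with "%%Page:". Note that the header comment
// "%%Pages:" does not match, because the colon comes after the "s".
int countDscPages(std::istream& in) {
    int pages = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 7, "%%Page:") == 0) ++pages;
    }
    return pages;
}

// Body of a would-be "Print Driver Target": argv[1] is the .ps file name
// that the driver passes as the first argument.
int runTarget(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "usage: pagecount <file.ps>\n";
        return 1;
    }
    std::ifstream ps(argv[1], std::ios::binary);
    if (!ps) {
        std::cerr << "cannot open " << argv[1] << "\n";
        return 2;
    }
    std::cout << argv[1] << ": " << countDscPages(ps) << " page(s)\n";
    return 0;
}

// To build the executable, forward main() to runTarget():
//   int main(int argc, char* argv[]) { return runTarget(argc, argv); }
```

Since the driver launches the target rather than a console, a real target would more likely write its result to a log file or a database than to standard output.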

By the way, you CAN share the CursiVision printer over the network, and if you do, you can launch any executable on the machine that is serving up the printer. For example, if you wanted to "chain" some process across multiple computers, OR, perhaps you'd like to dedicate one computer on your network to perform some task, this is one way to do these things.

If you have any issue, look at the Application log under Windows Logs in the Windows Event Viewer to see if the driver had a problem during execution.

Combine these two things, and you can build a pretty significant software or document process with them in no time at all.

The EnVisioNateSW PostScript interpreter

The PostScript Interpreter software fits perfectly into a scenario where, for example, you want to print any file or file type, and automate the investigation or analysis of the contents.

Over the years I have seen countless requirements that specify documents should be examined for content, and perhaps for the data generated within them. Those who may require such a thing may also believe they would have to develop custom software for each "type" of file to be examined. Per the above discussion, that is no longer required.

However, unlike a tool destined solely for the creation of PDF files, this PostScript interpreter is entirely different and is, from its conception, intended to be useful for ANY purpose its clientele may dream up as necessary, such as inspecting the content of documents for particular text or data.

As a robust COM object, this software is highly extensible in its configurability for unique purposes. Further, its implementation and architecture are so clean and obvious, you can easily see where and how such configuration(s) should be made. Isn't that one of the ideas of Open Source, to be able to bend it to your needs?

A term I use often in discussing software characteristics is "Extensibility".

For me, the true meaning of this is that I should be able to take any particular software system that relates, in some way, to the particular problem domain I'm trying to provide a solution for, and to "Extend" that system such that it performs perfectly to my needs in that regard.

Is that somehow too much to ask ?

I would think that the concepts of Open Source and "Extensibility" would go hand in hand as complementary traits. Yet, alas, in my experience that is just never the case. It is sooo hard to follow the vast majority of Open Source, and it is SO poorly architected, that it is near impossible to find the "place" to extend it, let alone a suitable strategy to do so.

Why is Open Source like this?

Here are a couple of reasons:

It is sloppy, sloppy, sloppy. It is disorganized, disheveled, and difficult to read
It is not architected. It does not have any sort of formal integration technology

No formal adherence to any sort of interface definitions, no idl/odl files
Plugins are bullshit, capabilities are offered because a file (.dll) exists somewhere? No thanks
Reliance on header files to dictate interoperability? Maintenance and versioning headache! No Way

It is poorly structured, too hard to build, and frankly, too distracted by the need for cross-platform support
..... many more

Thus, you might begin to understand why I am in this space. I am here to do what I can to improve the state of software development, using its most glaring example of what demands to be fixed: Open Source.

My work as an example

The fact that it is so hard to get Open Source to bend to your needs is part of the reason I became focused on writing examples of how it should be created to actually embody "Extensibility" as its primary nature.

I'm a COM fanatic

Don't take that to mean I'm a diehard Windows fanatic; it's not correct to assume COM is only Windows. If you think I'm saying it is NOT on Linux, you'd be wrong. I'm not saying that at all. The fact is, I don't even know.

And also, I'm not necessarily a fanatic of any OS or platform, I've been there and done that. I was a diehard OS/2 devotee in the early 90's, I even had the "OS2 MAN" customized license plate (still do on the wall). So I've already witnessed far, far better platforms take a dive to Windows, but what was I to do ? Write something on something that nobody would use ? Remember the OS/2 software for managing your CompuServe account ? Me either. Remember CompuServe ? Me either.

What COM is, though, is the formal set of concepts that provides the pathway to the Extensibility and configurability that I so strongly require in literally every software project that I do, and there have been a lot of them.

What COM is not is inextricably tied to Windows or any OS for that matter, and this is, again, something that I feel is not understood by the masses.

On the surface, developers probably think that to use COM, you need CoInitialize, CoCreate, and other Windows API calls, as well as the "registration" facilities in Windows to manage versioning, install locations, platforms, bitness, etc., etc. All of those things are great, but absolutely not required.

COM artifacts are just .dlls in the end - you absolutely don't need that Windows architecture to load .dlls, right? The ability to load dynamic libraries is native to every platform/OS/development environment. At least any that's worth bothering with!

Even without the Windows API calls around COM, and especially the Common Language Runtime (aach !!), COM is still a fantastic, elegant, and ultimately simple set of techniques to make software interoperability, maintainability, and extensibility very easy to implement. The Windows API stuff does make it easier, but you don't really need it, and you don't have it anyway on other platforms, maybe somebody should write it there (?).
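To make that concrete, here is a minimal sketch of the COM technique in plain, portable C++, with no Windows headers at all. Everything in it is hypothetical and simplified for illustration: IUnknownLike stands in for IUnknown, interfaces are identified by strings rather than GUIDs, the return type is a bool rather than an HRESULT, and the reference count is not thread-safe. IGreeter, Greeter and CreateGreeter are made-up names, not anything from the repository.

```cpp
#include <cstring>

// Hypothetical stand-in for IUnknown: interface discovery plus
// reference-counted lifetime, and nothing else.
struct IUnknownLike {
    virtual bool QueryInterface(const char* iid, void** ppv) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    virtual ~IUnknownLike() = default;  // destroyed only through Release()
};

// Hypothetical interface a component might offer to its clients.
struct IGreeter : IUnknownLike {
    virtual const char* Greeting() = 0;
};

// The implementation. Clients never see this class, only the interfaces.
class Greeter final : public IGreeter {
public:
    bool QueryInterface(const char* iid, void** ppv) override {
        if (std::strcmp(iid, "IUnknownLike") == 0 ||
            std::strcmp(iid, "IGreeter") == 0) {
            *ppv = static_cast<IGreeter*>(this);
            AddRef();  // the caller now owns one reference
            return true;
        }
        *ppv = nullptr;
        return false;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long left = --refs_;
        if (left == 0) delete this;
        return left;
    }
    const char* Greeting() override { return "hello from a COM-style object"; }
private:
    ~Greeter() override = default;
    unsigned long refs_ = 0;
};

// The one function a dynamic library would need to export. A client could
// locate it with the platform's loader (GetProcAddress, dlsym, ...).
extern "C" bool CreateGreeter(const char* iid, void** ppv) {
    Greeter* g = new Greeter();
    g->AddRef();                            // construction reference
    bool ok = g->QueryInterface(iid, ppv);  // adds the caller's reference
    g->Release();                           // deletes g if the query failed
    return ok;
}
```

The client asks the factory for an interface by name, uses it, and calls Release when done. It never needs the Greeter class definition, which is the property that lets a component evolve without breaking the software built against it.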

The documentation is not great, of course, being a product of inane documentation automation like Doxygen - don't get me started on that. But with a modern development environment and especially a debugger, it's easy enough to figure out.

What does this have to do with the PostScript Interpreter?

This project, the interpreter, is used to parse PostScript language files. As above, those files are the product of a PostScript print driver, one version of which I offered for download above, and are the "one and only one" file format to which you can automatically convert any file on your system.

The interpreter has a simple events interface, the IPostScriptEvents interface, with which it can call into your software when it finds something interesting in those PostScript source files.

There is a very simple example application in the repository that shows how to use this interface. At present, the interface (source) will tell your software, the client (sink) of the events interface, where and what text exists in that document.

But what about the future uses of the Interpreter? Can you find uses for it in your software processes?

If I have done my job correctly, you should be able to extend this software to do anything regarding data, text, graphics or whatever that you would typically find in documents, or any file you can print.

Look at the sources in the repository, see if you can't quickly locate those places within it that would be dealing with what you are interested in. Then, note the pattern of events interfaces and how they are defined and used within the PostScript Interpreter.

Note that most of the code in the system regarding events is in the folder "COM Events". I point that out as an example of what I mean about the failings of Open Source - I wish everybody would think about organization as an important concept in software maintainability.

From all of that, you should be able to see where within the interpreter you would make outgoing calls on an events interface, how to define that interface TO the interpreter, and finally, how to hook up to that interface in your client software.
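Those three steps can be sketched in a few lines of portable C++. To be clear, this is a toy and not the repository's code: IExampleTextEvents is a made-up interface (it is NOT the real IPostScriptEvents, whose methods you should take from the repository), ToyInterpreter is a stand-in for the source, and its "parsing" only looks for simple parenthesized string literals, ignoring nesting and escapes. In real COM the hookup would normally go through connection points (IConnectionPointContainer and IConnectionPoint::Advise); the Advise and Unadvise methods here only borrow the names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Step 2: the events interface, defined once and shared by both sides.
// Hypothetical, for illustration only.
struct IExampleTextEvents {
    virtual void OnText(const std::string& text, std::size_t offset) = 0;
    virtual ~IExampleTextEvents() = default;
};

// The source of events: a toy stand-in for the interpreter.
class ToyInterpreter {
public:
    // Step 3, source side: clients register and unregister their sinks.
    void Advise(IExampleTextEvents* sink) { sinks_.push_back(sink); }
    void Unadvise(IExampleTextEvents* sink) {
        for (std::size_t i = 0; i < sinks_.size(); ++i) {
            if (sinks_[i] == sink) {
                sinks_.erase(sinks_.begin() + i);
                return;
            }
        }
    }

    // Step 1: the place where outgoing calls are made. Each simple "(...)"
    // literal found in the source is reported to every registered sink.
    void Parse(const std::string& source) {
        std::size_t pos = 0;
        while ((pos = source.find('(', pos)) != std::string::npos) {
            std::size_t end = source.find(')', pos + 1);
            if (end == std::string::npos) break;
            std::string text = source.substr(pos + 1, end - pos - 1);
            for (IExampleTextEvents* sink : sinks_) sink->OnText(text, pos);
            pos = end + 1;
        }
    }

private:
    std::vector<IExampleTextEvents*> sinks_;
};

// Step 3, client side: the sink implements the interface and collects
// whatever the source reports.
class CollectingSink : public IExampleTextEvents {
public:
    void OnText(const std::string& text, std::size_t offset) override {
        texts.push_back(text);
        offsets.push_back(offset);
    }
    std::vector<std::string> texts;
    std::vector<std::size_t> offsets;
};
```

A client creates a CollectingSink, passes its address to Advise, and calls Parse. Adding a new kind of event then means adding a method to the interface, one outgoing call at the point in the source where the interesting thing happens, and one override in the sink.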

Should you need any assistance at all with a more complex or different requirement, don't hesitate to email me.