A table containing almost four thousand records includes a mediumblob field for each record that contains the record’s associated PDF report. Under both MySQL Workbench and phpMyAdmin the relevant DOCUMENT column displays the data as a BLOB button or link. In the case of phpMyAdmin the link also indicates the size of the data the Blob contains.
The issue is what happens when the BLOB button/link is clicked. In MySQL Workbench, opening any of the files in the SQL Editor only displays the raw BLOB data, and in phpMyAdmin the link only allows the BLOB data to be saved as a .bin file instead of displaying or saving it as a viewable PDF file. All previous attempts to retrieve the original PDFs using PHP have failed (see the related earlier thread: Extract Pdf from MySql Dump Saved as Text).
The filename field in the table shows that all the stored files are PDF files. Further research and tests indicate that the mediumblob data has been stored as application/octet-stream.
My question is how can the original PDFs be retrieved as readable PDFs? Is it possible for a .bin file saved from the database to be converted or used to recover the original PDF file?
Any assistance would be greatly appreciated.
2 Answers
In line with my assumption and Isaac's suggestion, the only solution was to speak to one of the software developers. It transpires that the documents were zipped using a third-party library and had the header removed before being stored in the database. The third-party library used is version 2.0.50727 of Chilkat, available from www.chilkatsoft.com. That version no longer appears to be available, but hopefully at least one of the later versions will do the job. Thanks again for everyone's input and assistance.
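For anyone in a similar position without access to the original library, here is a rough Python sketch of one thing worth trying on a saved .bin file. It assumes the payload is an ordinary deflate stream whose wrapper header was stripped; that is only a guess about what "zipped with the header removed" means here, and the Chilkat format may well differ. The function name try_inflate is made up for illustration.

```python
import zlib

PDF_MAGIC = b"%PDF-"  # every PDF file starts with these bytes


def try_inflate(blob: bytes):
    """Try a few decompression guesses; return the PDF bytes or None.

    Assumption: the data is a plain deflate stream, possibly with its
    zlib/gzip header removed. This is not confirmed for Chilkat output.
    """
    # wbits: -15 = raw deflate (no header), 15 = zlib header, 31 = gzip header
    for wbits in (-15, 15, 31):
        try:
            inflater = zlib.decompressobj(wbits)
            data = inflater.decompress(blob) + inflater.flush()
        except zlib.error:
            continue  # this guess does not fit the data, try the next one
        if data.startswith(PDF_MAGIC):
            return data
    return None
```

If try_inflate returns bytes, writing them to a file with a .pdf extension should give a readable document. If it returns None for every record, the data is stored in some other way and the original library or developer is still needed.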
Based on the discussion in the comments, it sounds like you’ll need to either refer to the original source code or consult with the original developer to determine exactly how the data was stored.
Using phpMyAdmin to download the mediumblob data as a file will produce a .bin file in many cases. I don't recall exactly how it determines the content type: a PNG file, for instance, will download with a .png extension, but most other binary files, PDF included, simply download as .bin when phpMyAdmin isn't sure what the extension should be. So the behavior you're seeing from phpMyAdmin is expected and correct. Since the .bin file doesn't work when it's renamed to .pdf, though, something has probably gone wrong with the import or upload.
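A quick way to see what the .bin file really contains is to look at its first few bytes. Below is a minimal Python sketch; the function name sniff is hypothetical, and it only recognises a handful of well-known signatures, so an "unknown" result just means the data is in some other or modified format.

```python
def sniff(blob: bytes) -> str:
    """Guess the container format from the leading bytes of a BLOB."""
    head = blob[:8]
    if head.startswith(b"%PDF-"):
        return "pdf"    # an intact PDF; renaming .bin to .pdf should work
    if head.startswith(b"PK\x03\x04"):
        return "zip"    # local file header of a ZIP archive
    if head.startswith(b"\x1f\x8b"):
        return "gzip"
    # zlib stream: low nibble of byte 0 is 8 and the 16-bit header is a multiple of 31
    if len(head) >= 2 and head[0] & 0x0F == 8 and ((head[0] << 8) | head[1]) % 31 == 0:
        return "zlib"
    return "unknown"
```

Reading the saved file with open(path, "rb").read() and passing the bytes to sniff shows whether the data is a plain PDF, a recognisable compressed stream, or something that has been altered before storage.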
BLOB data is often stored in a pretty standardized way, but it seems your data doesn’t follow that method.
Without seeing the code directly, we can't tell exactly how the data was stored; anything else would only be guessing.