Moving large documents or files between applications, servers and file systems is a challenge developers have faced since the early days of programming, and a variety of guidelines and tools have been developed over time to address it.
The rise of APIs provides another avenue for sending and receiving large documents. This blog will discuss when it’s appropriate to use large documents with APIs and provide some recommendations to ensure success.
Protocols and Middleware Technologies
Many protocols have been built to address file transfer, including FTP, HTTP, WebDAV, SCP and AS2, and almost all popular programming languages and runtimes (C/C++, Java, PHP, .NET, Perl, Node.js) provide libraries that use these protocols to transfer documents.
Most middleware products, such as Enterprise Service Bus technologies (Oracle, Tibco, IBM) and B2B integrators (IBM Sterling, etc.), also support managed file transfer (MFT), and there are various MFT file gateway products on the market.
Transferring Files with APIs
RESTful HTTP-based APIs are the current ‘go-to’ approach for designing applications, and file upload and download is a common business requirement for many of them. Files can be sent as streamed attachments or as links to the actual content. When passing large documents “via” APIs, the ideal scenario is to pass a link to the large file in the JSON payload and only touch the file itself when an application, archival system or other persistent store needs the contents stored locally. Because the API then handles a small JSON message rather than the file body, this approach can take API processing times from seconds to tens of milliseconds, reducing CPU usage and, in turn, lowering cost or freeing up capacity.
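As a rough illustration of the link-passing idea, the sketch below builds a JSON payload that references a file instead of embedding it. The field names (documentUrl, sizeBytes, sha256), the function name and the example URL are all assumptions made for this example, not part of any particular API.

```python
import json


def build_document_reference(file_url: str, size_bytes: int,
                             sha256_hex: str, content_type: str) -> str:
    """Return a small JSON payload that points at a file rather than carrying it.

    All field names here are hypothetical; pick whatever your API contract uses.
    """
    payload = {
        "documentUrl": file_url,      # link the consumer follows only if it needs the bytes
        "contentType": content_type,
        "sizeBytes": size_bytes,      # lets consumers decide whether to fetch at all
        "sha256": sha256_hex,         # lets the final consumer verify the download
    }
    return json.dumps(payload)


if __name__ == "__main__":
    message = build_document_reference(
        "https://files.example.com/invoices/12345.pdf",  # placeholder URL
        size_bytes=250_000_000,
        sha256_hex="0" * 64,                             # placeholder checksum
        content_type="application/pdf",
    )
    print(message)
```

The payload stays at a few hundred bytes no matter how large the referenced file is, which is what keeps the intermediate API hops cheap.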
APIs vs Other File Transfer Technologies
APIs don’t replace file transfer technologies and methodologies; the key is knowing when to use which approach. There are a number of factors to consider, including architectural style, file type, file size, end destination and the kinds of clients that will be uploading and downloading the files.
APIs are an ideal approach for ‘content only’ files like PDFs and GIFs; however, it is important to consider the architectural style you’re using when streaming large documents through APIs. Rather than batch-style scheduled ‘loading’ or ‘processing’ via APIs, we recommend an event-based approach where individual events are processed via APIs as they occur, in near real time (e.g. queue the events as they happen and process them with microservices as capacity permits). Effectively, we are recommending debatching before you hit the APIs.
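To make the debatching idea concrete, here is a minimal in-process sketch using Python's standard queue module. In practice the queue would be a message broker and the handler would be a microservice calling an API; the debatch and drain names are hypothetical and chosen only for this example.

```python
import queue
from typing import Callable, Iterable, List


def debatch(records: Iterable[dict], events: queue.Queue) -> int:
    """Split a batch into individual events and queue each one separately."""
    count = 0
    for record in records:
        events.put(record)
        count += 1
    return count


def drain(events: queue.Queue, handle: Callable[[dict], object]) -> List[object]:
    """Process queued events one at a time, as a worker would when capacity permits."""
    results = []
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break  # nothing left to process right now
        results.append(handle(event))  # stand-in for a single small API call
        events.task_done()
    return results


if __name__ == "__main__":
    q = queue.Queue()
    queued = debatch([{"id": 1}, {"id": 2}, {"id": 3}], q)
    handled = drain(q, lambda event: event["id"])
    print(queued, handled)
```

Each API call then deals with one small event rather than a whole batch, so the load is spread over time instead of arriving in a single scheduled spike.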
For large CSV or XML files, or where you really need batch processing, file transfer should ideally be left to the batch processing specialists such as MFT, ETL and de-batching tools, using SFTP or other file streaming transports. In other words, the file is simply streamed, passed along or uploaded, with no processing of its contents.
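For a sense of what "simply streaming" means, the sketch below copies bytes from a source to a destination in fixed-size chunks without ever parsing them. It is a simplified stand-in for what a file transfer tool does internally, not a replacement for one; the stream_copy name and the 64 KiB chunk size are assumptions for this example.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for your transport


def stream_copy(source: BinaryIO, destination: BinaryIO,
                chunk_size: int = CHUNK_SIZE) -> int:
    """Copy bytes from source to destination chunk by chunk, never parsing them.

    Memory use stays bounded by chunk_size regardless of the file size.
    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # end of stream
        destination.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    src = io.BytesIO(b"id,name\n" * 100_000)  # stands in for a large CSV file
    dst = io.BytesIO()
    print(stream_copy(src, dst), "bytes copied")
```

Python's standard library offers the same behaviour through shutil.copyfileobj; the loop is written out here only to show that the contents are never inspected.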