JavaScript can be used to automate data conversion and downloading. For example, I recently had over 14,000 lines of code staring back at me, and I wondered how I was going to access certain photo URLs nested within it. Even after working out how to access the URLs, I still had to download a total of 700 images from Pexels.com, a website that provides free images. In situations like this, it’s great to know how to automate data cleaning and downloading. Node’s ‘fs’ (file system) module was a perfect fit for this problem: after cleaning the data, I used it to write the clean data to a new file and then to save 700 downloaded photos. It can even automate the creation of separate directories, as I did (100 directories with 7 photos each).
To begin, I investigated the raw data to find common keys for the URLs that I needed.
I decided on the unique ‘id’ and the nested ‘src.large’ photo URL. With a simple forEach loop, I iterated over the raw data, and with each iteration, I pushed a new object containing the key/value pairs for ‘id’ (the unique identifier) and ‘src.large’ (the URL) to an array.
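A minimal sketch of this cleaning step might look like the following. The sample records, the `photos` wrapper, and the `url` key name are assumptions made for illustration; only ‘id’ and ‘src.large’ come from the data described above.

```javascript
// Hypothetical sample shaped like the raw data described above: each photo
// has a unique `id` and a nested `src.large` URL. All other fields and the
// example.com URLs are made up.
const rawData = {
  photos: [
    { id: 101, photographer: 'A', src: { large: 'https://example.com/101-large.jpeg', small: 'https://example.com/101-small.jpeg' } },
    { id: 102, photographer: 'B', src: { large: 'https://example.com/102-large.jpeg', small: 'https://example.com/102-small.jpeg' } },
  ],
};

// Keep only the two values needed for downloading.
function cleanPhotos(photos) {
  const cleaned = [];
  photos.forEach((photo) => {
    // `url` is an assumed key name for the nested src.large value
    cleaned.push({ id: photo.id, url: photo.src.large });
  });
  return cleaned;
}

const cleanData = cleanPhotos(rawData.photos);
console.log(cleanData);
```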
After that forEach loop, I used fs.writeFile. This method accepts a location for the new file (I used ‘cleanAPIdata.json’), the content to write to that file (in this case, the array of objects, which has to be serialized first, for example with JSON.stringify, because fs.writeFile expects a string or buffer), and a callback, which I used to handle any unexpected errors and to print some terminal feedback upon success.
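As a rough sketch, the write step could be wrapped like this. The function name `writeCleanData` and the `done` callback are hypothetical additions so the example can report when it has finished; the fs.writeFile call itself takes the path, the serialized content, and an error-first callback.

```javascript
const fs = require('fs');

// Writes an array of cleaned records to `filePath` as JSON.
// `done` is a hypothetical completion callback, not part of fs.writeFile.
function writeCleanData(filePath, cleanData, done) {
  // Serialize first: fs.writeFile expects a string or buffer, not an array
  const json = JSON.stringify(cleanData, null, 2);

  fs.writeFile(filePath, json, (err) => {
    if (err) {
      // Unexpected errors, such as a missing parent directory
      console.error('Could not write file:', err.message);
      return done(err);
    }
    // Terminal feedback upon success
    console.log(`Wrote ${cleanData.length} records to ${filePath}`);
    done(null);
  });
}
```

Calling `writeCleanData('cleanAPIdata.json', cleanData, () => {})` would then create the file in the current working directory.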
Armed with a freshly created JSON file, my next logical step was to prepare my file system to accept all the photos I was about to download. Using Node’s fs.mkdir method inside a for-loop, I created 100 directories, passing in the directory location (I used `images/${i}`, but make sure the ‘images’ directory already exists) and a callback to handle any unexpected errors. If you’re curious why I didn’t provide any feedback upon success, it’s because I could see in my file system whether the directories were made, and that was all the feedback I needed.
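The directory loop can be sketched roughly as follows. Starting the numbering at 1 and the optional `done` callback (which fires once every fs.mkdir call has returned) are assumptions added for illustration; the base directory must already exist, as noted above.

```javascript
const fs = require('fs');

// Creates `count` numbered directories inside `baseDir` (baseDir/1 ... baseDir/count).
// `done` is a hypothetical callback that receives the first error, or null.
function makeDirectories(baseDir, count, done) {
  let remaining = count;
  let firstError = null;

  for (let i = 1; i <= count; i++) {
    fs.mkdir(`${baseDir}/${i}`, (err) => {
      if (err) {
        console.error(`Could not create ${baseDir}/${i}:`, err.message);
        if (!firstError) firstError = err;
      }
      // fs.mkdir is asynchronous, so count completions instead of
      // assuming the directories exist when the loop ends
      remaining -= 1;
      if (remaining === 0 && done) done(firstError);
    });
  }
}
```

With an existing ‘images’ directory, `makeDirectories('images', 100)` would create `images/1` through `images/100`.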
Now for the fun part: making HTTP requests to download each photo. I used Axios with a response type of ‘stream’. This setting is part of an options object that can be passed to the Axios request, for example `{ method: 'GET', url: 'an-individual-pexel-photo-url', responseType: 'stream' }`. The ‘stream’ value tells Axios to hand back the response body as a stream rather than as fully loaded data.
But what is a stream? A stream is an efficient way to interact with data (usually a large amount of data, like media — think streaming movies from Netflix) where the data are sent in chunks so that a user doesn’t have to wait for the entire asset to load into memory. As such, a user or a program can begin to interact with the data sooner. Thus, streams are both memory efficient and time efficient.
Since Axios returns a Promise, once the response stream began, I used a ‘.then’ method on the Promise to start piping the readable stream into a writable stream. To clarify, with the server’s stream response, I was able to implement the following concept: ‘response.data.pipe(writable stream)’ where ‘response’ was the server’s response object, and ‘data’ was the property containing the requested data. As for pipe, according to Node’s docs, pipes “limit the buffering of data to acceptable levels such that sources and destinations of differing speeds will not overwhelm the available memory”. It is essentially a handoff mechanism. The writable stream I mentioned earlier is the destination that Node is talking about — my photo file.
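To make the handoff concrete without touching the network, here is a small sketch that uses only Node’s built-in ‘stream’ module. The in-memory source stands in for `response.data`, and the function name `pipeInChunks` is hypothetical; the destination simply records each chunk as it arrives.

```javascript
const { Readable, Writable } = require('stream');

// Pipes an array of chunks from a readable source into a writable
// destination and resolves with the chunks the destination received.
function pipeInChunks(chunks) {
  return new Promise((resolve, reject) => {
    const received = [];

    // Stand-in for response.data: a readable stream that emits one chunk at a time
    const source = Readable.from(chunks);

    // Stand-in for the photo file: a writable stream that records each chunk
    const destination = new Writable({
      objectMode: true,
      write(chunk, encoding, callback) {
        received.push(chunk);
        callback(); // signal readiness for the next chunk
      },
    });

    destination.on('finish', () => resolve(received));
    destination.on('error', reject);

    // The handoff: pipe manages the flow between source and destination
    source.pipe(destination);
  });
}

pipeInChunks(['chunk-1', 'chunk-2', 'chunk-3']).then((received) => {
  console.log(`Destination received ${received.length} chunks`);
});
```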
To create the writable stream, I used Node’s fs.createWriteStream method. This method accepts a destination path, which I used to specify that I was expecting a JPEG file. Altogether, the line of code that connects the source and destination streams is:
response.data.pipe(fs.createWriteStream(`images/${folder}/${photoCounter}.jpeg`)). A template literal is used here because the 700 HTTP requests were made as part of a for-loop where I kept count of the number of photos, and once the counter reached 7, I incremented the folder. The result was 7 photos in each of the 100 directories!
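Putting the pieces together, one possible sketch of the download loop is shown below. Several things here are assumptions rather than the original code: the HTTP client is passed in as a parameter (in practice it would be Axios, for example `require('axios')`), the folder and photo number are computed from the loop index instead of a manual counter, the `url` key on each record is assumed, and the downloads run one at a time.

```javascript
const fs = require('fs');

// Maps a zero-based photo index to its destination path.
// With 7 photos per folder, indexes 0-6 go to folder 1, 7-13 to folder 2, and so on.
function destinationFor(index, perFolder = 7, baseDir = 'images') {
  const folder = Math.floor(index / perFolder) + 1;
  const photoCounter = (index % perFolder) + 1;
  return `${baseDir}/${folder}/${photoCounter}.jpeg`;
}

// Requests one photo as a stream and pipes it into a file.
// `client` is assumed to behave like Axios: called with a config object,
// it returns a Promise for a response whose `data` is a readable stream.
function downloadPhoto(client, url, destPath) {
  return client({ method: 'GET', url, responseType: 'stream' }).then(
    (response) =>
      new Promise((resolve, reject) => {
        const writer = fs.createWriteStream(destPath);
        writer.on('finish', () => resolve(destPath));
        writer.on('error', reject);
        response.data.on('error', reject);
        response.data.pipe(writer);
      })
  );
}

// Downloads every photo in order. `photos` is assumed to be an array of
// records with a `url` property; the target directories must already exist.
async function downloadAll(client, photos) {
  for (let i = 0; i < photos.length; i++) {
    await downloadPhoto(client, photos[i].url, destinationFor(i));
  }
}
```

Running the requests sequentially is slower than firing them all at once, but it keeps the number of open connections and file handles small.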
Though this is a powerful way to automate data conversion and downloading, be mindful of copyrights and search engine T&Cs. And, though writing code to access information on the internet may be protected by the First Amendment, that doesn’t mean one is not violating any laws. For example, one company has recently become involved in a legal battle because it scrapes the internet for photos and, even though the photos are publicly available, uses them to create biometric facial recognition surveillance technology. So, be ethical in your decisions with this powerful knowledge.
