Scanning large FTP directories with apache.commons.net.ftp
Hi,
I'm using apache.commons.net.ftp.FTPClient to scan an FTP directory which can contain thousands of files which have been written over several months.
I'll only ever be interested in a recent subset of those files (about a week's worth at most), but I'm having to loop through thousands of FTPFile instances, which is taking unacceptably long (over an hour). Is there any way that I can eliminate the older files en masse without having to painfully iterate through each one? Here's the code I'm currently using to do the date checks...
client.cwd(directory);
ftpFiles = client.listFiles(directory);
// Read the clock once instead of creating a new Date for every file
long nowMillis = System.currentTimeMillis();
for (FTPFile ftpFile : ftpFiles) {
    String name = ftpFile.getName();
    if (ftpFile.getType() == FTPFile.FILE_TYPE) {
        // Age of the file in whole hours
        long dateDiffHours = (nowMillis - ftpFile.getTimestamp().getTimeInMillis()) / (1000L * 60 * 60);
        if (dateDiffHours <= (argsQueryHours + 2)) {
            // do something with the file because it's recent enough
        }
    }
}
Any help would be really appreciated! Thanks,
Daniel
Re: Scanning large FTP directories with apache.commons.net.ftp
Quote:
Is there any way that I can eliminate the older files en masse without having to painfully iterate through each one?
If you have a list of the filenames, how do you determine which file you want to process and which file you want to skip?
What is in the ftpFiles object? Is there something local in the object that can be used to select a file?
I assume the problem is that the getTimestamp() method requires a round trip to the server for each file.
Is there a way to get a directory list from the server that has the names and the files' dates together, obtained in one transaction?
My simple FTP tool returns a dir list using the LIST command:
Code :
drwxr-x--- 2 a3180082 a3180082 4096 Feb 5 07:42 .
drwx--x--x 3 a3180082 a3180082 4096 Dec 6 16:38 ..
-rw-r--r-- 1 a3180082 a3180082 91 Dec 6 16:04 .htaccess
-rw-r--r-- 1 a3180082 a3180082 363 Dec 7 13:38 AppletReader.html
-rw-r--r-- 1 a3180082 a3180082 41 Feb 5 07:42 AppletTestURL.txt
-rw-r--r-- 1 a3180082 a3180082 24016 Dec 6 16:39 ImagesInBlob.jar
-rw-r--r-- 1 a3180082 a3180082 324867 Dec 6 16:39 SmallBlob.dat
-rw-r--r-- 1 a3180082 a3180082 297 Dec 6 16:39 SmallBlob.html
-rw-r--r-- 1 a3180082 a3180082 8062 Dec 6 16:04 default.php
-rw-r--r-- 1 a3180082 a3180082 3676 Dec 7 13:38 sAppletReader.jar
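A listing like the one above already carries each file's name and date, so the recent files can be picked out locally once the text has been fetched. Here is a rough sketch of that idea in plain Java, not anything taken from the Commons Net API. The class and method names (ListLineFilter, parseTimestamp, recentNames) are made up for illustration. It assumes the Unix-style layout shown above, assumes that an entry without a year belongs to the most recent year that does not put it in the future, ignores any time zone difference between client and server, and uses a hypothetical "now" because the listing itself gives no year.

```java
import java.time.LocalDateTime;
import java.time.Month;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListLineFilter {
    private static final List<String> MONTHS = Arrays.asList(
            "Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");

    // Splits a line into: perms, links, owner, group, size, month, day, time-or-year, name.
    // The limit of 9 keeps file names that contain spaces in one piece.
    static String[] fields(String line) {
        String[] f = line.trim().split("\\s+", 9);
        if (f.length < 9) {
            throw new IllegalArgumentException("Unexpected LIST line: " + line);
        }
        return f;
    }

    // Parses the date of one line; 'now' is only used to guess a missing year (assumption).
    static LocalDateTime parseTimestamp(String line, LocalDateTime now) {
        String[] f = fields(line);
        Month month = Month.of(MONTHS.indexOf(f[5]) + 1); // throws on an unknown month
        int day = Integer.parseInt(f[6]);
        if (f[7].contains(":")) {
            // "HH:mm" form: no year given, so start from the current year
            String[] hm = f[7].split(":");
            LocalDateTime ts = LocalDateTime.of(now.getYear(), month, day,
                    Integer.parseInt(hm[0]), Integer.parseInt(hm[1]));
            // A date in the future is taken to mean the previous year
            return ts.isAfter(now) ? ts.minusYears(1) : ts;
        }
        // Year form (older entries): no time of day is available
        return LocalDateTime.of(Integer.parseInt(f[7]), month, day, 0, 0);
    }

    // Returns the names of regular files whose timestamp is not before the cutoff.
    static List<String> recentNames(List<String> lines, LocalDateTime cutoff, LocalDateTime now) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            if (!line.startsWith("-")) {
                continue; // skip directories, links and anything that is not a regular file
            }
            if (!parseTimestamp(line, now).isBefore(cutoff)) {
                names.add(fields(line)[8]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical "now"; the year is an assumption made for this example
        LocalDateTime now = LocalDateTime.of(2012, 2, 6, 0, 0);
        List<String> lines = Arrays.asList(
                "drwxr-x--- 2 a3180082 a3180082 4096 Feb 5 07:42 .",
                "-rw-r--r-- 1 a3180082 a3180082 91 Dec 6 16:04 .htaccess",
                "-rw-r--r-- 1 a3180082 a3180082 41 Feb 5 07:42 AppletTestURL.txt",
                "-rw-r--r-- 1 a3180082 a3180082 24016 Dec 6 16:39 ImagesInBlob.jar");
        // Keep only files from the last 7 days
        System.out.println(recentNames(lines, now.minusDays(7), now)); // prints [AppletTestURL.txt]
    }
}
```

This still loops over every line, but it is a loop over text already in memory, so it only helps if the slow part really is talking to the server rather than the iteration itself.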
What is an "FTP directory"? Is that a directory on an FTP server?
Re: Scanning large FTP directories with apache.commons.net.ftp
Hi Norm,
Thanks for the response.
ftpFiles is a list of FTPFile objects as specified here: FTPFile (Commons Net 3.0.1 API)
Unfortunately I can't find anything that returns names and modified dates in one method though. So I'm having to iterate through each file in ftpFiles and examine the date of each.
I also can't find a LIST command that doesn't leave me having to loop through a list of files afterwards.
Any ideas?
Re: Scanning large FTP directories with apache.commons.net.ftp
The LIST command returned the listing that I posted. Does the API have a way to use FTP commands directly?
The class org.apache.commons.net.ftp.FTPCommand has the LIST command in its list of commands.