Error with a scheduled import from local folder (automatically fetch the latest file)

Closed
ImportWP Pro - WordPress XML & CSV Importer April 24, 2021
Marco PIERRARD
3 years ago

Hello,

I changed the time of the scheduled import, this time without manually clicking the "Download" or "Download and continue" buttons, and you are right: the import works perfectly without skipping the first file.

It's OK for me.

Thank you very much for your help in setting this up.

Regards

James Collings Support Agent
3 years ago

The first file will only be skipped if you manually click the Download or Download and continue button on the Select file step, as this triggers the file to be selected and moved to the archive folder.

The changes to the cron manager will be released in the next patch release.

Marco PIERRARD
3 years ago

Hello,

UPDATE: today the import ran again, and this time all three files were imported as expected (see screenshot).

So for me the code is OK as it is; I'll just put a dummy XML file in the folder the first time I set up the scheduled import.

Can you confirm that the code changes in this dev version will be included in the next version of ImportWP Pro, so that I can update the plugin without losing the import capabilities of the specific ONIX plugin?

Thanks again

Regards

Marco PIERRARD
3 years ago

Hello,

I've tried the new 2.2.1-dev2 version by importing three new files. The first one (the oldest in the folder) was not imported but, as you said, the error was not recorded in the importer history.

Only the second and third files were imported (see screenshot). So the behaviour is the same as in the previous version; the only difference is that the error is not logged.

As I said before, if it only happens when the import is first scheduled, it doesn't really matter: I can put a dummy XML file first. But if it skips the first file every day, it's an issue.

I can only confirm tomorrow whether it skips the first file again, as the scheduled import is running right now.

For information, I didn't use the WP Crontrol plugin because I thought your new code made it unnecessary, but perhaps I'm wrong. I can do it if you think it will fix the problem.

Thank you

James Collings Support Agent
3 years ago
After looking at my local copy: if the child importer errors and can't find a file, it causes a constant recheck. I will fix this and send you an update.
James Collings Support Agent
3 years ago

Are you able to install this plugin: https://wordpress.org/plugins/wp-crontrol/ then go to Tools > Cron Events, look for the hook "iwp_schedule_runner", with the arguments [ your_importer_id, 'multiple'], and remove all events that have these matching arguments.

Then put your files back into the folder and wait for the main importer's schedule to run.

The error_first_file suggests that it ran and could not find a file in your source folder to import.

The fact that it started before the scheduled time suggests that you have some 'iwp_schedule_runner' events running that shouldn't be (hopefully fixed by the Tools > Cron Events step mentioned above), as the main importer has to run before it will spawn any child schedules.
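
The same clean-up could probably be done in code instead of through the Tools > Cron Events screen. The following is only a sketch, assuming WordPress core's `wp_clear_scheduled_hook()` and assuming the stale events were scheduled with exactly the arguments quoted above. WordPress matches cron arguments exactly, so an importer ID stored as a string would have to be passed as a string. The wrapper name `iwp_clear_runner_events` is hypothetical and not part of ImportWP.

```php
<?php
/**
 * Unschedule every 'iwp_schedule_runner' event registered for one importer.
 *
 * Hypothetical wrapper, not part of ImportWP. Returns null when WordPress
 * is not loaded, otherwise whatever wp_clear_scheduled_hook() returns.
 */
function iwp_clear_runner_events($importer_id)
{
    if (!function_exists('wp_clear_scheduled_hook')) {
        return null; // Not running inside WordPress.
    }

    // The arguments must match those of the scheduled events exactly.
    return wp_clear_scheduled_hook('iwp_schedule_runner', [$importer_id, 'multiple']);
}
```

It would be run once (for example from a temporary snippet) with the ID of the affected importer, then removed.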

Marco PIERRARD
3 years ago

Sorry I forgot the screenshots in my message.

Marco PIERRARD
3 years ago

Hello,

Thank you for this new code.

I've just tested it and it works almost as needed.
There is a problem: the import starts before the scheduled time, and the first file it imports returns an error.

But perhaps that is normal behaviour and it's what you meant when you said "Please note that you have to set the importer as a scheduled importer for this entire process to work".

This is the test I ran. I put three files inside the folder, from oldest to latest (by modification date):
- test_book_parascolaire_old.xml
- test_book_parascolaire_15h10.xml
- test_book_parascolaire_15h11.xml

I scheduled the import to run at 15h20, but the moment I hit the "Save & schedule" button the import ran (at 15h07), tried to import the first file (test_book_parascolaire_old.xml) and returned an error (see screenshot).

The data from the first file was not imported, but the file was moved to the archive folder with the timecode as expected (see screenshot).

After that, it worked well.
At 15h20 the scheduled import ran, imported the data from the second file (test_book_parascolaire_15h10.xml) and moved it to the archive folder.
One minute later, the import ran again, imported the last file (test_book_parascolaire_15h11.xml) and moved it to the archive folder too.

Then the import checked the folder and, with no files left in it, scheduled the next import for tomorrow.

So it works perfectly except for that first file when the schedule is activated.
Do you think this can be changed?
If not, it's really not an important bug, as I can put an XML file with dummy data to be processed first, and then it will work as expected.

Thank you

Regards

Marco PIERRARD
3 years ago

Hello,

Thank you for your answer.

In fact, I was only thinking of an archive folder because I need each file to be taken out of the source folder after it is processed, so that the script can go on to the next file (if there is one). I don't really need an archive, so the script can delete the file after import.

As you said, the imported files are stored and I can choose how many are kept, so I don't need that archive folder after all.

So the behaviour would be: fetch the OLDEST file in the folder and delete it after import, then run the script again to fetch the next oldest file, and so on until there are no files left in the folder. As I said in my previous message, I now need to start with the oldest file, not the latest, because I can have more than two files each day and, if I start with the latest, the most recent information could be lost.
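
The oldest-first selection described above can be sketched in plain PHP. This is a minimal sketch, not ImportWP code: the function name `iwp_onix_oldest_file` is hypothetical, and it assumes that a file's modification time reflects the order in which the files arrived.

```php
<?php
/**
 * Return the full path of the oldest regular file in $dir, or null when
 * the folder holds no files. Hypothetical helper, not part of ImportWP.
 */
function iwp_onix_oldest_file(string $dir): ?string
{
    // glob() can return false on error, so fall back to an empty list,
    // then keep regular files only (sub-folders are skipped).
    $files = array_filter(glob(rtrim($dir, '/') . '/*') ?: [], 'is_file');
    if (empty($files)) {
        return null;
    }

    // Sort by modification time, oldest first; ties are broken by name.
    usort($files, function (string $a, string $b): int {
        return [filemtime($a), $a] <=> [filemtime($b), $b];
    });

    return $files[0];
}
```

Calling it again after each imported file has been deleted or moved away returns the next oldest file, until it returns null for an empty folder.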

Thank you

Regards

James Collings Support Agent
3 years ago

Yes, this does sound possible. The only question I have is about the archived files. When an importer downloads or copies the file from a remote or local source, once that is done we need to make sure it is removed from the source directory (this should be possible via custom code if it is a local datasource, as you mention). By default the importer keeps the last 5 files imported, and these are accessed via the existing files section on the Select file tab. Is this enough, or do you need the archive folder that was suggested? We can obviously change the number of files kept once they have been imported, or make it save all files; this will be up to you via a setting I have added to the settings panel (previously a hard-coded value).

I have fixed the custom code that I wrote previously; I had just forgotten to update the cron script with that filter.

I will get back to you with an updated script that should import the latest file within a directory and, once that is complete, move on to the next most recent until all files have been imported (this will only work for scheduled imports, as I plan to trigger the import of the next file, if it exists, by adding a scheduled task to run right after).

Marco PIERRARD
3 years ago

Hello,

UPDATE: in the end, I may have more than two files per day inside the folder, so I need to adjust the way they are imported. As I won't know how many files will be in the folder, I can't schedule a fixed number of import runs. I need one import each day, but it has to keep running as long as there are files in the folder, this time starting with the oldest one, because a newer file can contain new information to create or update an entry.

So the import will process the oldest file in the folder, move it to an archive folder, then run again, and so on until there are no files left.
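
The "move it to an archive folder" step can be sketched in plain PHP as follows. This is only an illustration under stated assumptions: the helper name `iwp_onix_archive_file`, the archive location and the timestamp prefix format are all hypothetical, and nothing here is part of ImportWP.

```php
<?php
/**
 * Move $file into $archive_dir, prefixing its name with a timestamp.
 * Returns the new path, or null if the move could not be done.
 * Hypothetical helper; the naming scheme is an assumption.
 */
function iwp_onix_archive_file(string $file, string $archive_dir): ?string
{
    if (!is_file($file)) {
        return null;
    }

    // Create the archive folder on first use.
    if (!is_dir($archive_dir) && !mkdir($archive_dir, 0755, true)) {
        return null;
    }

    // Example result: archive/20210424-152000_test_book.xml
    $target = rtrim($archive_dir, '/') . '/' . date('Ymd-His') . '_' . basename($file);

    return rename($file, $target) ? $target : null;
}
```

Once the file has left the source folder, the next run sees the next file as the oldest one.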

Do you think it can be done that way?

Regards

Thanks

Marco PIERRARD
3 years ago

Hello,

I need my import to work this way: each day, two files arrive in a local folder on my server.
I need the import to run at a certain time, import the latest file in the folder, then move it to an archive folder. Then later (let's say five minutes after), I run the import again to get the second file.

For that, I'm using this code you sent me:

/**
 * Explore the ability to import the latest file in a directory.
 *
 * Returns the file names found in $path, most recently modified first.
 */
function iwp_onix_listdir_by_date($path)
{
    // Build full paths instead of calling chdir(), so the working directory
    // is left untouched. glob() can return false on error.
    $files = glob(trailingslashit($path) . '*.*') ?: [];

    // Sort the files by modification time, newest first.
    $mtimes = array_map('filemtime', $files);
    array_multisort($mtimes, SORT_DESC, $files);

    return array_map('basename', $files);
}

/**
 * Import the latest file in the directory
 *
 * @param string $source
 * @param string $raw_source
 * @param ImportWP\Common\Model\ImporterModel $importer_model
 * @return string
 */
function iwp_onix_import_latest_file($source, $raw_source, $importer_model)
{
    // Only apply to the importer with this ID.
    if ($importer_model->getId() != 212984) {
        return $source;
    }

    $files = iwp_onix_listdir_by_date($raw_source);
    if (!empty($files)) {
        // Check the full path, so the result does not depend on the working directory.
        $file = trailingslashit($raw_source) . $files[0];
        if (file_exists($file)) {
            return $file;
        }
    }

    return $source;
}

add_filter('iwp/importer/datasource', 'iwp_onix_import_latest_file', 10, 3);

I've tried this code and it works great when I go through the process manually, from Select file (and downloading the file) to Run import. But when I try to schedule it, I get this error: "Unable to determine filetype" (see screenshot).

I don't give the full path to the file in the "Local file" field on step 1, only the path to the folder, because my file names contain timecodes that are never the same and are impossible to guess. So I think that when the import runs on schedule, it "forgets" to fetch the file: it just reads what is in the "Local file" field, and that is only a path to a folder.

Is there a way to change this code to make it work with the schedule option?

And what would be the bit of code to move the file into an archive folder after it's imported?

Thank you

Regards