I’m trying to write a script that downloads all the Google image search results so I can build a dataset for my ML project. I was following this tutorial to download the high-resolution images, but an error suddenly appeared:
Refused to load the script 'https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js' because it violates the following Content Security Policy directive: "script-src 'report-sample' 'nonce-Q6xQOKx7e+e0TlGbQFPX3g' 'unsafe-inline'". Note that 'script-src-elem' was not explicitly set, so 'script-src' is used as a fallback.
Some help would be greatly appreciated. I run this code by pasting it into the browser's JavaScript console. Thanks!
var script = document.createElement('script');
script.src = "https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js";
document.getElementsByTagName('head')[0].appendChild(script);
// grab the URLs
var urls = $('.rg_di .rg_meta').map(function() {
  return JSON.parse($(this).text()).ou;
});
// write the URLs to file (one per line)
var textToSave = urls.toArray().join('\n');
var hiddenElement = document.createElement('a');
hiddenElement.href = 'data:attachment/text,' + encodeURI(textToSave);
hiddenElement.target = '_blank';
hiddenElement.download = 'urls.txt';
hiddenElement.click();
3 Answers
I think you need to add something like this:
Add it to Policies, there are many different ways (see the docs).
The "Refused to load the script" error is caused by the page's Content Security Policy (CSP). In Firefox you can disable CSP by opening about:config in the URL bar and setting security.csp.enable to false.
I tested this with the code below in the Firefox console:
Good luck 🙂
You are using jQuery for something that can be done in native JavaScript.
document.querySelectorAll accepts selectors much as jQuery does. It does not return an array but a NodeList, which is (in my opinion) unwieldy.
To iterate over it properly, I prefer to spread it into an array and then call forEach on it.
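As a rough illustration of the spread-then-forEach pattern, here is a minimal sketch. The helper name toArray and the 'img' selector are made up for this example and are not Google's actual markup. A NodeList is iterable, so the spread works on it the same way it does on any other iterable.

```javascript
// Hypothetical helper: copy any iterable (such as the NodeList returned by
// document.querySelectorAll) into a real Array.
function toArray(listLike) {
  return [...listLike];
}

// In a browser console: log the src of every <img> on the page.
// The 'img' selector is only an illustration (an assumption).
if (typeof document !== 'undefined') {
  toArray(document.querySelectorAll('img')).forEach(function (img) {
    console.log(img.src);
  });
}
```

Once the NodeList is a real array, map, filter and the other array methods are available too.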
Also, the way to get the data is different now.
You need to trigger a click on each image first.
This activates JavaScript event handlers that set the href of the image's grandparent.
You need to let the Google event handlers run first, so we detach the rest of our execution flow to let the Google script do its thing and update the DOM. We do this with setTimeout().
Then, once the Google scripts have run and the DOM elements have been updated, our scheduled timeouts get a chance to run, and by then the hrefs have been populated.
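The click-then-wait idea could be sketched roughly like this. The function name clickThenReadHrefs, the getLink callback, the 'a img' selector and the delay value are all assumptions for illustration; the real selector depends on Google's current markup.

```javascript
// Hypothetical helper: click every element, then read the hrefs on a later
// tick so that handlers triggered by the clicks get a chance to run first.
// getLink maps a clicked element to the element that carries the href.
function clickThenReadHrefs(elements, getLink, delayMs) {
  return new Promise(function (resolve) {
    elements.forEach(function (el) {
      el.click();
    });
    setTimeout(function () {
      resolve(elements.map(function (el) {
        return getLink(el).href;
      }));
    }, delayMs || 0);
  });
}

// In a browser console (the selector is an assumption about the page):
if (typeof document !== 'undefined') {
  const images = [...document.querySelectorAll('a img')];
  clickThenReadHrefs(images, function (img) {
    return img.parentElement.parentElement; // the image's grandparent
  }).then(function (hrefs) {
    console.log(hrefs);
  });
}
```

If the hrefs still come back empty, a longer delay may be needed; the right value is something to find by experiment.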
Before the click, the link looks like this:
After the click, we see that the href has been populated. The URL that has been entered is:
In this URL we see, after imgurl=, something starting with https. This is our target image URL, but it has been URL-encoded and is part of a larger URL. So we extract it with some simple substring manipulation.
Then we still have strange characters:
https%3A%2F%2Fwww.researchgate.net%2Fprofile%2FJerome_Droniou%2Fpublication%2F305983658%2Ffigure%2Ffig5%2FAS%3A668650201690119%401536430039650%2FMesh-patterns-for-the-tests-using-the-HMM-method-left-Test-1-right-Test-2.png
For that we can use decodeURIComponent() to transform it into a normal URL.
We then add this to our array.
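The substring and decoding steps might look something like the sketch below. The function name extractImageUrl and the sample link are made up for illustration. The sketch assumes the encoded value runs from imgurl= up to the next & (or to the end of the string).

```javascript
// Hypothetical helper: pull the value of the imgurl= parameter out of a
// result link and decode it into a plain image URL.
function extractImageUrl(href) {
  const marker = 'imgurl=';
  const start = href.indexOf(marker);
  if (start === -1) return null; // no imgurl parameter in this link
  const from = start + marker.length;
  const end = href.indexOf('&', from); // value ends at the next '&', if any
  const encoded = end === -1 ? href.substring(from) : href.substring(from, end);
  return decodeURIComponent(encoded);
}

// Made-up link in the shape described above; only the imgurl part matters.
const sampleHref =
  'https://www.google.com/imgres?imgurl=https%3A%2F%2Fexample.com%2Fimages%2Fcat%401.png' +
  '&imgrefurl=https%3A%2F%2Fexample.com%2F';
console.log(extractImageUrl(sampleHref));
// prints "https://example.com/images/cat@1.png"
```

Returning null for links without imgurl= makes it easy to filter out elements that were not populated.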
When we’ve handled everything, we create the urls file and download it.
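For the final step, a minimal sketch of building and downloading the file could look like this. The names buildUrlsDataUri and downloadUrls are hypothetical, and the text/plain data URI is my own choice rather than something taken from the question's code.

```javascript
// Hypothetical helper: build a data: URI holding one URL per line.
function buildUrlsDataUri(urls) {
  return 'data:text/plain;charset=utf-8,' + encodeURIComponent(urls.join('\n'));
}

// Browser only: trigger a download of urls.txt through a temporary link.
function downloadUrls(urls) {
  const link = document.createElement('a');
  link.href = buildUrlsDataUri(urls);
  link.download = 'urls.txt';
  document.body.appendChild(link);
  link.click();
  document.body.removeChild(link);
}
```

Call downloadUrls(urls) once the array has been filled. encodeURIComponent keeps the newlines intact as %0A, so the file ends up with one URL per line.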