We have many VM instances in Compute Engine that we use for scraping. They sometimes get blocked by some sites, and when that happens we try to change the IP using NordVPN. We are trying to write a Python script that automates the IP change when we detect that we’re blocked. Currently, we are using a Python package that we recently found, NordVPN-switcher, but we are getting the following error:
Connecting you to Denver ...
An unknown error occurred while connecting to a different server!
An unknown error occurred while connecting to a different server! Retrying with a different server...
Traceback (most recent call last):
  File "demo.py", line 13, in <module>
    rotate_VPN(instructions) # refer to the instructions variable here
  File "/home/eduardo_santos_housecallprosolut/.local/lib/python3.8/site-packages/nordvpn_switcher/nordvpn_switch.py", line 514, in rotate_VPN
    raise Exception("Unable to connect to a new server. Please check your internet connection.\n")
Exception: Unable to connect to a new server. Please check your internet connection.
Note: We have an internet connection.
The VM instances also have NordVPN installed. If we try manually, we can change the IP, but because we are connected to the instance over SSH, the connection is lost the moment the IP changes.
The current problems are:
- How to dynamically change the IP of an instance properly?
- How to keep a connection after the change occurs?
Note: The scrapers and all the logic are dockerized, and the Python version is 3.9.
As I mentioned at the beginning, we have many machines used for scraping. We would like to keep a registry of the IPs used by each one so that we can assign them better, probably using a Redis DB or a small collection in MongoDB. What do you think about it? What is a good way to develop this?
Thank you so much.
2 Answers
There is no supported method. Any existing connections will break once the IP address changes. Software that communicates over IP needs to be written to handle connection failures and attempt to reconnect. This type of feature is common in cell phone applications but less so in the desktop/server world.
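As a minimal sketch of the "handle connection failures and attempt to reconnect" idea, the helper below retries an operation with exponential backoff. The function name, the parameters, and the choice of exceptions are assumptions for illustration; the exception types worth catching depend on the client library the scraper actually uses.

```python
import time


def call_with_reconnect(operation, attempts=5, base_delay=1.0,
                        retry_on=(ConnectionError, TimeoutError),
                        sleep=time.sleep):
    """Run operation(), retrying on connection errors with exponential backoff.

    `operation` is any zero-argument callable (hypothetical here), for
    example a function that opens a connection and fetches one page.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on as exc:
            last_error = exc
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt,
            # but do not sleep after the final failure.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

A scraper could wrap each request in `call_with_reconnect(lambda: fetch(url))`, where `fetch` stands for whatever request function it already has, so that a dropped connection during an IP change is retried instead of crashing the job.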
An important point with Google Cloud (and most cloud vendors) is that your VM does not have a public IP address assigned to a network interface. The public IP address is assigned to one side of a one-to-one NAT. This means that IP address change notifications will not reach the OS or the applications running on it.
Google provides a CLI, SDKs and APIs that can be used to programmatically change the IP address assigned to an instance.
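One way this could look from Python is to drive the `gcloud` CLI: remove the instance's current access config, then add a new one. This is a sketch under assumptions, not a definitive implementation. The helper names are hypothetical, and the access config name is assumed to be `external-nat` (instances created through the console may use a different name, so check the instance description first). Without `--address`, Compute Engine assigns an ephemeral external IP.

```python
import subprocess


def build_ip_swap_commands(instance, zone, new_address=None,
                           access_config_name="external-nat"):
    """Return the two gcloud commands that replace an instance's external IP."""
    delete_cmd = [
        "gcloud", "compute", "instances", "delete-access-config", instance,
        f"--zone={zone}",
        f"--access-config-name={access_config_name}",
    ]
    add_cmd = [
        "gcloud", "compute", "instances", "add-access-config", instance,
        f"--zone={zone}",
        f"--access-config-name={access_config_name}",
    ]
    if new_address:
        # Attach a specific reserved address instead of an ephemeral one.
        add_cmd.append(f"--address={new_address}")
    return [delete_cmd, add_cmd]


def swap_external_ip(instance, zone, new_address=None, runner=subprocess.run):
    """Run both commands in order; raises if gcloud exits with an error."""
    for cmd in build_ip_swap_commands(instance, zone, new_address):
        runner(cmd, check=True)
```

For example, `swap_external_ip("scraper-1", "us-central1-a")` (hypothetical instance and zone) would request a fresh ephemeral IP. Any SSH session that goes through the old external IP will drop when the first command runs, so this is best launched from a machine or path that does not depend on that address.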
Two strategies:
1. Add another network interface with a public IP address that does not change. Connect to the VM using that IP address.
2. Create a pool of public IP addresses that you will use. Use a VPN such as WireGuard, which has excellent features for following connection address changes. Connect via the VPN using the VM’s private IP address, which does not change when the public IP address is changed.
I would use the first strategy, as it has less complexity and fewer potential problems. However, once you understand how WireGuard manages connections and identifies peers by their cryptographic keys instead of their IP addresses, there are numerous possibilities for connection management.
I tried this tonight on a VM with a public IP, then removed the public IP, and it continued to work. It could be the solution!
You can use IAP (Identity-Aware Proxy) to connect to your VM. Do it in a terminal with gcloud like this:
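As a sketch of that approach driven from Python, assuming the standard `gcloud compute ssh` command with its `--tunnel-through-iap` flag (the helper name, instance name, and zone below are hypothetical):

```python
import subprocess


def build_iap_ssh_command(instance, zone, project=None, remote_command=None):
    """Build a gcloud command that opens SSH to the VM through an IAP tunnel."""
    cmd = [
        "gcloud", "compute", "ssh", instance,
        f"--zone={zone}",
        "--tunnel-through-iap",  # route SSH via IAP instead of the external IP
    ]
    if project:
        cmd.append(f"--project={project}")
    if remote_command:
        # Run a single command on the VM instead of an interactive shell.
        cmd.append(f"--command={remote_command}")
    return cmd


if __name__ == "__main__":
    # Hypothetical instance and zone; replace with real values.
    subprocess.run(build_iap_ssh_command("scraper-1", "us-central1-a"), check=True)
```

Because the tunnel terminates on the VM's internal address, the session does not depend on the external IP. IAP TCP forwarding needs a firewall rule that allows ingress on port 22 from 35.235.240.0/20, and the user needs permission to use IAP-secured tunnels.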
Let me know. If it doesn’t work for you, I will delete the answer