technopleb.eth | by deepinurmom (u/12D3KooWPkNd1ZUrvX7rfLfk7uUgBsrQu8LwHCe9wTCJjvXzrLox) | 4 months ago
Does anyone know how to get past Cloudflare protection when botting/web scraping? I've tried a few different things with Selenium and Playwright but haven't had any luck. I can load a page at the start of a session, but as soon as I try to open another page on the same site in that session I get blocked. I'm trying to get the bot to sign in. Any ideas/advice?
by u/12D3KooWNhcyFvAiEhr6WZzcqhxETyJwt87WwCTWchHAXqAbyrVm | 4 months ago
You can right-click a successful request in your browser's network devtools and copy the request with all its parameters (headers, cookies, query string, user agent, ...).
Start by mimicking the browser's behavior as closely as you can, then narrow down which parameters are the deciding ones.
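A minimal sketch of the "narrow it down" step, assuming you already have a copied request that works: drop one header at a time and see which removals get you blocked. `find_required_headers` and the `is_accepted` callback are hypothetical names; `is_accepted` is whatever check you write for "did I get the real page" (for example, status 200 and no challenge page). It only removes one header at a time, so it can miss headers that only matter in combination.

```python
from typing import Callable, Dict, List


def find_required_headers(
    headers: Dict[str, str],
    is_accepted: Callable[[Dict[str, str]], bool],
) -> List[str]:
    """Return the headers whose removal (one at a time) gets the request rejected.

    `is_accepted` is a hypothetical, caller-supplied check: it should send the
    request with the given headers and return True if the real page came back.
    """
    if not is_accepted(headers):
        raise ValueError("the full copied request is already rejected")
    required = []
    for name in headers:
        # Same request minus one header (a Cookie header counts as one entry).
        trimmed = {k: v for k, v in headers.items() if k != name}
        if not is_accepted(trimmed):
            required.append(name)
    return required
```

With a fake check that only wants `User-Agent` and `Cookie`, `find_required_headers({"User-Agent": "x", "Cookie": "c", "Accept": "a"}, lambda h: "User-Agent" in h and "Cookie" in h)` returns `["User-Agent", "Cookie"]`.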
I've never needed Selenium etc. for the web scraping I do; it's often enough to just use fetch() in JS or requests in Python.
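Here's a rough Python sketch of replaying a copied request without a browser. It uses the stdlib `urllib.request` instead of requests so it has no dependencies. The header values and the `session_id` cookie are placeholders, not what any particular site actually uses; paste your own from devtools.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Dict, Optional, Tuple

# Placeholder values: replace with what devtools shows for a request that worked.
# Leave out Accept-Encoding: urllib does not decompress gzip/br responses for you.
COPIED_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}
COPIED_COOKIES = {"session_id": "PASTE_FROM_DEVTOOLS"}  # hypothetical cookie name


def build_request(
    url: str,
    params: Optional[Dict[str, str]] = None,
    headers: Optional[Dict[str, str]] = None,
    cookies: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a GET request mirroring a copied browser request."""
    if params:
        sep = "&" if "?" in url else "?"
        url = url + sep + urllib.parse.urlencode(params)
    all_headers = dict(headers or {})
    if cookies:
        # Browsers send all cookies in a single "name=value; name2=value2" header.
        all_headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    return urllib.request.Request(url, headers=all_headers, method="GET")


def fetch(req: urllib.request.Request, timeout: float = 10.0) -> Tuple[int, str]:
    """Send the request and return (status, body); 4xx/5xx responses are returned, not raised."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")
```

Usage would be something like `status, body = fetch(build_request(url, headers=COPIED_HEADERS, cookies=COPIED_COOKIES))`. If the site sets new cookies on the first response (the usual reason a second page in the same session fails), you could carry them forward with an opener built from `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))` instead of plain `urlopen`.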
I once even had to get past a Cloudflare CAPTCHA. I just solved it manually, and the token stayed valid for a month or so.
by u/12D3KooWFZFJ8NoAE4QZGFbGBHRCKs58dtZ3BrJkVcX7GvdC2hfH | 4 months ago
I think you can set the scraper to pretend to be different browsers and add random delays between following links. If your scraper loads a page and then opens another link 69 ms later, it will be classified as a bot.
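The random-delay part could look roughly like this in Python. The 2 to 6 second range is an arbitrary assumption, not a known Cloudflare threshold, and `fetch` is whatever function you already use to load a page. This only covers the timing idea; it doesn't change the browser fingerprint. The `sleep` and `rng` parameters are there so the timing can be tested without actually waiting.

```python
import random
import time
from typing import Callable, Iterable, List, Optional


def crawl_with_jitter(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_delay: float = 2.0,
    max_delay: float = 6.0,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch URLs in order, waiting a random min_delay..max_delay seconds between requests."""
    if not 0 <= min_delay <= max_delay:
        raise ValueError("need 0 <= min_delay <= max_delay")
    if rng is None:
        rng = random.Random()
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            # Uniformly random gap instead of an instant, fixed-rate jump to the next link.
            sleep(rng.uniform(min_delay, max_delay))
        pages.append(fetch(url))
    return pages
```

A side benefit is that spacing out requests like this is also easier on the site you're scraping.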