Testing Cloudflare

David

Administrator
Staff member
I used Cloudflare many years ago and ran into too many problems, but I think it should work fine now. It is live, and I am not expecting any problems, but let me know if you see anything. One good thing is that we are cutting off AI scrapers, so they won't be able to take our content so easily. It will also challenge other kinds of bots (e.g., spam bots).
 
They have probably already scraped everything from the forum up to now :D

Interesting experiment: what’s the total volume of written salsa information available on the web that is non-SF?

Some of the sites that existed when I joined SF disappeared a very long time ago.
 
There are definitely some sites with really good information that are more organized than SF, but I don't think there are any sites with a larger volume of content than here. In the future, we may be able to use AI to help us organize some of the existing content, or at least make it easier to find things. That said, you can usually do a site search on Google to uncover deep content here for a particular keyword. Bots have definitely scraped a lot already, but I don't worry too much that we can be replaced easily either, and I'm sure there will be some copyright disputes raised by others that will make it hard for many of the AI services to fully harvest and display everything we have here. At least I hope so.
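
For anyone who wants to try the site search mentioned above, here is a minimal sketch of how such a query can be built. It relies on Google's `site:` operator; the domain `forum.example.com` and the function name `google_site_search_url` are placeholders made up for illustration, not this forum's real address or any official tool.

```python
from urllib.parse import urlencode


def google_site_search_url(domain: str, keyword: str) -> str:
    """Build a Google search URL restricted to one site via the site: operator."""
    query = f"site:{domain} {keyword}"
    # urlencode percent-encodes the colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # forum.example.com is a placeholder domain (an assumption for this sketch).
    print(google_site_search_url("forum.example.com", "cross body lead"))
```

Typing `site:` followed by the forum's domain and a keyword directly into the Google search box does the same thing.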
 
In less than 24 hours, Cloudflare blocked 3029 requests. Here are a few examples:

1. Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)

PetalBot is the web crawler for Petal Search, the search engine operated by Huawei. It collects information from websites to build an index database for that search engine. It is described as being lightweight on web servers and as collecting detailed information about each page it visits.

PetalBot crawls PC and mobile websites to create an index database that users can search for content on the Petal search engine. It also provides content recommendations in Huawei Assistant and AI Search services. PetalBot considers whether a web page provides up-to-date information for a given query, and also factors in user interaction, such as how long users spend on search results.

Some say that PetalBot is an aggressive robot that doesn't respect robots.txt files and can be considered a malicious web crawler, or "bot". Malicious bots can be used to steal information, launch attacks, and commit fraud.

2. Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)

Amazonbot is a web crawler used by Amazon to index search results and improve its services, such as enabling Alexa to answer more questions for customers. It helps Amazon gather information from the web to enhance its product recommendations, search results, and overall user experience.

3. AwarioSmartBot/1.0 (+https://awario.com/bots.html; [email protected])


AwarioSmartBot is a web crawler used by the social listening and media monitoring tool Awario. Its primary function is to gather data from the internet, such as social media platforms, blogs, news articles, and forums, to help users track mentions of specific keywords, brands, or competitors.
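
To make the three examples above a bit more concrete, here is a rough sketch of how a crawler can be recognized from its user agent string. This is not Cloudflare's actual detection logic, which is not described in this thread and surely uses more signals than a name match. The token list comes from the three user agents quoted above; the function name `matched_bot` is made up for illustration.

```python
import re
from typing import Optional

# Bot tokens taken from the three user agents quoted above.
BOT_TOKENS = ("PetalBot", "Amazonbot", "AwarioSmartBot")
BOT_PATTERN = re.compile("|".join(re.escape(t) for t in BOT_TOKENS), re.IGNORECASE)


def matched_bot(user_agent: str) -> Optional[str]:
    """Return the first known bot token found in the user agent, or None."""
    match = BOT_PATTERN.search(user_agent)
    return match.group(0) if match else None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) "
        "Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36",
    ]
    for ua in samples:
        print(matched_bot(ua))
```

Keep in mind that a user agent is just a string the client sends, so a simple name match like this only catches bots that identify themselves honestly.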
 
So basically, they are using our resources for free to get information and sell it. Good riddance!

If anyone wants to sponsor, there is a paid tier of Cloudflare for $20/month that really takes care of all the bots. But I think this is already a good step.
 
You have not subscribed to a paid tier? Many hosting plans offer free Cloudflare, but I'm not sure what you get for free.

I think there will soon be AI agent or model offerings (there already are some, but they are targeting enterprises) that you can own or subscribe to and that will work exclusively on your own data.

There are a lot of other forums similar to this one with a wealth of deep information on different topics. Being able to synthesize the knowledge base with a dedicated AI engine would be a boon.
 
I keep my hosting, but Cloudflare does a lot of caching and filtering of traffic. Some of the resources involved in loading a page, like JavaScript, CSS, and images, are actually kept on multiple servers across the Internet so that they are served quickly. So when you load this page, you will often be getting assets from both Cloudflare and the current host. This saves a lot of bandwidth. At the same time, Cloudflare knows all the bots and filters many of them too.
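
As a rough illustration of the split described above, here is a small sketch that guesses where a request is likely to be answered. It assumes the decision is made purely by file extension, which is a simplification. The extension list, the labels, and the function name `likely_served_from` are all made up for this example and are not Cloudflare's actual configuration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Illustrative subset of static file types (an assumption, not Cloudflare's list).
STATIC_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".woff2"}


def likely_served_from(url: str) -> str:
    """Guess whether a URL is a static asset (CDN cache) or a page (origin host)."""
    path = urlsplit(url).path  # drops the query string, e.g. "?v=3"
    extension = PurePosixPath(path).suffix.lower()
    return "cdn-cache" if extension in STATIC_EXTENSIONS else "origin-host"


if __name__ == "__main__":
    # forum.example.com is a placeholder domain for this sketch.
    for url in (
        "https://forum.example.com/styles/main.css?v=3",
        "https://forum.example.com/threads/testing-cloudflare.123/",
    ):
        print(url, "->", likely_served_from(url))
```

Even a cacheable asset still has to be fetched from the host the first time, or again after the cached copy expires, so the label only says where the request would usually be answered.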
 
I don't know if it's related, but I've noticed some glitches with alerts (the little bell at the top right) in the last few weeks. They look like caching issues: alerts not appearing when they should have, and alerts appearing after the thread has already been visited.
 