What Does Robots.txt Do?

December 28th, 2011

The robots.txt file contains instructions that tell search engine robots what to do with a particular website. While search engine robots follow the instructions in that file, spam bots simply ignore it in most cases.

A web robot is a program that checks the content of web pages. Before a robot crawls a website, it first checks the robots.txt file for instructions. The “Disallow” command, for example, tells the robot not to visit a given set of pages on the site. Web administrators use this file to keep bots from indexing parts of a website for different reasons: they do not want the content to show up in search results, the website is under construction, or a certain part of the content is not meant for the public. Keep in mind that robots.txt is only a request to well-behaved robots; it does not block access to the pages themselves.

While search engines such as Google use robots to index web content and can easily be restricted and instructed through robots.txt, spammers use spambots to harvest e-mail addresses, for example, and do not follow the instructions in the robots.txt file. They look for and follow keywords that might lead to an e-mail address, such as “post”, “message”, “journal” and so on. A typical spambot comes from many IP addresses and identifies itself as different user agents, so it is hard to block. Some spambots even use search engines such as Google to look for particular information on a web page.

Fortunately, there are still things that can be done to prevent spambots from scanning your website and stealing information. Neil Gunton came up with a Spambot Trap, which blocks spambots and allows the good search engine spiders to visit your website.

Still, if you would like to tell the regular search engine bots which pages should be indexed and which should not, be careful not to block the search engines completely. If you put in the wrong commands, your website will have no chance of showing up anywhere in search results. If you don’t have a robots.txt file at all, the web robot will assume it may crawl and index everything it can reach on your website.

Here’s a list with some ready-to-use basic commands for the robots.txt file:

  • Exclude a file from a certain search engine:
User-Agent: Googlebot
Disallow: /private/privatefile.htm
  • Exclude a section/page of your site from all web robots:
User-Agent: *
Disallow: /newsection/
  • Stop all bots from crawling any part of your website:
User-agent: *
Disallow: /
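
Before uploading rules like the ones above, it can help to check how a parser reads them. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_allowed, the bot name "OtherBot" and the test paths are assumptions made for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The first two example rule sets from the list above, combined in one file.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/privatefile.htm

User-agent: *
Disallow: /newsection/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Googlebot is blocked from the single private file only.
print(is_allowed(ROBOTS_TXT, "Googlebot", "/private/privatefile.htm"))
# Any other bot falls under the "*" rules and is kept out of /newsection/.
print(is_allowed(ROBOTS_TXT, "OtherBot", "/newsection/page.html"))
```

Note that a robot with its own User-agent section follows only that section, so in this combined file Googlebot is not affected by the “*” rules.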

If you wish to add more complicated instructions, you can follow Thomas Brunt’s instructions.

If you go through your server logs and see a suspicious host, you can run it through our Blacklist Checker. It will tell you if the domain or IP has been blacklisted. If this is the case, you can simply block this host from accessing your site again.

The Importance of the WordPress Expires Header

December 15th, 2011

The importance of the “expires header” is growing as web page designs become richer in scripts, images, Flash, etc.

As web designs grow more complex, web pages take longer to load, which is why a site needs an expires header. It makes components such as stylesheets and images cacheable; in other words, it prevents unnecessary HTTP requests after the first page view, which reduces load time.

The “expires header” needs a date, and it is important that this date is in the future. A far future Expires header tells the browser how long to keep a web page component cached. If a past date is set, caching simply does not occur. Note that the “expires header” does not affect the load time of the website the first time the user opens it.

Here’s how to add a far future expires header in WordPress:

If the server is Apache, you can use the “ExpiresDefault” directive of the mod_expires module. For example, [ExpiresDefault "access plus 2 months"] means that the file expires two months after the browser requested it. The time period can range from seconds to years.

In order to add the header, however, you need to add the following code to the .htaccess file:

#Expire Header
# mod_expires only sends the header when ExpiresActive is On
ExpiresActive On
<FilesMatch "\.(ico|jpg|jpeg|png|gif|js|css|swf)$">
ExpiresDefault "access plus 2 hours"
</FilesMatch>

or

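# Expire images header
# A2592000 = 2592000 seconds (30 days) after the client's access; A0 = expire immediately
ExpiresActive On
ExpiresDefault A0
ExpiresByType image/gif A2592000
ExpiresByType image/png A2592000
ExpiresByType image/jpeg A2592000
# The MIME type reported for .ico and .js files depends on the server's configuration
ExpiresByType image/x-icon A2592000
ExpiresByType image/vnd.microsoft.icon A2592000
ExpiresByType text/css A2592000
ExpiresByType text/javascript A2592000
ExpiresByType application/javascript A2592000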

It’s important to remember that with the “expires header” the files stay “saved” in the browser until the expiration date. Thus, you should use the header only on images, Flash and other files that will not change before the expiry date. If you are, for instance, changing the pictures on the home page on a regular basis, it is not a good idea to set an expires header on them. The browser will keep the cached copies for the period you have selected, so visitors may not see the new pictures until that period is over.
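
To make the A2592000 value from the second configuration more concrete, here is a small sketch of the arithmetic behind it. The function name expires_header and the sample access time are assumptions for illustration; the server's actual output may differ in detail.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(access_time: datetime, seconds: int) -> str:
    """Format an Expires value that lies `seconds` after the access time."""
    return format_datetime(access_time + timedelta(seconds=seconds), usegmt=True)


# Hypothetical moment at which a browser requests an image.
access = datetime(2011, 12, 15, 12, 0, 0, tzinfo=timezone.utc)

print(timedelta(seconds=2592000).days)  # 30
print(expires_header(access, 2592000))  # Sat, 14 Jan 2012 12:00:00 GMT
```

In other words, “A2592000” asks the browser to keep the file for 30 days counted from the moment it was requested.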

Here’s an example:

In the results above we can see that the date is set in the past, which means that search engines, proxy servers and browsers will always consider the page out of date and try to fetch a fresh copy. This can lead to unnecessary server load. To avoid this problem, simply stick to the rule mentioned above and always set a future date.

To check the expiration date of any web page, you can use our free HTTP headers test. It will return the HTTP header (the initial response of a web page, invisible to the end user) where you can find the expiration date.
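
Once you have the header value, the past-or-future check can also be scripted. The sketch below is only an illustration: the function name is_still_fresh and the sample header values are made up, and treating an unparsable value as already expired is an assumption rather than a rule from this article.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_still_fresh(expires_value: str, now: datetime) -> bool:
    """Return True if the Expires date lies in the future relative to now."""
    try:
        expires = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        # Values such as "0" are not valid dates; treat them as expired.
        return False
    return expires > now


now = datetime(2011, 12, 15, tzinfo=timezone.utc)
print(is_still_fresh("Sat, 01 Jan 2000 00:00:00 GMT", now))  # False, date in the past
print(is_still_fresh("Sat, 14 Jan 2012 12:00:00 GMT", now))  # True, date in the future
```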

Forgot to Pay for Your Domain?

November 29th, 2011

Many people who run a business website overlook the importance of the domain name and its expiration date, and they only hear about it when the website is already down. It is crucial to avoid domain name expiration because it can significantly harm both your business and your reputation.

If you forget to renew your domain, you will stop receiving e-mails sent to addresses at that domain, and the website will simply be gone and replaced with ads. In the best case scenario, you lose the site but are still able to register the domain again.

The most unpleasant part, however, is that someone else can take your domain and register it as their own. Imagine how many regular visitors you will lose if you have to register under a different name because someone else took yours. If your website administrator doesn’t know the expiration date, they will have to run a lot of checks in order to identify the problem. At first it is not always obvious what happened to the site. Thus, it is going to take a while until the problem is identified and a lot more time to fix it.

So, again, the domain name is probably the most important part of your online presence. It is part of your brand. Protecting your domain name is also one of the easiest things to get right online.

Here’s a short checklist of actions you can take to protect your online presence:

  • Set up automatic domain name renewal in your billing system so that the domain is renewed by the expiration day. GoDaddy and a lot of other domain registrars offer this service.
  • Register the domain name for years ahead. Dot Com domains can be registered for up to 10 years! This way you will not have to worry about insufficient funds on the day of your auto-renewal; if that payment doesn’t go through, your domain will expire. The risk here is forgetting how long you have prepaid the domain for and when to renew it again.
  • Always keep the payment options in your account up to date. If your card expires, your domain won’t be able to auto-renew.
  • Ensure that you have an up-to-date e-mail address set up for your domain renewal, because you usually receive notifications when the domain is close to expiration. Make sure you have access to that mailbox.
  • If you hire a 3rd party to develop your site, make sure they register the domain name in your company’s name and put your contact details in the WHOIS records.

If you forget to pay for your domain and your website goes down, that does not necessarily mean that your business is gone. There is a grace period for each domain, and you can still get it back if you are quick. One way to find out right away that your site is down is to employ our most basic $5.00/month ping service. If your domain goes down, you will know it within a few seconds.

And last, save your online business by simply watching out for your domain name, because neither your services and products nor your web design will matter if your site is just not there.

Service Code 503 – Service Unavailable

November 15th, 2011

The 503 status code means that the server is unable to handle a request because it is overloaded or down for maintenance. Although this condition is temporary and simply causes some delay, note that some servers may instead refuse the socket connection, which results in a different error.

Here’s what happens when the browser tries to communicate with the web server:

  • The DNS (Domain Name System) looks up the IP address for the domain name of the website;
  • The browser opens an IP socket connection to that particular IP Address;
  • It writes an HTTP data stream through the socket;
  • It then receives an HTTP data stream back from the server, which contains status codes. They are then analyzed.
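
The four steps above can be sketched with Python's standard socket module. This is a simplified illustration, not how a real browser is implemented: the function names are made up, only plain HTTP on port 80 is handled, and the code assumes the first chunk of the response contains the whole status line.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request (the data written in step 3)."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def parse_status_code(response: bytes) -> int:
    """Extract the status code from the first line of the response (step 4)."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    # A status line looks like "HTTP/1.1 503 Service Unavailable".
    return int(status_line.split(" ", 2)[1])


def fetch_status(host: str, port: int = 80, timeout: float = 5.0) -> int:
    """Run all four steps against a live server and return its status code."""
    ip_address = socket.gethostbyname(host)               # step 1: DNS lookup
    with socket.create_connection((ip_address, port), timeout=timeout) as conn:  # step 2
        conn.sendall(build_request(host))                  # step 3: write the request
        response = conn.recv(4096)                         # step 4: read the reply
    return parse_status_code(response)


print(parse_status_code(b"HTTP/1.1 503 Service Unavailable\r\n\r\n"))  # 503
```

Calling fetch_status with a host name of your own needs network access; the last line only demonstrates the parsing step on a canned response.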

The 503 status code occurs in the last step described above: the server is functioning at a minimum, meaning it can still return status code 503, but the website itself is unavailable. During that time, the administrators are expected to be working on fixing the problem. If the error comes from a Microsoft Exchange server, you can try the following troubleshooting steps, as per Microsoft Support:

  • Check if all services are running;
  • Ensure the services are running under the Local System account;
  • Mount the mailbox store and the public folder stores;
  • See if a registry key that exceeds 259 characters exists in the HKEY_CLASSES_ROOT registry hive;
  • Check whether a Group Policy object exists that prevents the MSExchangeIS service from initializing;
  • Re-register the MDAC Components;
  • Verify the permissions for the HKEY_CLASSES_ROOT registry key.

Most of the time, status code 503 translates to “server not available, please come back in X hours”. Webmasters often return 503 on purpose. It is widely used during scheduled maintenance and website upgrades. The code tells search engine agents that the content is not available right now but will be in a couple of hours (you can set any timeframe), so they should come back and crawl it then. This way webmasters make sure the web page remains in the search engines’ index. It is also a technique used to take load off the server during peak periods.
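
The “come back in X hours” part is usually expressed with the standard HTTP Retry-After header. Below is a minimal sketch of a maintenance responder built on Python's http.server module; the two-hour window, the port number and the names maintenance_headers, MaintenanceHandler and run are assumptions for illustration, and a production site would normally configure this in its web server instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

RETRY_AFTER_SECONDS = 7200  # hypothetical two-hour maintenance window


def maintenance_headers(retry_after_seconds: int) -> dict:
    """Headers sent along with the 503 response."""
    return {
        "Retry-After": str(retry_after_seconds),
        "Content-Type": "text/plain; charset=utf-8",
    }


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answer every GET request with 503 and a hint about when to retry."""

    def do_GET(self):
        body = b"Down for scheduled maintenance, please come back later.\n"
        self.send_response(503)
        for name, value in maintenance_headers(RETRY_AFTER_SECONDS).items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8503) -> None:
    """Serve the maintenance response on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), MaintenanceHandler).serve_forever()
```

Calling run() starts the responder locally; requesting any page from it should then return status 503 together with the Retry-After value.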

One of the things you can do to protect your site from unwanted downtime is to monitor your server for free and test your website frequently. There are a lot of good practices that make your website and server work better, but there is nothing like a good remote monitoring service.

DDoS Threat Growing

November 3rd, 2011

Apparently the DDoS threat is growing to a point where it is becoming a major concern for data center managers, as firewall products are failing to cope with it.

The security testing organization NSS Labs recently discovered that 3 out of 6 firewall devices stopped operating when tested for stability. DDoS attacks have been a major threat for network operators for over ten years, but recently they have become more aggressive and have increased in frequency and impact.

DDoS stands for “distributed denial of service”, and such an attack is a violation of the policies of all Internet service providers. It works by sending a great load of requests, or ‘attacks’, to the targeted computer. These requests force the computer to reset itself or to use up its own resources. As a result, the machine is no longer able to provide its intended service and drops the communication with its users. DDoS targets are mainly high-profile sites such as those of credit card companies or banks.

When DDoS attacks are successful, they lead to significant outages, increased operational expenditures (OPEX), revenue loss and frustrated customers. Unfortunately, the capacity of security products such as firewalls and IPS is limited, and the attackers are well aware of it. They can easily exhaust the application layer resources and cause significant downtime.

According to a recent study conducted by Arbor Networks, the volume of DDoS attacks has broken the 100Gbps barrier; in other words, DDoS attacks are growing in number and strength.

In order to reduce risk, specialists suggest that large state-exhaustion attacks must be stopped at the ISP/MSSP because this is where the attacks occur. Packet-based detection and protection against all kinds of DDoS is required as well.

Protect Your Online Presence

September 30th, 2011

Google have recently come up with a new feature called “Authorship markup” which, they say, connects authors to their content in order to give it more credibility.

The Authorship markup encourages quality content by helping its authors rank better in the search results, according to Sagar Kamdar, Google Product Manager. For this purpose, the markup connects the web content to the Google Profile of its author and then back to the particular web page. This way the content shows up in the search results, the author is identified, and the reader even sees a photo of the author displayed alongside when an image is available. Content then looks more trustworthy and credible, and the website content is better protected.

Google say Authorship markup is quite a new project and is yet to be improved and simplified. Still, they claim to have made this feature “as easy to implement as possible”. The first users of this markup include The New York Times, The Washington Post, CNET and more. Google also claim to have gone a step further by adding the Authorship markup to everything hosted by YouTube and Blogger. In the future, these two platforms will include this feature automatically.

While Google created a feature to protect your website content, WebSitePulse perfected its monitoring service to help you keep an eye on any type of server and network device connected to the Internet, and measure the performance and availability of your websites and applications. Give it a try!

Free Server Monitoring for Life

September 2nd, 2011

Someone once said that there is no such thing as small business. We believe this is true. Due to their nature, not all online businesses require the highest level of service we can offer. If your site is mostly driven by word of mouth and you have a strong offline presence, we might have just the thing for you. A free server monitoring service for life!

If your business depends on the company website to keep your clients informed about changes, share your office location and provide some basic interaction, we can offer you a free server monitoring service for life. We will check your server round the clock and inform you if it goes down.

Although this is one of our entry-level services, we assure you that it employs all the know-how we have gathered in the last decade. You will get a free email notification each time we detect that your server is not running. We only have one simple rule: one account per customer.

We will monitor whether the server running behind your website is working properly. It makes no difference if you host your website on a free hosting provider, an in-house hosting solution or shared hosting. You will be able to choose one monitoring location. This is more than enough if your business is strictly local. If you provide service to people living in San Francisco, then your website really needs to be visible mostly to people in San Francisco.

We will not forward your details to third parties, and you will receive no emails from them. You can’t go wrong with a free service, can you? We won’t display any ads either. We would rather have you happy with our service in the long run. So, let’s break it down.

We offer you:

 

What we won’t do is:

  • Let your server go offline unnoticed
  • Flood your inbox with emails about service upgrade options. (We might contact you every now and then with updates about services and new prices for regular service.)
  • Share your details with third parties.
  • Display advertisements of any kind

 

Our free lifetime monitoring is suitable for:

  • Local business owners
  • Hobby & personal websites
  • Small websites with informational purpose
  • Anyone not yet sure about going ahead with the full service
  • Anyone who would like to test our service

If you would like to see how the system works before you register for the free service, then go right ahead and log in with our Demo account. Happy monitoring!

Are You Looking at the Right Metrics?

August 17th, 2011

The most dangerous type of downtime is the one you don’t know about. It is disturbing how true that one line actually is. Should it occur, website/server downtime can and will cause problems and ripples throughout your organization. Before we get tarred and feathered for making such a bold statement, let us build our case.

The Problem

In this real-world situation, a business lost roughly 30% of its leads for July. Apart from the initial loss, it simply handed a good portion of the market to its competitors in high season. When the figures arrived, all hell broke loose. All major markets felt the downturn. Many working hours were spent searching for a logical explanation. After it was made clear that the traffic was stable, the management went on to search for answers somewhere down the line. The marketing team had to pull out detailed reports of their activities in the last three months. Seasonal sales people got numerous test calls. A full-scale internal audit took place. This caused a ripple effect, and the normal workflow was seriously disrupted.

Locating the Issue

Upon request, the IT department emailed external statistics on the web server’s uptime. They had employed a website monitoring service (not ours). According to their information, the server only went down for 20 minutes that month, during the scheduled maintenance. What they failed to notice is that the service they used only gave them figures for the network availability of the hardware device, not for the server software. The machine was available nearly all the time, but was doing what it was supposed to do (serving web pages) only ~80% of the time. We were able to find that out only after we began tracking the server ourselves.

We were able to locate the problem because the service we chose to test with actually tested the website itself. We tried loading all major application forms from multiple locations over a given period of time. It wasn’t long before we got the first alert about a page not loading. It turned out that the server was failing to deliver the pages after a certain number of concurrent connections. The hardware simply couldn’t take the load, and the server software dropped a number of requests in order to serve the rest. With some modest server upgrades and clever workarounds by the IT department, the whole website returned to normal.

The monitoring service employed by the business worked exactly as it should. What was referred to as website uptime was actually server uptime.

Quick Tips

  • Network availability is only a prerequisite for a website to function properly. Even if the site loads successfully, it is not clear whether the forms on the site will be 100% functional.
  • One good sign to look for, when trying to find the exact cause of problems with your traffic and conversion rate, is the breakdown by traffic source. If you notice a significant decrease across all mediums, then it is most likely that your website is not performing as expected.
  • Make sure you are using the service you need. You can test our range of website and server monitoring services completely free.
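
The gap between the two kinds of uptime in this story can be illustrated with a few lines of Python. This is only a sketch with made-up data: the Check record, the expected page text and the ten sample results are assumptions chosen to mirror the ~80% figure above, not measurements from the actual case.

```python
from dataclasses import dataclass


@dataclass
class Check:
    host_reachable: bool  # network-level result, e.g. a ping
    status_code: int      # HTTP status returned for the page
    body: str             # page content that came back


def page_ok(check: Check, required_text: str) -> bool:
    """A page counts as up only if it loads and contains the expected text."""
    return (
        check.host_reachable
        and check.status_code == 200
        and required_text in check.body
    )


def uptime_percent(flags: list) -> float:
    """Share of successful checks, in percent."""
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


# Ten hypothetical checks: the machine always answers, but two page loads fail.
checks = [Check(True, 200, "<form>Apply now</form>")] * 8 + [Check(True, 503, "")] * 2

print(uptime_percent([c.host_reachable for c in checks]))         # 100.0
print(uptime_percent([page_ok(c, "Apply now") for c in checks]))  # 80.0
```

The first figure is what a network-only monitor reports; the second is what visitors actually experience.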

Submarine Communication Lines

July 7th, 2011

There are currently 121 existing submarine communication cable systems, with 25 more planned for the next few years. That sounds like a lot, but many more have actually been laid over the years; 121 is only the number of currently functioning underwater communication cables.

The first operational submarine cable was laid down in 1851. It connected Great Britain and France. The next notable cable system was the transatlantic copper cable, laid between Newfoundland and Ireland. Since then, hundreds of communication cables have been laid on the ocean floor. FLAG, or Fiber optic Link Around the Globe, is one of the most notable submarine cable systems. It is 28,000 kilometers long and consists of 4 segments. Its Europe-Asia segment is the fourth longest cable in the world. In order for the technology to work, erbium-doped fiber amplifiers are installed every 50 kilometers.

Technology has come a long way since the pioneer days of copper cables with hemp and tar insulation. Each successive technological advancement has reduced the overall costs and improved the quality and availability of international cable systems. The cost continues to go down while capacities continue to meet the growing bandwidth needs. The increasing demand is generated mostly by the growing need for faster data transfer. In comparison, the demand for voice service doesn’t change a lot. But what happens to obsolete systems? It is not uncommon for obsolete underwater lines, which are no longer used for commercial purposes, to be used for scientific studies.

Submarine cables take amazingly little time to repair. Once detected, a cable failure can be fixed in about 8 hours! The technology that allows cable problems to be detected is called Brillouin Optical Time Domain Reflectometry. It locates a cable fault with very high precision. The video below visualizes how repair operations are carried out.

Repairs begin with failure detection. The sooner you know where the problem is, the sooner you will be able to remove it. Find out if your sites are not performing as expected in certain geographical locations. Try 30 Day Free Website Monitoring.

SSD vs. HDD for Business

July 6th, 2011

Is SSD the solution for the ever widening gap between current hard drive technology and CPU advancements? Is this the next step in data storage and are SSD drives here to stay? How reliable are they? Are they actually worth it?

These are just a handful of questions from a huge, huge batch. SSDs are still pretty expensive for everyday use and remain the domain of computer enthusiasts and early adopters. Leaving the money question aside, let’s check whether they are a good solution for business workstations and high-end server hardware.

While SSDs might not be the best choice for personal computing, they might be great for server hardware. In an average laptop, you might be better off with a 5400rpm drive. Most users don’t see any battery life improvement with an SSD. In fact, a 5400rpm drive can drain the battery even less than an SSD: unlike the SSD, an HDD can spin down and reduce its battery drain to a minimum. SSDs might use more power when idle, but that is only an issue for personal computers and laptops. Server hard drives rarely stay idle.

SSDs are in fact great for database servers. Most requests are extremely small in size and are often random in nature. With SSD drives there is no mechanical latency to limit the performance. They offer great speed improvements even over 15000rpm drives. The lack of moving parts reduces the heat coming from the drive, thus requiring less power to cool down a server rack.

Many data centers cut costs on cooling. This way they stress their hardware more and need to replace it more often, but it pays off when you look at the power bill. You can have the best of both worlds with SSDs. They generate almost no heat at all. Claims of a 50% lower electricity bill might not be too farfetched when you consider the reduced power required to cool down the server racks.

The life expectancy of an SSD is said to extend to 50 years, which is pretty hard to believe and most likely not applicable to servers. The real figure may still be somewhat close. Unfortunately, SSDs as we know them have been around only for a couple of years, so no one really knows. The life cycle is limited by the number of write cycles. This is why there are a lot of server applications for SSDs where information is only read from them.

Let’s not forget they do the job faster: they complete tasks faster than traditional storage devices. Ideally, this could reduce the number of disks required in an installation. This is highly unlikely before larger SSDs become available, but it is one of those features that will make a difference once the technology improves.

You get higher performance, high reliability, power savings, a more than reasonable lifespan, and a hefty price tag. Depending on the scale of implementation, the last one might not hold either, considering the lower power bill.

If you plan to upgrade your installation, it might be wise to wait for a while. Prices are said to go down by 50% by the end of the year. Early adopters who have chosen to use SSDs in their web and database servers rarely complain, and speed is never the issue.