How to Use Robots.txt?

Aug 23, 2017

What Is Robots.txt?

A robots.txt file is a text file stored in a website's root directory that tells web crawlers which pages, folders and/or file types they should or shouldn't crawl and index. These instructions can apply to all bots or give guidance to specific user-agents. Robots.txt files follow the Robots Exclusion Protocol, developed in 1994 as a standard for websites to communicate with crawlers and other internet bots.

When website owners want to tell bots how to crawl their sites, they place a robots.txt file in the root directory, e.g. https://www.example.com/robots.txt. Crawlers arriving on the site will fetch and read the file before trying to fetch any other file from the server. If a website doesn't have a robots.txt file, or the crawler can't load it for some reason, the bot will assume the site owner doesn't have any instructions to give it.

When creating a robots.txt file, it’s absolutely vital to do so using a plain text file. Using HTML or a word processor will include code in the file that crawlers can’t read. This could cause them to ignore directives in the file.

How Does a Robots.txt File Work?

A robots.txt file is made up of blocks of code containing two basic parts: user-agent and directives.

Robots.txt User-Agent

User-agent refers to the name used by a web crawler. When a crawler arrives on a site and opens its robots.txt file, the bot will look for its name in one of the user-agent lines.

Using the user-agent part of robots.txt is relatively simple. The user-agent line must always come before the disallow lines, and each user-agent line can specify only one bot (sort of; more on that in a bit). So, for example, if you have a page you don't want Google to crawl for some reason, but you're fine with Bing or Baidu crawling it, you'd write your instructions like this:

User-agent: googlebot
Disallow: /page

That would tell Google's web crawler not to open the page at example.com/page, while the user-agents for other search engines would continue unaffected. If you want to give the same instructions to more than one user-agent, you have to create a set of directives for each one:

User-agent: googlebot
Disallow: /page

User-agent: Bingbot
Disallow: /page

That robots.txt file would tell Google and Bing not to crawl the page at https://www.example.com/page while other bots such as Baidu or Yandex would continue to do so.

If you want to provide directives to all web crawlers that access your site, you can use what's called a wildcard. A wildcard is written as an asterisk (*) and matches any character or string of characters. So in a robots.txt file like this:

User-agent: *
Disallow: /page

Bots that read the robots.txt file will automatically interpret the wildcard as their own user-agent.

These days, most search engines have multiple crawlers for different jobs, such as crawling images, ads, videos or mobile content. When a crawler encounters a robots.txt file that doesn't specifically include its user-agent, it will follow the instructions for the most specific user-agent that is relevant to it. So, for example, if Googlebot-Image opens a robots.txt file with directives for Googlebot, Bingbot and a wildcard, it will follow the disallow lines for Googlebot, since that is the most specific set of instructions that could apply to Googlebot-Image.

This is very important to keep in mind while writing a robots.txt file so you don’t accidentally block the wrong user-agents.
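As an illustration, consider a hypothetical file like this (the paths are made up for the example):

User-agent: Googlebot
Disallow: /private/

User-agent: bingbot
Disallow: /drafts/

User-agent: *
Disallow: /temp/

Here Googlebot-Image would follow the Googlebot group and skip /private/, Bingbot would skip /drafts/, and any crawler not named in the file would fall back to the wildcard group and skip /temp/.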

Here are the most common search engines, their user-agents and what they search for:

User-Agent              Search Engine    Purpose
baiduspider             Baidu            General
baiduspider-image       Baidu            Images
baiduspider-mobile      Baidu            Mobile
baiduspider-news        Baidu            News
baiduspider-video       Baidu            Video
bingbot                 Bing             General
msnbot                  Bing             General
msnbot-media            Bing             Images & Video
adidxbot                Bing             Ads
Googlebot               Google           General
Googlebot-Image         Google           Images
Googlebot-Mobile        Google           Mobile
Googlebot-News          Google           News
Googlebot-Video         Google           Video
Mediapartners-Google    Google           AdSense
AdsBot-Google           Google           AdWords
slurp                   Yahoo!           General
yandex                  Yandex           General

Robots.txt Disallow

The second part of robots.txt is the directive, or disallow, lines. This is the part of the code that controls what pages, folders or file types a user-agent shouldn’t crawl. These lines are usually called ‘disallow’ lines because that’s the most common directive used in robots.txt for SEO.

Technically, you don't have to put anything in a disallow line; bots will interpret a blank disallow to mean they're allowed to crawl the entire site. To block your whole server, use a slash (/) in the disallow line. Otherwise, create a new line for every folder, subfolder or page you don't want crawled. Robots.txt files use relative paths, so you don't include your whole domain in every line. However, you do have to use the canonical version of your URLs, matching the URL structure in your sitemap.
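As a quick sketch of those two extremes (shown here as two separate example files), an empty disallow permits everything while a single slash blocks the whole site:

# Example file 1: allow everything
User-agent: *
Disallow:

# Example file 2: block the entire site
User-agent: *
Disallow: /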

Take this block of robots.txt code as an example:

User-agent: *
Disallow: /folder/subfolder/page.html
Disallow: /subfolder2/
Disallow: /folder2/

The first disallow line stops all bots (note the wildcard in the user-agent line) from crawling the page https://www.example.com/folder/subfolder/page.html. Since the command specifies the page.html file, the bots will still crawl other pages in that folder, as well as any instances of page.html in other directories. The second line, on the other hand, disallows the entire /subfolder2/ subdirectory, which means any page found in that folder shouldn’t be crawled. However, pages found in a /subfolder3/ directory could still be crawled and indexed. Finally, the third line instructs bots to skip all directories and files found within the /folder2/ directory.

Using your robots.txt file to disallow specific files or folders is the simplest, most basic way to use it. However, you can get more precise and efficient by making use of the wildcard in the disallow lines.

Remember, the asterisk works like an eight card in Crazy Eights: it can represent any string of characters. For disallow, that means you can use a wildcard as a stand-in for any file or folder name to control how bots crawl the site. Here is the wildcard in action:

User-agent: *
Disallow: /*.pdf
Disallow: /images/*.jpg
Disallow: /copies/duplicatepage*.html

The wildcard is very useful here, as these commands tell all user-agents not to crawl PDFs anywhere on the site or JPEG files in the 'images' folder. The third line stops bots from crawling any file in the 'copies' folder whose name contains 'duplicatepage' and '.html'. So if your site uses URL parameters for analytics, remarketing or sorting, search engines won't crawl duplicate URLs such as:

  • /copies/duplicatepage1.html
  • /copies/duplicatepage2.html
  • /copies/duplicatepage.html?parameter=1234

Note that search engine crawlers are looking for URLs that match the exclusion pattern, not only for exact matches, which is why that last example would still be disallowed.

In the example above, a file at ‘/copies/duplicatepage/page.html’ would also be disallowed as the wildcard would expand to become the ‘/page’ part.

Using the rules above, pages could unintentionally match an exclusion rule, such as when an excluded file extension appears inside a file name (an HTML page called 'how-to-create-a-.pdf', for example). Resolve this by adding a dollar sign ($), which tells search engines to exclude only URLs that end the same way as the disallow line. So Disallow: /copies/duplicatepage*.html$ will exclude only HTML files that contain 'duplicatepage'.
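As a rough sketch of that anchoring in practice (the file names here are invented for illustration):

# Blocks /whitepaper.pdf, but not an HTML page such as
# /how-to-edit-a-.pdf-file.html, because the URL must end in .pdf to match
User-agent: *
Disallow: /*.pdf$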

Non-Standard Robots.txt Directives

Disallow is the standard directive recognized by all search engine crawlers (it is part of the Robots Exclusion Protocol). However, there are other, lesser-known directives recognized by some web crawlers.

Allow

If you wanted to disallow an entire folder except for one page using just the disallow command, you would have to write a line for every page except the one you want crawled. Alternatively, use a disallow line to block the entire folder and then add an 'Allow' line specifying only the single page you want crawled. Allow works in much the same way as Disallow, meaning it goes below the User-agent line:

User-agent: *
Disallow: /folder/subfolder/
Allow: /folder/subfolder/page.html

Wildcards and matching rules work the same way with allow as with disallow. Allow is recognized by both Google and Bing.

Other Commands

There are a few other non-standard directives recognized by web crawlers that you can use to further influence the way your site is crawled:

  • crawl-delay: This line uses a numerical value specifying a number of seconds. It's recognized by Bing and Yandex but used differently by each: Bing will wait the specified number of seconds before completing its next crawl action, while Yandex waits that number of seconds between reading the robots.txt file and actually crawling the site. This delay will limit the number of pages on your site that get crawled, so it's not really recommended unless you get almost no traffic from those sources and need to save bandwidth (see the sketch after this list).

  • Host: This is only recognized by Yandex and works as a WWW resolve, telling the search engine which version of the domain is canonical. However, since Yandex is the only search engine that uses it, it's not recommended. Instead, set your preferred domain in Google Search Console and Bing Webmaster Tools and then use a 301 redirect to implement a WWW resolve.
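For reference, here is a rough sketch of how these directives are typically written; the ten-second delay and the www.example.com host are placeholder values:

User-agent: bingbot
Crawl-delay: 10

User-agent: yandex
Crawl-delay: 10
Host: www.example.com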

Finally, while not really a command, you can use the robots.txt file to link to your XML sitemap via the Sitemap: line. This line is interpreted independently of user-agent, so add it at the start or end of the file. If you have multiple sitemaps, such as image and/or video sitemaps, include a line for each, along with a line for your sitemap index file.
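For example, sitemap lines (these URLs are placeholders) usually look like this:

Sitemap: https://www.example.com/sitemap_index.xml
Sitemap: https://www.example.com/image-sitemap.xml
Sitemap: https://www.example.com/video-sitemap.xml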

How Do I Use Robots.txt for SEO?

If SEO's objective is for your site to get crawled and indexed in order to rank in search results, why would you want to block pages? The fact is, there are a few situations in which you wouldn't want content to be crawled or to appear in search results:

  • Disallowing unimportant folders or pages will help the bots use their crawl budgets more efficiently. Think about it: every second they’re not crawling your temp files is a second they can spend crawling a product page. Adding the Sitemap: line will also help search engines access your sitemap more easily and efficiently.

  • As discussed above, sometimes duplicate and/or thin content is unavoidable. Disallow those pages with your robots.txt to help your website stay on the right side of Panda.

  • Disallow user-agents from search engines that operate in countries you don't serve. If you don't or can't ship to Russia or China, it might not make sense to have Yandex and Baidu (the most popular search engines in those countries, respectively) using bandwidth by crawling your site.

  • You have private pages you don't want to appear in search results. Remember, though, that robots.txt files are public, so anyone can open the file and see which pages you've listed. Also, robots.txt doesn't stop direct traffic or people following links.

  • When redesigning or migrating a site, it's a good idea to disallow the entire server until you're ready to add redirects from your legacy site. This will prevent search engines from crawling the new site before you're ready, which could make it look like content copied from your old site. Incurring this 'penalty' at launch is not a good way to start.
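Pulling several of these situations together, here is a hypothetical robots.txt file; every user-agent, path and URL in it is illustrative rather than a recommendation for any particular site:

# Keep Baidu and Yandex out entirely
User-agent: baiduspider
Disallow: /

User-agent: yandex
Disallow: /

# Everyone else: skip temporary files and the duplicates folder,
# but allow the one page in it that should be crawled
User-agent: *
Disallow: /temp/
Disallow: /copies/
Allow: /copies/original.html

Sitemap: https://www.example.com/sitemap_index.xml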

When using your robots.txt file during a site migration, be extra sure that you update the file when setting your new site live. This is a common mistake and one of the first things you should look at when trying to diagnose a loss of search traffic and/or ranking drop.

Before uploading your robots.txt file, run it through Google’s robots.txt Tester in Google Search Console. To test your file, copy and paste your code into the tester; syntax and logic errors will be highlighted immediately. Once you fix those, test individual URLs you know should be blocked and allowed to see if your robots.txt is correct.

Note that, naturally, Google’s robots.txt tester only applies to Googlebot. To verify that your file works for Bing, use the Fetch as Bingbot feature in Bing Webmaster Tools.

 

What is Search Engine Optimization (SEO)?

Aug 23, 2017

Beginner's Guide to SEO

New to SEO? Need to polish up your knowledge? The Beginner's Guide to SEO has been read over 3 million times and provides the comprehensive information you need to get on the road to professional-quality Search Engine Optimization, or SEO.

What is Search Engine Optimization (SEO)?

SEO is a marketing discipline focused on growing visibility in organic (non-paid) search engine results. SEO encompasses both the technical and creative elements required to improve rankings, drive traffic, and increase awareness in search engines. There are many aspects to SEO, from the words on your page to the way other sites link to you on the web. Sometimes SEO is simply a matter of making sure your site is structured in a way that search engines understand.

SEO isn't just about building search engine-friendly websites. It's about making your site better for people too. At Notyced we believe these principles go hand-in-hand.

This guide is designed to describe all areas of SEO—from finding the terms and phrases (keywords) that generate traffic to your website, to making your site friendly to search engines, to building links and marketing the unique value of your site. If you are confused about this stuff, you are not alone, and we're here to help.

Search Engine Market Share

Why does my website need SEO?

The majority of web traffic is driven by the major commercial search engines, Google, Bing, and Yahoo!. Although social media and other types of traffic can generate visits to your website, search engines are the primary method of navigation for most Internet users. This is true whether your site provides content, services, products, information, or just about anything else.

Search engines are unique in that they provide targeted traffic—people looking for what you offer. Search engines are the roadways that make this happen. If search engines cannot find your site, or add your content to their databases, you miss out on incredible opportunities to drive traffic to your site.

Search queries—the words that users type into the search box—carry extraordinary value. Experience has shown that search engine traffic can make (or break) an organization's success. Targeted traffic to a website can provide publicity, revenue, and exposure like no other channel of marketing. Investing in SEO can have an exceptional rate of return compared to other types of marketing and promotion.

Why can't the search engines figure out my site without SEO?

Search engines are smart, but they still need help. The major engines are always working to improve their technology to crawl the web more deeply and return better results to users. However, there is a limit to how search engines can operate. Whereas the right SEO can net you thousands of visitors and increased attention, the wrong moves can hide or bury your site deep in the search results where visibility is minimal.

In addition to making content available to search engines, SEO also helps boost rankings so that content will be placed where searchers will more readily find it. The Internet is becoming increasingly competitive, and those companies who perform SEO will have a decided advantage in visitors and customers.

Can I do SEO for myself?

The world of SEO is complex, but most people can easily understand the basics. Even a small amount of knowledge can make a big difference. Free SEO education is widely available on the web, including in guides like this. Combine this with a little practice and you are well on your way to becoming a guru.

Depending on your time commitment, your willingness to learn, and the complexity of your website(s), you may decide you need an expert to handle things for you. Firms that practice SEO can vary; some have a highly specialized focus, while others take a broader and more general approach.

In any case, it's good to have a firm grasp of the core concepts.

How much of this article do I need to read?

If you are serious about improving search traffic and are unfamiliar with SEO, we recommend reading this guide front-to-back. We've tried to make it as concise as possible and easy to understand. There's a printable PDF version for those who'd prefer, and dozens of linked-to resources on other sites and pages that are also worthy of your attention.

 

Each section of this guide is important to understanding the most effective practices of search engine optimization.

Business Marketing Strategy

Aug 18, 2017

We recently delivered a presentation to the sales team of a large marketing firm. Our goal was to teach them to stop marketing to their clients and to start building relationships with their strategic partners. With the right strategy, they could easily double their income and work fewer hours at the same time. This concept can apply to any business.

Every business and non-profit has many strategic partners, and they probably are not clients. A strategic partner is someone well connected to many of your potential clients. Once they trust you, a strategic partner can send you a steady stream of referral business.

To identify who your strategic partners are, first think of your clients. Who are they and what are they doing when they need your service? What other services do they need at that same time?

For example, one of the target markets for the marketing firm was “medium sized businesses opening or expanding operations in Tucson.” There are additional services these businesses need before they open. Some examples might be a commercial realtor, business attorney, CPA, business consultant, graphic designer, website builder, business telecommunications, fleet vehicle sales and a host of others.

 
After identifying the right strategic partners, we developed a marketing plan specifically designed to reach them. The plan included direct mail, social media, and recommendations on which professional referral groups to join to build relationships with these strategic partners.

Once you have identified your strategic partners and developed a marketing strategy to reach them, the challenge becomes: how do you get them to send business to you? Most strategic partners already have a good relationship with one of your competitors and may have been using them for years.

Local Search Citations

Aug 12, 2017

Why Citations Are Important to the Success of Your Local Business

Citations are defined as mentions of your business name and address on other webpages—even if there is no link to your website. An example of a citation might be an online yellow pages directory where your business is listed, but not linked to. Citations can also be found on local chamber of commerce pages, or on a local business association page that includes your business information, even if they are not linking at all to your website.

Citations are a key component of the ranking algorithms in Google and Bing. Other factors being equal, businesses with a greater number of citations will probably rank higher than businesses with fewer citations.

Citations from well-established and well-indexed portals (e.g., Superpages.com) help increase the degree of certainty the search engines have about your business's contact information and categorization. To paraphrase former Arizona Cardinals' coach Dennis Green, citations help search engines confirm that businesses "are who we thought they were!"

Citations are particularly important in less-competitive niches, like plumbing or electrical, where many service providers don't have websites themselves. Without much other information, the search engines rely heavily on whatever information they can find.

Citations also validate that a business is part of a community. It's hard for someone to fake membership in a chamber of commerce or a city or county business index, or being written about in a local online newspaper or popular blog.

Citations and links from these kinds of websites can dramatically improve your local search engine rankings.

CalChamber 2017 HR California Online Membership

Aug 12, 2017

A CalChamber Online membership gives companies doing business in California helpful HR compliance resources that save time and money.

I love the ease of the website. HRCalifornia has everything.

Kim Urban, HR Manager, STEICO Industries, Inc.

Great membership benefits and perks!

Online Member Benefits:

  • Unlimited access to the expert compliance resources of HRCalifornia (Smartphone & Tablet Ready):
  • In-depth HR Library with current information on California and federal employment laws and regulations
  • Clear explanations of compliance rules
  • Fast compliance wizards to simplify HR tasks    
  • Extensive Q&As section with quick answers to specific employment topics
  • Nearly 400 downloadable HR forms, checklists and policies
  • Interactive quizzes
  • Posting requirements    
  • Best practices for everyday HR processes
  • HRWatchdog blog with the latest employment-related news    
  • HR Certification Institute (HRCI) California Certification Study Tool    
  • eNewsletters reporting new laws and regulations, court cases and other news that affect your business
  • Business partner discounts on FedEx, UPS, OfficeMax and more
  • Up to 4 HR Certification Institute (HRCI) recertification credits per year for all registered users of HRCalifornia.com, plus $25 off any initial certification exam from HRCI