Semantic URL
Semantic URLs, also sometimes referred to as clean URLs, RESTful URLs, user-friendly URLs, or search engine-friendly URLs, are Uniform Resource Locators (URLs) intended to improve the usability and accessibility of a website or web service by being immediately and intuitively meaningful to non-expert users. Such URL schemes tend to reflect the conceptual structure of a collection of information and decouple the user interface from a server's internal representation of information. Other reasons for using clean URLs include search engine optimization (SEO),[1] conforming to the representational state transfer (REST) style of software architecture, and ensuring that individual web resources remain consistently at the same URL. This makes the World Wide Web a more stable and useful system, and allows more durable and reliable bookmarking of web resources.[2]
Semantic URLs also do not contain implementation details of the underlying web application, which makes it easier to change the implementation of a resource at a later date. For example, many non-semantic URLs include the filename of a server-side script, such as example.php, example.asp or cgi-bin. If the underlying implementation of a resource is changed, such URLs would need to change along with it. Likewise, moving or restructuring the database behind a site that uses non-semantic URLs can break links, both internal ones and those from external sites; broken external links can in turn lead to removal from search engine listings. Semantic URLs present a consistent location for resources to user agents regardless of internal structure. A further potential benefit is that concealing internal server or application information can improve the security of a system.
Structure
A non-semantic URL is typically composed of a path, script name, and query string. The query string parameters dictate the content to show on the page, and frequently include information opaque or irrelevant to users, such as internal numeric identifiers for values in a database, illegibly encoded data, session IDs, and implementation details. Semantic URLs, by contrast, contain only the path of a resource, in a hierarchy that reflects some logical structure that users can easily interpret and manipulate.
Non-semantic URL | Semantic URL |
---|---|
http://example.com/index.php?page=name | http://example.com/name |
http://example.com/index.php?page=consulting/marketing | http://example.com/consulting/marketing |
http://example.com/products?category=2&pid=25 | http://example.com/products/2/25 |
http://example.com/cgi-bin/feed.cgi?feed=news&frm=rss | http://example.com/news.rss |
http://example.com/services/index.jsp?category=legal&id=patents | http://example.com/services/legal/patents |
http://example.com/kb/index.php?cat=8&id=41 | http://example.com/kb/8/41 |
http://example.com/index.php?mod=profiles&id=193 | http://example.com/profiles/193 |
http://en.wikipedia.org/w/index.php?title=Semantic_URL | http://en.wikipedia.org/wiki/Semantic_URL |
Implementation
The implementation of semantic URLs involves URL mapping via pattern matching or transparent rewriting techniques. As this usually takes place on the server side, the semantic URL is often the only form seen by the user.
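As an illustration of URL mapping by pattern matching, the following minimal Python sketch translates a semantic path into an internal script-and-query-string form. The `ROUTES` table and the `rewrite` function are hypothetical names chosen for this example, and the internal targets are taken from the example.com rows shown under Structure; real deployments often delegate this step to a web server's rewriting module or a framework's router.

```python
import re
from urllib.parse import urlencode

# Hypothetical route table: (semantic path pattern, internal script,
# fixed query parameters). Named groups become query parameters.
ROUTES = [
    (re.compile(r"^/products/(?P<category>\d+)/(?P<pid>\d+)$"), "/products", {}),
    (re.compile(r"^/profiles/(?P<id>\d+)$"), "/index.php", {"mod": "profiles"}),
    (re.compile(r"^/kb/(?P<cat>\d+)/(?P<id>\d+)$"), "/kb/index.php", {}),
]


def rewrite(path):
    """Return the internal URL for a semantic path, or None if no route matches."""
    for pattern, target, fixed in ROUTES:
        match = pattern.match(path)
        if match:
            # Fixed parameters come first, followed by the captured groups.
            params = {**fixed, **match.groupdict()}
            return f"{target}?{urlencode(params)}"
    return None


print(rewrite("/products/2/25"))
print(rewrite("/profiles/193"))
print(rewrite("/no/such/page"))
```

Because the translation happens on the server, a visitor only ever sees the path form; an unmatched path returns `None`, which a server would typically turn into a "not found" response.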
For search engine optimization purposes, web developers often take this opportunity to include relevant keywords in the URL and remove irrelevant words. Common words that are removed include articles and conjunctions, while descriptive keywords are added to increase user-friendliness and improve search engine rankings.[1]
A fragment identifier can be included at the end of a semantic URL for references within a page, and need not be user-readable.[3]
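A short sketch of how a fragment identifier sits alongside a semantic URL, using Python's standard `urllib.parse.urldefrag`. The fragment value `section-2` is an assumption made up for this example.

```python
from urllib.parse import urldefrag

# Hypothetical URL: a semantic path followed by a fragment identifier.
url = "http://example.com/services/legal/patents#section-2"

# urldefrag splits off the fragment and returns the URL without it.
base, fragment = urldefrag(url)

print(base)
print(fragment)
```

The semantic part of the URL is unchanged by the fragment, which only identifies a location within the retrieved page.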
Slug
Some systems define a slug as the part of a URL that identifies a page in human-readable keywords.[4][5] It is usually the end part of the URL, which can be interpreted as the name of the resource, similar to the basename in a filename or the title of a page. The name is based on the use of the word slug in the news media to indicate a short name given to an article for internal use.
Slugs are typically generated automatically from a page title but can also be entered or altered manually, so that while the page title remains designed for display and human readability, its slug may be optimized for brevity or for consumption by search engines. Long page titles may also be truncated to keep the final URL to a reasonable length.
Slugs are generally entirely lowercase, with accented characters replaced by letters from the English alphabet and whitespace characters replaced by a dash or an underscore to avoid being encoded. Punctuation marks are generally removed, and some systems also remove short, common words such as articles and conjunctions. For example:
- Original title: This, That and the Other! An Outré Collection
- Generated slug: this-that-other-outre-collection
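The rules described above can be sketched in a few lines of Python. This is one possible implementation, not the algorithm of any particular system: the `slugify` name and the `STOP_WORDS` set are assumptions for this example, and real systems differ in which words they drop and how they handle non-Latin scripts.

```python
import re
import unicodedata

# Assumed list of short, common words to drop; real systems vary.
STOP_WORDS = {"a", "an", "and", "or", "the"}


def slugify(title):
    """Build a lowercase, dash-separated slug from a page title."""
    # Decompose accented characters, then discard the combining marks
    # (and any other non-ASCII characters), so that "é" becomes "e".
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Lowercase and keep only runs of letters and digits, which removes
    # punctuation and splits on whitespace.
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    # Drop the common words and join the rest with dashes.
    return "-".join(word for word in words if word not in STOP_WORDS)


print(slugify("This, That and the Other! An Outré Collection"))
# prints: this-that-other-outre-collection
```

Characters that have no ASCII decomposition are simply discarded in this sketch, so a title written entirely in a non-Latin script would yield an empty slug.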
See also
- Information architecture
- Permalink
- Persistent uniform resource locator (PURL)
- URL normalization
- URL redirection
- URL shortening
- HTTP referer § Referer hiding
References
- [1] Opitz, Pascal (28 February 2006). "Clean URLs for better search engine ranking". Content with Style. Retrieved 9 September 2010.
- [2] Berners-Lee, Tim (1998). "Cool URIs don't change". Style Guide for online hypertext. W3C. Retrieved 6 March 2011.
- [3] "Uniform Resource Identifier (URI): Generic Syntax". RFC 3986. Internet Engineering Task Force. Retrieved 2 May 2014.
- [4] Slug in the WordPress glossary
- [5] Slug in the Django glossary