Internationalized Resource Identifier

The Internationalized Resource Identifier (IRI) was defined by the Internet Engineering Task Force (IETF) in 2005 as a new internet standard that extends the existing Uniform Resource Identifier (URI) scheme.[1] The standard was published as RFC 3987.

While URIs are limited to a subset of the ASCII character set, IRIs may contain characters from the Universal Character Set (Unicode/ISO 10646), including Chinese characters, Japanese kanji, Korean, Cyrillic characters, and so forth.

Syntax

IRIs extend URIs by drawing on the Universal Character Set, whereas URIs are limited to ASCII, which has far fewer characters. An IRI may be represented by a sequence of octets, but it is defined as a sequence of characters, because IRIs can also be spoken or written by hand.[2]

Compatibility

IRIs are mapped to URIs to retain backwards-compatibility with systems that do not support the new format.[2]

For applications and protocols that do not allow direct consumption of IRIs, the IRI should first be converted to Unicode using canonical composition normalization (NFC), if not already in Unicode format.

All non-ASCII code points in the IRI should next be encoded as UTF-8, and the resulting bytes percent-encoded, to produce a valid URI.

Example: The IRI http://www.defaultlogic.com/dictionary?s=Ῥόδος becomes the URI http://www.defaultlogic.com/dictionary?s=%E1%BF%AC%CF%8C%CE%B4%CE%BF%CF%82

ASCII code points that are invalid URI characters may be encoded the same way, depending on implementation.[2]
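The mapping described above can be sketched in Python. This is a minimal illustration rather than a full RFC 3987 implementation: the function name iri_to_uri is hypothetical, the input is assumed to be a Unicode string already, and ASCII characters are passed through unchanged (so the optional encoding of invalid ASCII characters is not attempted).

```python
import unicodedata


def iri_to_uri(iri):
    """Map an IRI to a URI (simplified sketch, hypothetical helper)."""
    # Step 1: normalize to canonical composition form (NFC).
    normalized = unicodedata.normalize("NFC", iri)
    pieces = []
    for ch in normalized:
        if ord(ch) < 128:
            # ASCII code points are copied as-is in this sketch.
            pieces.append(ch)
        else:
            # Step 2: encode as UTF-8, then percent-encode each byte.
            pieces.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(pieces)
```

Applied to the example IRI above, this sketch should yield the percent-encoded URI shown there.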

This conversion is easily reversible; by definition, converting an IRI to a URI and back again yields an IRI that is semantically equivalent to the original, even though it may differ in exact representation.[3]
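The reverse direction can be sketched in the same way. The following is an assumption-laden illustration, not the full procedure from RFC 3987: the name uri_to_iri is hypothetical, only runs of percent-encoded bytes with the high bit set are considered, a run is decoded only if it forms valid UTF-8 as a whole, and the additional exclusions in the RFC (for example, characters that should stay encoded) are omitted.

```python
import re

# Runs of percent-encoded bytes in the range 0x80-0xFF (non-ASCII only).
_NON_ASCII_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def uri_to_iri(uri):
    """Turn percent-encoded UTF-8 sequences back into characters (sketch)."""

    def decode_run(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: leave the escapes exactly as they were.
            return match.group(0)

    return _NON_ASCII_RUN.sub(decode_run, uri)
```

Percent-encoded ASCII such as %2F is deliberately left untouched, since decoding it could change the meaning of the identifier.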

Some protocols may impose further transformations; e.g. Punycode for DNS labels.
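As an illustration of such a protocol-specific step, the host part of an IRI can be converted to its ASCII form with Python's built-in "idna" codec. This is only a sketch: the helper name host_to_ascii and the host bücher.example are made up for the example, and the built-in codec follows the older IDNA 2003 rules, which may differ from what a given resolver or browser applies.

```python
def host_to_ascii(host):
    """Convert an internationalized host name to its ASCII (Punycode) form."""
    # The "idna" codec splits the name into labels and encodes each
    # non-ASCII label with the "xn--" prefix plus its Punycode form.
    return host.encode("idna").decode("ascii")


# "b\u00fccher" is "bücher" written with an escape for the non-ASCII letter.
print(host_to_ascii("b\u00fccher.example"))
```

Labels that are already plain ASCII pass through unchanged, so the helper can be applied to any host name.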

Advantages

There are reasons to display URIs in different languages; chiefly, it helps users who are unfamiliar with the Latin (A-Z) alphabet. Provided that users can readily enter arbitrary Unicode characters from their keyboards, this can make the URI system more accessible.[4]

Disadvantages

Mixing IRIs and ASCII URIs can make it much easier to carry out phishing attacks that trick users into believing they are on a site other than the one they are actually visiting. For example, one can replace the "a" in www.ebay.com or www.paypal.com with an internationalized look-alike character, such as the Cyrillic letter "а" (U+0430), and point that IRI to a malicious site. This is known as an IDN homograph attack.
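One simple way to make such look-alikes visible is to list every non-ASCII character in a host name together with its Unicode name. The sketch below is a diagnostic aid only, not a defense against homograph attacks; the function name non_ascii_report is hypothetical, and the sample host uses the Cyrillic letter U+0430 as an assumed look-alike for the Latin "a".

```python
import unicodedata


def non_ascii_report(host):
    """Return (character, Unicode name) pairs for non-ASCII characters."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in host
        if ord(ch) > 127
    ]


# The second letter here is U+0430, not the Latin "a".
for ch, name in non_ascii_report("p\u0430ypal.com"):
    print("U+{:04X} {}".format(ord(ch), name))
```

A host name made up entirely of ASCII characters produces an empty report.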

While a URI does not give people a way to specify Web resources in their own alphabets, an IRI does not make clear how Web resources can be accessed from keyboards that cannot generate the requisite internationalized characters. In this respect, IRIs are handled much like other software that requires an alternative input method when dealing with text in various languages.

References

  1. Gangemi, Aldo; Presutti, Valentina (2006). "The bourne identity of a web resource" (PDF). Proceedings of Identity Reference and the Web Workshop (IRW). Laboratory for Applied Ontology. Roma, Italy: National Research Council (ISTC-CNR): 3. Quote: "Notice that IRIs (Internationalized Resource Identifier) [11] are supposed to replace URIs in next future."
  2. Duerst, M. (2005). "RFC 3987". Network Working Group. Internet Engineering Task Force. Standards Track. Retrieved 2014.
  3. Fensel, Dieter; Domingue, John; Hendler, James A., eds. (2010). Handbook of Semantic Web Technologies (1st ed.). Berlin: Springer-Verlag GmbH. ISBN 978-3-540-92912-3. Retrieved 2014.
  4. Clark, Kendall (2003-05-07). "Internationalizing the URI". O'Reilly Media, Inc. Retrieved 2014.

  This article uses material from the Wikipedia page available here. It is released under the Creative Commons Attribution-Share-Alike License 3.0.

