Wikipedia

IDN homograph attack

The internationalized domain name (IDN) homograph attack is a way a malicious party may deceive computer users about what remote system they are communicating with, by exploiting the fact that many different characters look alike (i.e., they are homographs, hence the term for the attack, although technically homoglyph is the more accurate term for different characters that look alike). For example, the Cyrillic, Greek and Latin alphabets each have a letter ⟨o⟩ that has the same shape but different meaning from its counterparts.[a]

An example of an IDN homograph attack; the Latin letters "e" and "a" are replaced with the Cyrillic letters "е" and "а".

This kind of spoofing attack is also known as script spoofing. Unicode incorporates numerous writing systems, and, for a number of reasons, similar-looking characters such as Greek Ο, Latin O, and Cyrillic О were not assigned the same code point. Their incorrect or malicious use opens the door to security attacks. For example, a regular user of example.com may unquestioningly click a link that appears to read exаmple.com, unaware that the third letter is not the Latin character "a" but the Cyrillic character "а", and that the link therefore leads to an entirely different domain from the intended one.

The registration of homographic domain names is akin to typosquatting, in that both forms of attacks use a similar-looking name to a more established domain to fool a user.[b] The major difference is that in typosquatting the perpetrator attracts victims by relying on natural typographical errors commonly made when manually entering a URL, while in homograph spoofing the perpetrator deceives the victims by presenting visually indistinguishable hyperlinks. Indeed, it would be a rare accident for a web user to type, for example, a Cyrillic letter within an otherwise English word such as "citibаnk". There are cases in which a registration can be both typosquatting and homograph spoofing; the pairs of l/I, i/j, and 0/O are all both close together on keyboards and, depending on the typeface, may be difficult or impossible to distinguish.

History

 
Homoglyphs are common in the three major European alphabets: Latin, Greek and Cyrillic. Unicode does not attempt to unify the glyphs and instead separates each script.

An early nuisance of this kind, pre-dating the Internet and even text terminals, was the confusion between "l" (lowercase letter "L") / "1" (the number "one") and "O" (capital letter for vowel "o") / "0" (the number "zero"). Some typewriters in the pre-computer era even combined the L and the one; users had to type a lowercase L when the number one was needed. The zero/o confusion gave rise to the tradition of crossing zeros, so that a computer operator would type them correctly.[1] Unicode may contribute to this greatly with its combining characters, accents, several types of hyphen, etc., often due to inadequate rendering support, especially with smaller font sizes and the wide variety of fonts.[2]

Even earlier, handwriting provided rich opportunities for confusion. A notable example is the etymology of the word "zenith". The translation from the Arabic "samt" included the scribe's confusing of "m" into "ni". This was common in medieval blackletter, which did not connect the vertical columns on the letters i, m, n, or u, making them difficult to distinguish when several were in a row. The latter, as well as "rn"/"m"/"rri" ("RN"/"M"/"RRI") confusion, is still possible for a human eye even with modern advanced computer technology.

Intentional look-alike character substitution with different alphabets has also been known in various contexts. For example, Faux Cyrillic has been used as an amusement or attention-grabber and "Volapuk encoding", in which Cyrillic script is represented by similar Latin characters, was used in early days of the Internet as a way to overcome the lack of support for the Cyrillic alphabet. Another example is that vehicle registration plates can have both Cyrillic (for domestic usage in Cyrillic script countries) and Latin (for international driving) with the same letters. Registration plates that are issued in Greece are limited to using letters of the Greek alphabet that have homoglyphs in the Latin alphabet, as European Union regulations require the use of Latin letters.

Homographs in ASCII

ASCII has several characters or pairs of characters that look alike and are known as homographs (or homoglyphs). Spoofing attacks based on these similarities are known as homograph spoofing attacks. Examples include 0 (the digit zero) and O (the capital letter), as well as "l" (lowercase L) and "I" (uppercase i).

In a typical example of a hypothetical attack, someone could register a domain name that appears almost identical to an existing domain but goes somewhere else. For example, the domain "rnicrosoft.com" begins with "r" and "n", not "m". Another example is G00GLE.COM, which looks much like GOOGLE.COM in some fonts. Using a mix of uppercase and lowercase characters, googIe.com (capital i, not small L) looks much like google.com in some fonts. PayPal was a target of a phishing scam exploiting this, using the domain PayPaI.com. In certain narrow-spaced fonts such as Tahoma (the default in the address bar in Windows XP), placing a c in front of a j, l or i will produce homoglyphs such as cl cj ci (d g a).
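The ASCII look-alikes described above can be caught by folding each name into a crude "visual skeleton" and comparing skeletons. The following is a minimal sketch only: the helper names `ascii_skeleton` and `looks_like` and the small substitution table are illustrative assumptions, not a complete or standard list.

```python
# Hypothetical helper: fold a few ASCII look-alikes into one canonical form.
# The substitution table is an illustrative assumption, not a complete list.
ASCII_LOOKALIKES = [
    ("rn", "m"),  # "rn" can read as "m" in many fonts
    ("0", "o"),   # digit zero vs. letter O
    ("1", "l"),   # digit one vs. lowercase L
    ("I", "l"),   # uppercase i vs. lowercase L (applied before lowercasing)
]


def ascii_skeleton(domain: str) -> str:
    """Return a crude visual skeleton of an ASCII domain name."""
    for lookalike, canonical in ASCII_LOOKALIKES:
        domain = domain.replace(lookalike, canonical)
    return domain.lower()


def looks_like(candidate: str, trusted: str) -> bool:
    """True if the names differ as strings but share a skeleton."""
    return (candidate.lower() != trusted.lower()
            and ascii_skeleton(candidate) == ascii_skeleton(trusted))


for fake, real in [("rnicrosoft.com", "microsoft.com"),
                   ("G00GLE.COM", "GOOGLE.COM"),
                   ("googIe.com", "google.com"),
                   ("PayPaI.com", "PayPal.com")]:
    print(fake, "imitates", real, "->", looks_like(fake, real))
```

Because the folding is applied to both names, a trusted name that legitimately contains "rn" or "0" is still compared consistently; the cost is that some unrelated names may collide.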

Homographs in internationalized domain names

In multilingual computer systems, different logical characters may have identical appearances. For example, Unicode character U+0430, Cyrillic small letter a ("а"), can look identical to Unicode character U+0061, Latin small letter a, ("a") which is the lowercase "a" used in English. Hence wikipediа.org (xn--wikipedi-86g.org; the Cyrillic version) instead of wikipedia.org (the Latin version).
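The difference between the two names becomes visible once the Unicode form is converted to its ASCII-compatible (Punycode) form. The sketch below uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules; that is assumed to be sufficient for illustration, while production code may need a newer IDNA implementation.

```python
# The two strings render alike in many fonts but differ by one code point.
latin = "wikipedia.org"
spoof = "wikipedi\u0430.org"  # U+0430 CYRILLIC SMALL LETTER A ends the first label

print(latin == spoof)        # False
print(latin.encode("idna"))  # b'wikipedia.org'
print(spoof.encode("idna"))  # b'xn--wikipedi-86g.org'

# Decoding the ASCII form recovers the original Unicode name.
print(b"xn--wikipedi-86g.org".decode("idna") == spoof)  # True
```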

The problem arises from the different treatment of the characters in the user's mind and the computer's programming. From the viewpoint of the user, a Cyrillic "а" within a Latin string is a Latin "a"; there is no difference in the glyphs for these characters in most fonts. However, the computer treats them differently when processing the character string as an identifier. Thus, the user's assumption of a one-to-one correspondence between the visual appearance of a name and the named entity breaks down.

Internationalized domain names provide a backward-compatible way for domain names to use the full Unicode character set, and this standard is already widely supported. However, this system expanded the character repertoire from a few dozen characters in a single alphabet to many thousands of characters in many scripts, which greatly increased the scope for homograph attacks.

This opens a rich vein of opportunities for phishing and other varieties of fraud. An attacker could register a domain name that looks just like that of a legitimate website, but in which some of the letters have been replaced by homographs in another alphabet. The attacker could then send e-mail messages purporting to come from the original site, but directing people to the bogus site. The spoof site could then record information such as passwords or account details, while passing traffic through to the real site. The victims may never notice the difference, until suspicious or criminal activity occurs with their accounts.

In December 2001 Evgeniy Gabrilovich and Alex Gontmakher, both from Technion, Israel, published a paper titled "The Homograph Attack",[1] which described an attack that used Unicode URLs to spoof a website URL. To prove the feasibility of this kind of attack, the researchers successfully registered a variant of the domain name microsoft.com which incorporated Cyrillic characters.

Problems of this kind were anticipated before IDN was introduced, and guidelines were issued to registries to try to avoid or reduce the problem. For example, it was advised that registries only accept characters from the Latin alphabet and that of their own country, not all Unicode characters, but this advice was neglected by major TLDs.[citation needed]

On February 7, 2005, Slashdot reported that this exploit had been disclosed by 3ric Johanson at the hacker conference Shmoocon.[3] In web browsers supporting IDNA, the URL http://www.pаypal.com/, in which the first a character is replaced by a Cyrillic а, appeared to point to the well-known payment site PayPal but actually led to a spoofed web site with different content. Popular browsers continued to have problems properly displaying internationalized domain names through April 2017.[4]

The following alphabets have characters that can be used for spoofing attacks. Only the most obvious and common are listed; depending on how much artistic license a spoofer takes and how much risk of detection the spoofer accepts, the possibilities are far more numerous than can be listed here.

Cyrillic

Cyrillic is, by far, the most commonly used alphabet for homoglyphs, largely because it contains 11 lowercase glyphs that are identical or nearly identical to Latin counterparts.

The Cyrillic letters а, с, е, о, р, х and у have optical counterparts in the basic Latin alphabet and look close or identical to a, c, e, o, p, x and y. Cyrillic З, Ч and б resemble the numerals 3, 4 and 6. Italic type generates more homoglyphs: the italic forms of дтпи resemble dmnu (in some fonts д can also be used, since its italic form resembles a lowercase g; however, in most mainstream fonts, italic д instead resembles a partial differential sign, ∂).

If capital letters are counted, АВСЕНІЈКМОРЅТХ can substitute ABCEHIJKMOPSTX, in addition to the capitals for the lowercase Cyrillic homoglyphs.

Cyrillic non-Russian problematic letters are і and i, ј and j, ԛ and q, ѕ and s, ԝ and w, Ү and Y, while Ғ and F, Ԍ and G bear some resemblance to each other. Cyrillic ӓёїӧ can also be used if an IDN itself is being spoofed, to fake äëïö.

While Komi De (ԁ), shha (һ), palochka (Ӏ) and izhitsa (ѵ) bear strong resemblance to Latin d, h, l and v, these letters are either rare or archaic and are not widely supported in most standard fonts (they are not included in the WGL-4). Attempting to use them could cause a ransom note effect.
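One way to test a name against the Cyrillic look-alikes listed above is to translate them to their Latin counterparts and compare the result with a list of protected names. This is a sketch under stated assumptions: the mapping covers only some lowercase letters named in this section, and the function names `latin_skeleton` and `spoofs` are hypothetical. Real confusable tables are far larger.

```python
# Illustrative mapping of Cyrillic lowercase letters to Latin look-alikes.
# Only letters named in this section are included (an assumption for brevity).
CYRILLIC_TO_LATIN = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u0458": "j",  # CYRILLIC SMALL LETTER JE
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
})


def latin_skeleton(name: str) -> str:
    """Replace known Cyrillic look-alikes with their Latin counterparts."""
    return name.translate(CYRILLIC_TO_LATIN)


def spoofs(candidate: str, protected: list[str]) -> list[str]:
    """Return the protected names that the candidate imitates without equalling."""
    skeleton = latin_skeleton(candidate)
    return [name for name in protected
            if name != candidate and name == skeleton]


protected_names = ["paypal.com", "example.com"]
print(spoofs("p\u0430ypal.com", protected_names))  # ['paypal.com']
print(spoofs("paypal.com", protected_names))       # []
```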

Greek

From the Greek alphabet, only omicron ο and sometimes nu ν appear identical to a Latin alphabet letter in the lowercase used for URLs. Fonts that are in italic type will feature Greek alpha α looking like a Latin a.

This list grows if close matches are also allowed (such as Greek εικηρτυωχγ for eiknptuwxy). With capital letters, the list expands greatly: Greek ΑΒΕΗΙΚΜΝΟΡΤΧΥΖ looks identical to Latin ABEHIKMNOPTXYZ. Greek ΑΓΒΕΗΚΜΟΠΡΤΦΧ looks similar to Cyrillic АГВЕНКМОПРТФХ (as do Cyrillic Л and Greek Λ in certain geometric sans-serif fonts), and the Greek letters κ and ο look similar to Cyrillic к and о. In addition, Greek τ and φ can be similar to Cyrillic т and ф in some fonts, Greek δ resembles Cyrillic б in the Serbian alphabet, and the Cyrillic а italicizes the same way as its Latin counterpart, making it possible to substitute it for alpha or vice versa. The lunate form of sigma, Ϲϲ, resembles both Latin Cc and Cyrillic Сс.

If an IDN itself is being spoofed, Greek beta β can be a substitute for German eszett ß in some fonts (and in fact, code page 437 treats them as equivalent), as can Greek end-of-word-variant sigma ς for ç; accented Greek substitutes όίά can usually be used for óíá in many fonts, with the last of these (alpha) again only resembling a in italic type.

Armenian

The Armenian alphabet can also contribute critical characters. Several Armenian characters, such as օ, ո and ս, as well as capital Տ and Լ, are often completely identical to Latin characters in modern fonts. Others are similar enough to pass, such as ցհոօզս, which look like ghnoqu; յ, which resembles j (albeit dotless); and ք, which can resemble either p or f depending on the font. Armenian ա can resemble Cyrillic ш. However, Armenian is somewhat less reliable for spoofing, because not all standard fonts feature Armenian glyphs (whereas Greek and Cyrillic glyphs generally are included). Windows prior to Windows 7 rendered Armenian in a distinct font, Sylfaen, so Armenian mixed with Latin would look obviously different unless the text was set in Sylfaen or a Unicode typeface. (This is known as a ransom note effect.) The current version of Tahoma, used in Windows 7, supports Armenian (previous versions did not). Furthermore, this font differentiates Latin g from Armenian ց.

Two letters in Armenian (Ձշ) also can resemble the number 2, Յ resembles 3, while another (վ) sometimes resembles the number 4.

Hebrew

Hebrew spoofing is generally rare. Only three letters from that alphabet can reliably be used: samekh (ס), which sometimes resembles o, vav with diacritic (וֹ), which resembles an i, and heth (ח), which resembles the letter n. Less accurate approximants for some other alphanumerics can also be found, but these are usually only accurate enough to use for the purposes of foreign branding and not for substitution. Furthermore, the Hebrew alphabet is written from right to left and trying to mix it with left-to-right glyphs may cause problems.

Thai

 
Top: Thai glyphs rendered in a modern font (IBM Plex) in which they resemble Latin glyphs.
Bottom: The same glyphs rendered with traditional loops.

Though the Thai script has historically had a distinct look with numerous loops and small flourishes, modern Thai typography, beginning with Manoptica in 1973 and continuing through IBM Plex in the modern era, has increasingly adopted a simplified style in which Thai characters are represented with glyphs strongly resembling Latin letters. ค (A), ท (n), น (u), บ (U), ป (J), พ (W), ร (S), and ล (a) are among the Thai glyphs that can closely resemble Latin.

Chinese

The Chinese language can be problematic for homographs as many characters exist as both traditional (regular script) and simplified Chinese characters. In the .org domain, registering one variant renders the other unavailable to anyone; in .biz a single Chinese-language IDN registration delivers both variants as active domains (which must have the same domain name server and the same registrant). .hk (.香港) also adopts this policy.

Other scripts

Other Unicode scripts in which homographs can be found include Number Forms (Roman numerals), CJK Compatibility and Enclosed CJK Letters and Months (certain abbreviations), Latin (certain digraphs), Currency Symbols, Mathematical Alphanumeric Symbols, and Alphabetic Presentation Forms (typographic ligatures).

Accented characters

Two names which differ only in an accent on one character may look very similar, particularly when the substitution involves the dotted letter i; the tittle (dot) on the i can be replaced with a diacritic (such as a grave accent or acute accent; both ì and í are included in most standard character sets and fonts) that can only be detected with close inspection. In most top-level domain registries, wíkipedia.tld (xn--wkipedia-c2a.tld) and wikipedia.tld are two different names which may be held by different registrants.[5] One exception is .ca, where reserving the plain-ASCII version of the domain prevents another registrant from claiming an accented version of the same name.[6]
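Names that differ only by a diacritic can be compared by decomposing each character and dropping the combining marks. The sketch below uses Python's standard `unicodedata` module; the function name `strip_accents` is a hypothetical helper, and stripping marks is assumed here only as a comparison aid, not as a rule any registry applies.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


accented = "w\u00edkipedia"  # U+00ED LATIN SMALL LETTER I WITH ACUTE

print(strip_accents(accented) == "wikipedia")  # True
print(accented.encode("idna"))                 # b'xn--wkipedia-c2a'
```

The same check folds a letter with a dot below, such as U+1E05, to its base letter, since that character also decomposes into a base letter plus a combining mark.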

Non-displayable characters

Unicode includes many characters which are not displayed by default, such as the zero-width space. In general, ICANN prohibits any domain with these characters from being registered, regardless of TLD.
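A simple screen for such characters is to look for code points in the Unicode "format" general category (Cf), which includes the zero-width space. This is a rough sketch: the function name `invisible_characters` is hypothetical, and category Cf is assumed as an approximation, since some invisible characters fall into other categories.

```python
import unicodedata


def invisible_characters(label: str) -> list[tuple[int, str]]:
    """Return (index, code point) pairs for format (Cf) characters in a label."""
    return [(index, f"U+{ord(ch):04X}")
            for index, ch in enumerate(label)
            if unicodedata.category(ch) == "Cf"]


print(invisible_characters("pay\u200bpal"))  # [(3, 'U+200B')]
print(invisible_characters("paypal"))        # []
```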

Known homograph attacks

In 2011, an unknown source (registering under the name "Completely Anonymous") registered a domain name homographic to that of television station KBOI-TV to create a fake news website. The sole purpose of the site was to spread an April Fools' Day joke regarding the Governor of Idaho issuing a supposed ban on the sale of music by Justin Bieber.[7][8]

In September 2017, security researcher Ankit Anubhav discovered an IDN homograph attack where the attackers registered adoḅe.com to deliver the Betabot trojan.[9]

Defending against the attack

Client-side mitigation

The simplest defense is for web browsers not to support IDNA or other similar mechanisms, or for users to turn off whatever support their browsers have. That could mean blocking access to IDNA sites, but generally browsers permit access and just display IDNs in Punycode. Either way, this amounts to abandoning non-ASCII domain names.

  • Mozilla Firefox versions 22 and later display IDNs if either the TLD prevents homograph attacks by restricting which characters can be used in domain names or labels do not mix scripts for different languages. Otherwise IDNs are displayed in Punycode.[10][11]
  • Google Chrome versions 51 and later use an algorithm similar to the one used by Firefox. Previous versions display an IDN only if all of its characters belong to one (and only one) of the user's preferred languages. Chromium and Chromium-based browsers such as Microsoft Edge (since 2019) and Opera also use the same algorithm.[12][13]
  • Safari's approach is to render problematic character sets as Punycode. This can be changed by altering the settings in Mac OS X's system files.[14]
  • Internet Explorer versions 7 and later allow IDNs except for labels that mix scripts for different languages. Labels that mix scripts are displayed in Punycode. There are exceptions for locales where ASCII characters are commonly mixed with localized scripts.[15] Internet Explorer 7 was capable of using IDNs, but it imposed restrictions on displaying non-ASCII domain names based on a user-defined list of allowed languages and provided an anti-phishing filter that checked suspicious Web sites against a remote database of known phishing sites.[citation needed]
  • Old Microsoft Edge converts all Unicode into Punycode.[citation needed]
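The mixed-script rule that several of these browsers apply can be sketched as follows. This is not any browser's actual algorithm: the function names are hypothetical, the script of a character is approximated from the first word of its Unicode name (Python's standard library has no script property), and real implementations permit certain script combinations that this sketch would reject.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the scripts used in a label from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphens are treated as shared by all scripts
        # First word of the name, e.g. "LATIN", "CYRILLIC", "GREEK".
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def display_form(domain: str) -> str:
    """Show Unicode only if every label is single-script; otherwise Punycode."""
    if all(len(scripts_in(label)) <= 1 for label in domain.split(".")):
        return domain
    return domain.encode("idna").decode("ascii")


print(display_form("wikipedia.org"))        # wikipedia.org
print(display_form("wikipedi\u0430.org"))   # xn--wikipedi-86g.org
```

A label written entirely in one non-Latin script passes this check unchanged, so the rule alone does not stop names built wholly from look-alike Cyrillic letters.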

As an additional defense, Internet Explorer 7, Firefox 2.0 and above, and Opera 9.10 include phishing filters that attempt to alert users when they visit malicious websites.[16][17][18] As of April 2017, several browsers (including Chrome, Firefox and Opera) were displaying IDNs consisting purely of Cyrillic characters normally (not as Punycode), allowing spoofing attacks. Chrome tightened IDN restrictions in version 59 to prevent this attack.[19][20]

Browser extensions like No Homo-Graphs are available for Google Chrome and Firefox that check whether the user is visiting a website which is a homograph of another domain from a user-defined list.[21]

These methods of defense only extend to within a browser. Homographic URLs that house malicious software can still be distributed, without being displayed as Punycode, through e-mail, social networking or other Web sites without being detected until the user actually clicks the link. While the fake link will show in Punycode when it is clicked, by this point the page has already begun loading into the browser.[citation needed]

Server-side/registry operator mitigation

The IDN homographs database is a Python library that allows developers to defend against this using machine learning-based character recognition.[22]

ICANN has implemented a policy prohibiting any potential internationalized TLD from choosing letters that could resemble an existing Latin TLD and thus be used for homograph attacks. The proposed IDN TLDs .бг (Bulgaria), .укр (Ukraine) and .ελ (Greece) were initially rejected or stalled because of their perceived resemblance to Latin letters. All three (and Serbian .срб and Mongolian .мон) were later accepted.[23] Three-letter TLDs are considered safer than two-letter TLDs, since they are harder to match to normal Latin ISO-3166 country domains. Although the potential to match new generic domains remains, such generic domains are far more expensive to register than a second- or third-level domain address, making it cost-prohibitive to try to register a homoglyphic TLD for the sole purpose of making fraudulent domains (which itself would draw ICANN scrutiny).

The Russian registry operator Coordination Center for TLD RU only accepts Cyrillic names for the top-level domain .рф, forbidding a mix with Latin or Greek characters. However, the problem in .com and other gTLDs remains open.[24]

Research-based mitigations

In their 2019 study, Suzuki et al. introduced ShamFinder,[25] a program for recognizing IDN homographs, shedding light on their prevalence in real-world scenarios. Similarly, Chiba et al. (2019) designed DomainScouter,[26] a system for detecting diverse homograph IDNs. By analyzing an estimated 4.4 million registered IDNs across 570 top-level domains (TLDs), it identified 8,284 IDN homographs, including many previously unidentified cases targeting brands in languages other than English.[27]

Notes

  1. ^ U+043E о CYRILLIC SMALL LETTER O, U+03BF ο GREEK SMALL LETTER OMICRON, U+006F o LATIN SMALL LETTER O
  2. ^ For example, Microsfot.com

References

  1. ^ a b Evgeniy Gabrilovich and Alex Gontmakher, "The Homograph Attack" (PDF), Communications of the ACM, 45(2):128, February 2002. Archived from the original (PDF) on 2020-01-02. Retrieved 2005-12-10.
  2. ^ "Unicode Security Considerations", Technical Report #36, 2010-04-28
  3. ^ IDN hacking disclosure by shmoo.com. Archived 2005-03-20 at the Wayback Machine.
  4. ^ "Chrome and Firefox Phishing Attack Uses Domains Identical to Known Safe Sites". Wordfence. 2017-04-14. Retrieved 2017-04-18.
  5. ^ There are various Punycode converters online, such as https://www.hkdnr.hk/idn_conv.jsp
  6. ^ . Archived from the original on 2015-09-07. Retrieved 2015-09-22.
  7. ^ Fake website URL not from KBOI-TV. Archived 2011-04-05 at the Wayback Machine. KBOI-TV. Retrieved 2011-04-01.
  8. ^ Boise TV news website targeted with Justin Bieber prank. Archived 2012-03-15 at the Wayback Machine. KTVB. Retrieved 2011-04-01.
  9. ^ Mimoso, Michael (2017-09-06). "IDN Homograph Attack Spreading Betabot Backdoor". Threatpost. Archived from the original on 2023-10-17. Retrieved 2020-09-20.
  10. ^ "IDN Display Algorithm". Mozilla. Retrieved 2016-01-31.
  11. ^ "Bug 722299". Bugzilla.mozilla.org. Retrieved 2016-01-31.
  12. ^ "Internationalized Domain Names (IDN) in Google Chrome". chromium.googlesource.com. Retrieved 2020-08-26.
  13. ^ "Upcoming update with IDN homograph phishing fix - Blog". Opera Security. 2017-04-21. Retrieved 2020-08-26.
  14. ^ "About Safari International Domain Name support". Retrieved 2017-04-29.
  15. ^ Sharif, Tariq (2006-07-31). "Changes to IDN in IE7 to now allow mixing of scripts". IEBlog. Microsoft. Retrieved 2006-11-30.
  16. ^ Sharif, Tariq (2005-09-09). "Phishing Filter in IE7". IEBlog. Microsoft. Retrieved 2006-11-30.
  17. ^ "Firefox 2 Phishing Protection". Mozilla. 2006. Retrieved 2006-11-30.
  18. ^ "Opera Fraud Protection". Opera Software. 2006-12-18. Retrieved 2007-02-24.
  19. ^ Chrome and Firefox Phishing Attack Uses Domains Identical to Known Safe Sites
  20. ^ Phishing with Unicode Domains
  21. ^ "No Homo-Graphs". em_te. 2018-06-28. Retrieved 2020-02-18.
  22. ^ "IDN Homographs Database". GitHub. 25 September 2021.
  23. ^ IDN ccTLD Fast Track String Evaluation Completion. Archived 2014-10-17 at the Wayback Machine.
  24. ^ Emoji to Zero-Day: Latin Homoglyphs in Domains and Subdomains. Archived 2020-12-09 at the Wayback Machine.
  25. ^ Suzuki, Hiroaki; Chiba, Daiki; Yoneya, Yoshiro; Mori, Tatsuya; Goto, Shigeki (2019-10-21). "ShamFinder". Proceedings of the Internet Measurement Conference. New York, NY, USA: ACM. doi:10.1145/3355369.3355587.
  26. ^ CHIBA, Daiki; AKIYAMA HASEGAWA, Ayako; KOIDE, Takashi; SAWABE, Yuta; GOTO, Shigeki; AKIYAMA, Mitsuaki (2020-07-01). "DomainScouter: Analyzing the Risks of Deceptive Internationalized Domain Names". IEICE Transactions on Information and Systems. E103.D (7): 1493–1511. doi:10.1587/transinf.2019icp0002. ISSN 0916-8532.
  27. ^ Safaei Pour, Morteza; Nader, Christelle; Friday, Kurt; Bou-Harb, Elias (May 2023). "A Comprehensive Survey of Recent Internet Measurement Techniques for Cyber Security". Computers & Security. 128: 103123. doi:10.1016/j.cose.2023.103123. ISSN 0167-4048.

homoglyphs Cyrillic non Russian problematic letters are i and i ј and j ԛ and q ѕ and s ԝ and w Ү and Y while Ғ and F Ԍ and G bear some resemblance to each other Cyrillic ӓyoyiӧ can also be used if an IDN itself is being spoofed to fake aeio While Komi De ԁ shha һ palochka Ӏ and izhitsa ѵ bear strong resemblance to Latin d h l and v these letters are either rare or archaic and are not widely supported in most standard fonts they are not included in the WGL 4 Attempting to use them could cause a ransom note effect Greek edit From the Greek alphabet only omicron o and sometimes nu n appear identical to a Latin alphabet letter in the lowercase used for URLs Fonts that are in italic type will feature Greek alpha a looking like a Latin a This list increases if close matches are also allowed such as Greek eikhrtywxg for eiknptuwxy Using capital letters the list expands greatly Greek ABEHIKMNORTXYZ looks identical to Latin ABEHIKMNOPTXYZ Greek AGBEHKMOPRTFX looks similar to Cyrillic AGVENKMOPRTFH as do Cyrillic L L and Greek L in certain geometric sans serif fonts Greek letters k and o look similar to Cyrillic k and o Besides this Greek t f can be similar to Cyrillic t f in some fonts Greek d resembles Cyrillic b in the Serbian alphabet and the Cyrillic a also italicizes the same as its Latin counterpart making it possible to substitute it for alpha or vice versa The lunate form of sigma Ϲϲ resembles both Latin Cc and Cyrillic Ss If an IDN itself is being spoofed Greek beta b can be a substitute for German eszett ss in some fonts and in fact code page 437 treats them as equivalent as can Greek end of word variant sigma s for c accented Greek substitutes oia can usually be used for oia in many fonts with the last of these alpha again only resembling a in italic type Armenian edit The Armenian alphabet can also contribute critical characters several Armenian characters like օ ո ս as well capital Տ and Լ are often completely identical to Latin characters in modern fonts and 
symbols which similar enough to pass off such as ցհոօզս which look like ghnoqu յ which resembles j albeit dotless and ք which can either resemble p or f depending on the font ա can resemble Cyrillic sh However the use of Armenian is luckily a bit less reliable Not all standard fonts feature Armenian glyphs whereas the Greek and Cyrillic scripts are Windows prior to Windows 7 rendered Armenian in a distinct font Sylfaen of which the mixing of Armenian with Latin would appear obviously different if using a font other than Sylfaen or a Unicode typeface This is known as a ransom note effect The current version of Tahoma used in Windows 7 supports Armenian previous versions did not Furthermore this font differentiates Latin g from Armenian ց Two letters in Armenian Ձշ also can resemble the number 2 Յ resembles 3 while another վ sometimes resembles the number 4 Hebrew edit Hebrew spoofing is generally rare Only three letters from that alphabet can reliably be used samekh ס which sometimes resembles o vav with diacritic ו which resembles an i and heth ח which resembles the letter n Less accurate approximants for some other alphanumerics can also be found but these are usually only accurate enough to use for the purposes of foreign branding and not for substitution Furthermore the Hebrew alphabet is written from right to left and trying to mix it with left to right glyphs may cause problems Thai edit nbsp Top Thai glyphs rendered in a modern font IBM Plex in which they resemble Latin glyphs Bottom The same glyphs rendered with traditional loops Though the Thai script has historically had a distinct look with numerous loops and small flourishes modern Thai typography beginning with Manoptica in 1973 and continuing through IBM Plex in the modern era has increasingly adopted a simplified style in which Thai characters are represented with glyphs strongly resembling Latin letters kh A th n n u b U p J ph W r S and l a are among the Thai glyphs that can closely resemble Latin 
Chinese edit The Chinese language can be problematic for homographs as many characters exist as both traditional regular script and simplified Chinese characters In the org domain registering one variant renders the other unavailable to anyone in biz a single Chinese language IDN registration delivers both variants as active domains which must have the same domain name server and the same registrant hk 香港 also adopts this policy Other scripts edit Other Unicode scripts in which homographs can be found include Number Forms Roman numerals CJK Compatibility and Enclosed CJK Letters and Months certain abbreviations Latin certain digraphs Currency Symbols Mathematical Alphanumeric Symbols and Alphabetic Presentation Forms typographic ligatures Accented characters edit Two names which differ only in an accent on one character may look very similar particularly when the substitution involves the dotted letter i the tittle dot on the i can be replaced with a diacritic such as a grave accent or acute accent both i and i are included in most standard character sets and fonts that can only be detected with close inspection In most top level domain registries wikipedia tld xn wkipedia c2a tld and wikipedia tld are two different names which may be held by different registrants 5 One exception is ca where reserving the plain ASCII version of the domain prevents another registrant from claiming an accented version of the same name 6 Non displayable characters edit Unicode includes many characters which are not displayed by default such as the zero width space In general ICANN prohibits any domain with these characters from being registered regardless of TLD Known homograph attacks edit In 2011 an unknown source registering under the name Completely Anonymous registered a domain name homographic to television station KBOI TV s to create a fake news website The sole purpose of the site was to spread an April Fool s Day joke regarding the Governor of Idaho issuing a supposed ban on 
the sale of music by Justin Bieber 7 8 In September 2017 security researcher Ankit Anubhav discovered an IDN homograph attack where the attackers registered adoḅe com to deliver the Betabot trojan 9 Defending against the attack editClient side mitigation edit The simplest defense is for web browsers not to support IDNA or other similar mechanisms or for users to turn off whatever support their browsers have That could mean blocking access to IDNA sites but generally browsers permit access and just display IDNs in Punycode Either way this amounts to abandoning non ASCII domain names Mozilla Firefox versions 22 and later display IDNs if either the TLD prevents homograph attacks by restricting which characters can be used in domain names or labels do not mix scripts for different languages Otherwise IDNs are displayed in Punycode 10 11 Google Chrome versions 51 and later use an algorithm similar to the one used by Firefox Previous versions display an IDN only if all of its characters belong to one and only one of the user s preferred languages Chromium and Chromium based browsers such as Microsoft Edge since 2019 and Opera also use the same algorithm 12 13 Safari s approach is to render problematic character sets as Punycode This can be changed by altering the settings in Mac OS X s system files 14 Internet Explorer versions 7 and later allow IDNs except for labels that mix scripts for different languages Labels that mix scripts are displayed in Punycode There are exceptions to locales where ASCII characters are commonly mixed with localized scripts 15 Internet Explorer 7 was capable of using IDNs but it imposes restrictions on displaying non ASCII domain names based on a user defined list of allowed languages and provides an anti phishing filter that checks suspicious Web sites against a remote database of known phishing sites citation needed Old Microsoft Edge converts all Unicode into Punycode citation needed As an additional defense Internet Explorer 7 Firefox 2 0 
and above and Opera 9 10 include phishing filters that attempt to alert users when they visit malicious websites 16 17 18 As of April 2017 several browsers including Chrome Firefox and Opera were displaying IDNs consisting purely of Cyrillic characters normally not as punycode allowing spoofing attacks Chrome tightened IDN restrictions in version 59 to prevent this attack 19 20 Browser extensions like No Homo Graphs are available for Google Chrome and Firefox that check whether the user is visiting a website which is a homograph of another domain from a user defined list 21 These methods of defense only extend to within a browser Homographic URLs that house malicious software can still be distributed without being displayed as Punycode through e mail social networking or other Web sites without being detected until the user actually clicks the link While the fake link will show in Punycode when it is clicked by this point the page has already begun loading into the browser citation needed Server side registry operator mitigation edit The IDN homographs database is a Python library that allows developers to defend against this using machine learning based character recognition 22 ICANN has implemented a policy prohibiting any potential internationalized TLD from choosing letters that could resemble an existing Latin TLD and thus be used for homograph attacks Proposed IDN TLDs bg Bulgaria ukr Ukraine and el Greece have been rejected or stalled because of their perceived resemblance to Latin letters All three and Serbian srb and Mongolian mon have later been accepted 23 Three letter TLD are considered safer than two letter TLD since they are harder to match to normal Latin ISO 3166 country domains although the potential to match new generic domains remains such generic domains are far more expensive than registering a second or third level domain address making it cost prohibitive to try to register a homoglyphic TLD for the sole purpose of making fraudulent domains 
which itself would draw ICANN scrutiny The Russian registry operator Coordination Center for TLD RU only accepts Cyrillic names for the top level domain rf forbidding a mix with Latin or Greek characters However the problem in com and other gTLDs remains open 24 Research based mitigations edit In their 2019 study Suzuki et al introduced ShamFinder 25 a program for recognizing IDNs shedding light on their prevalence in real world scenarios Similarly Chiba et al 2019 designed DomainScouter 26 a system adept at detecting diverse homograph IDNs in domains through analyzing an estimated 4 4 million registered IDNs across 570 Top Level Domains TLDs it was able to successfully identify 8 284 IDN homographs including many previously unidentified cases targeting brands in languages other than English 27 See also editSecurity issues in Unicode Internationalized domain name Homoglyph Duplicate characters in Unicode Unicode equivalence TyposquattingNotes edit U 043E o CYRILLIC SMALL LETTER O U 03BF o GREEK SMALL LETTER OMICRON U 006F o LATIN SMALL LETTER O For example Microsfot comReferences edit a b Evgeniy Gabrilovich and Alex Gontmakher Archived copy PDF Archived from the original PDF on 2020 01 02 Retrieved 2005 12 10 a href Template Cite web html title Template Cite web cite web a CS1 maint archived copy as title link Communications of the ACM 45 2 128 February 2002 Unicode Security Considerations Technical Report 36 2010 04 28 IDN hacking disclosure by shmoo com Archived 2005 03 20 at the Wayback Machine Chrome and Firefox Phishing Attack Uses Domains Identical to Known Safe Sites Wordfence 2017 04 14 Retrieved 2017 04 18 There are various Punycode converters online such as https www hkdnr hk idn conv jsp CA takes on a French accent Canadian Internet Registration Authority CIRA Archived from the original on 2015 09 07 Retrieved 2015 09 22 Fake website URL not from KBOI TV Archived 2011 04 05 at the Wayback Machine KBOI TV Retrieved 2011 04 01 Boise TV news website 
targeted with Justin Bieber prank Archived 2012 03 15 at the Wayback Machine KTVB Retrieved 2011 04 01 Mimoso Michael 2017 09 06 IDN Homograph Attack Spreading Betabot Backdoor Threatpost Archived from the original on 2023 10 17 Retrieved 2020 09 20 IDN Display Algorithm Mozilla Retrieved 2016 01 31 Bug 722299 Bugzilla mozilla org Retrieved 2016 01 31 Internationalized Domain Names IDN in Google Chrome chromium googlesource com Retrieved 2020 08 26 Upcoming update with IDN homograph phishing fix Blog Opera Security 2017 04 21 Retrieved 2020 08 26 About Safari International Domain Name support Retrieved 2017 04 29 Sharif Tariq 2006 07 31 Changes to IDN in IE7 to now allow mixing of scripts IEBlog Microsoft Retrieved 2006 11 30 Sharif Tariq 2005 09 09 Phishing Filter in IE7 IEBlog Microsoft Retrieved 2006 11 30 Firefox 2 Phishing Protection Mozilla 2006 Retrieved 2006 11 30 Opera Fraud Protection Opera Software 2006 12 18 Retrieved 2007 02 24 Chrome and Firefox Phishing Attack Uses Domains Identical to Known Safe Sites Phishing with Unicode Domains No Homo Graphs em te 2018 06 28 Retrieved 2020 02 18 IDN Homographs Database GitHub 25 September 2021 IDN ccTLD Fast Track String Evaluation Completion Archived 2014 10 17 at the Wayback Machine Emoji to Zero Day Latin Homoglyphs in Domains and Subdomains Archived 2020 12 09 at the Wayback Machine Suzuki Hiroaki Chiba Daiki Yoneya Yoshiro Mori Tatsuya Goto Shigeki 2019 10 21 ShamFinder Proceedings of the Internet Measurement Conference New York NY USA ACM doi 10 1145 3355369 3355587 CHIBA Daiki AKIYAMA HASEGAWA Ayako KOIDE Takashi SAWABE Yuta GOTO Shigeki AKIYAMA Mitsuaki 2020 07 01 DomainScouter Analyzing the Risks of Deceptive Internationalized Domain Names IEICE Transactions on Information and Systems E103 D 7 1493 1511 doi 10 1587 transinf 2019icp0002 ISSN 0916 8532 Safaei Pour Morteza Nader Christelle Friday Kurt Bou Harb Elias May 2023 A Comprehensive Survey of Recent Internet Measurement Techniques for Cyber 
Security Computers amp Security 128 103123 doi 10 1016 j cose 2023 103123 ISSN 0167 4048 Retrieved from https en wikipedia org w index php title IDN homograph attack amp oldid 1218340721, wikipedia, wiki, book, books, library,
