Abstract: We anticipate the widespread use of internationalized resource identifiers (IRIs) [1] or internationalized domain names (IDNs) [2] on the web as a complement to uniform resource identifiers (URIs). An IRI/IDN is composed of characters from a subset of Unicode, which leaves it open to Unicode attacks [3]. As a result, certain phishing IRI/IDNs can look visually or semantically very similar to the genuine ones. Phishing attacks based on this strategy are likely to appear in the near future as IRI/IDN usage grows. We devised a method to detect such phishing attacks. We constructed a Unicode character similarity list (UC-SimList) based on character-to-character visual and semantic similarities, and we use a nondeterministic finite automaton (NFA) [4] to identify potential IRI/IDN-based phishing patterns. We implemented a phishing IRI/IDN pattern generation tool, REGAP, which expresses phishing IRI/IDN patterns as regular expressions (REs) for phishing IRI/IDN detection. We also discuss how such a tool can be applied to investigations.

[1] An IRI is a generalization of the uniform resource identifier (URI), which is in turn a generalization of the uniform resource locator (URL). While URIs are limited to a subset of the ASCII character set, IRIs may contain characters from the universal character set (Unicode/ISO 10646). Basically, an IRI is the internationalized version of a URI.

[2] An IDN is an Internet domain name that (potentially) contains non-ASCII characters. Such domain names could contain letters with diacritics, as required by many European languages, or characters from non-Latin scripts such as Arabic or Chinese.

[3] A Unicode attack is made possible by the coexistence of a large number of visually or semantically similar Unicode strings. At the character level, the visually similar Unicode attack is the homograph attack.

[4] An NFA is a finite state machine in which, for each pair of state and input symbol, there may be several possible next states. It can be used to recognize strings that follow a certain pattern. When the last input symbol is consumed, the NFA accepts if and only if there is some sequence of transitions it could make that takes it to an accepting state. Equivalently, it rejects if, no matter what choices it makes, it would not end in an accepting state.
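The idea of turning a character similarity list into a regular expression can be illustrated with a minimal Python sketch. This is not REGAP itself: the `SIM_LIST` table below is a tiny, hand-picked stand-in for the UC-SimList and covers only a few visual look-alikes, and the function names `lookalike_pattern` and `is_suspicious` are hypothetical. Each character of a genuine name becomes a character class holding the character and its look-alikes, so the compiled pattern accepts any string that differs from the genuine name only by such substitutions.

```python
import re

# Hypothetical, hand-picked stand-in for a few UC-SimList entries
# (visual look-alikes only); the actual list is assumed to be far larger.
SIM_LIST = {
    "a": "\u0430\u00e0\u00e1",  # Cyrillic a, a with grave, a with acute
    "e": "\u0435\u00e8\u00e9",  # Cyrillic e, e with grave, e with acute
    "o": "\u043e\u03bf0",       # Cyrillic o, Greek omicron, digit zero
    "p": "\u0440",              # Cyrillic er
    "i": "\u0456\u00ed",        # Cyrillic Ukrainian i, i with acute
    "l": "1",                   # digit one
}


def lookalike_pattern(name: str) -> re.Pattern:
    """Build a regex matching `name` and its character-level look-alikes."""
    parts = []
    for ch in name.lower():
        alternatives = ch + SIM_LIST.get(ch, "")
        if len(alternatives) == 1:
            # No known look-alikes: match the character literally.
            parts.append(re.escape(ch))
        else:
            # One character class per position, holding all alternatives.
            parts.append("[" + "".join(re.escape(c) for c in alternatives) + "]")
    return re.compile("".join(parts))


def is_suspicious(candidate: str, genuine: str) -> bool:
    """True if `candidate` imitates `genuine` without being identical to it."""
    candidate = candidate.lower()
    if candidate == genuine.lower():
        return False
    return lookalike_pattern(genuine).fullmatch(candidate) is not None


if __name__ == "__main__":
    # The second character is a Cyrillic a (U+0430), not a Latin a.
    print(is_suspicious("p\u0430ypal", "paypal"))
    print(is_suspicious("paypal", "paypal"))
```

The regex engine plays the role of the automaton here: the pattern is compiled once per protected name and can then be matched against many candidate domain labels. A fuller treatment would also need semantic similarities and multi-character substitutions, which this sketch leaves out.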

REGAP: A Tool for Unicode-Based Web Identity Fraud Detection
