Detecting Malignant TLS Servers Using Machine Learning Techniques

Sankalp Bagaria, R. Balaji, B. S. Bindhumadhava
DOI: https://doi.org/10.48550/arXiv.1705.09044
2017-05-25
Cryptography and Security
Abstract: TLS uses X.509 certificates for server authentication. An X.509 certificate is a complex document, and various innocent errors may occur while creating or using it. In addition, many certificates belong to malicious websites; the client should reject them and should not visit those web servers. Usually, when a client finds a certificate doubtful under the traditional tests, it asks for human intervention. However, most people cannot differentiate between malicious and non-malicious websites by looking at certificates. Thus, once traditional certificate validation has failed, instead of asking for human intervention, we use machine learning techniques to enable a web browser to decide whether the server to which the certificate belongs is malignant, i.e., whether the website should be visited. A website whose certificate has been accepted in this first phase may still turn out to be malicious. So, in the second phase, we download a part of the website in a sandbox without decrypting it and observe the TLS-encrypted traffic (encrypted malicious data captured in a sandbox cannot harm the system). As the traffic is encrypted after the handshake is completed, traditional pattern-matching techniques cannot be employed. We therefore use flow features of the traffic along with the features used in the first phase. We couple these features with the unencrypted TLS header information obtained during the TLS handshake and use them in a machine learning classifier to identify whether the traffic is malicious.
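The two-phase decision described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not list the concrete features or name a classifier, so every field name (`validity_days`, `is_self_signed`, `bytes_in`, and so on), the `decide` function, and the placeholder rules standing in for trained models are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


# All feature names below are hypothetical; the abstract does not enumerate them.
@dataclass
class CertFeatures:
    validity_days: float
    is_self_signed: bool
    san_count: int


@dataclass
class HandshakeFeatures:  # unencrypted TLS header information
    offered_cipher_suites: int
    extension_count: int


@dataclass
class FlowFeatures:  # observable without decrypting the traffic
    duration_s: float
    bytes_in: int
    bytes_out: int
    packet_count: int


# A classifier maps a feature vector to True (malicious) or False (benign).
Classifier = Callable[[Sequence[float]], bool]


def phase1_vector(cert: CertFeatures) -> List[float]:
    """Phase 1 uses certificate features only."""
    return [cert.validity_days, float(cert.is_self_signed), float(cert.san_count)]


def phase2_vector(cert: CertFeatures, hs: HandshakeFeatures,
                  flow: FlowFeatures) -> List[float]:
    """Phase 2 reuses the phase-1 features and adds handshake and flow features."""
    return phase1_vector(cert) + [
        float(hs.offered_cipher_suites), float(hs.extension_count),
        flow.duration_s, float(flow.bytes_in), float(flow.bytes_out),
        float(flow.packet_count),
    ]


def decide(cert: CertFeatures,
           observe_in_sandbox: Callable[[], Tuple[HandshakeFeatures, FlowFeatures]],
           phase1_clf: Classifier, phase2_clf: Classifier) -> str:
    """Run after traditional certificate validation has failed."""
    if phase1_clf(phase1_vector(cert)):
        return "reject: certificate"
    # Traffic is only captured if the certificate passed phase 1.
    hs, flow = observe_in_sandbox()
    if phase2_clf(phase2_vector(cert, hs, flow)):
        return "reject: traffic"
    return "accept"


# Placeholder rules standing in for trained models (assumptions, not learned).
def toy_phase1(v: Sequence[float]) -> bool:
    return v[1] == 1.0 and v[0] < 30  # short-lived and self-signed


def toy_phase2(v: Sequence[float]) -> bool:
    return v[7] > 10 * v[6]  # far more bytes out than in
```

In practice the two placeholder rules would be replaced by classifiers trained on labeled certificates and labeled traffic captures; the sketch only shows how the feature groups named in the abstract feed the two decisions.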