Abstract: Online social networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation. Malicious actors now have unprecedented freedom to misbehave, leading to severe societal unrest and dire consequences, as exemplified by the assault on the US Capitol following the 2020 presidential election and the anti-vax movement during the COVID-19 pandemic. Understanding online language has become more pressing than ever. While existing works predominantly focus on content analysis, we aim to shift the focus towards understanding harmful behaviors by relating content to its authors. Numerous novel approaches attempt to learn the stylistic features of authors from their texts, but many of them are constrained by small datasets or sub-optimal training losses. To overcome these limitations, we introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus of 4.5 × 10^6 texts by 70k heterogeneous authors, derived from public sources. The model is trained with a Supervised Contrastive Loss that minimizes the distance between texts authored by the same individual. This authorship pretext pre-training task yields competitive zero-shot performance on the PAN attribution and clustering challenges. Additionally, we attain promising results on the PAN verification challenges using a single dense layer on top of our model, which serves as an embedding encoder. Finally, we present results on our Reddit test partition: using a support base of 8 documents of 512 tokens, we can discern authors among sets of up to 1616 authors with at least 80% accuracy.
We share our pre-trained model on Hugging Face (https://huggingface.co/AIDA-UPM/star), and our code is available at https://github.com/jahuerta92/star.
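The two mechanisms the abstract relies on can be sketched in NumPy: a supervised contrastive objective that groups texts by author, and attribution against a small support base. The loss follows the standard SupCon formulation of Khosla et al. (2020). The temperature of 0.1, the toy two-dimensional embeddings, and the function names `supcon_loss` and `attribute` are illustrative assumptions. The nearest-centroid attribution rule is a hypothetical scheme, because the abstract does not say how the support documents are used. This is a sketch, not the STAR implementation.

```python
import numpy as np


def _normalize(x):
    """L2-normalize along the last axis so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of text embeddings.

    labels[i] is the author of text i. temperature=0.1 is an illustrative
    default (assumption), not a value taken from STAR.
    """
    z = _normalize(np.asarray(embeddings, dtype=float))
    labels = np.asarray(labels)
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)

    # Scaled cosine similarities; an anchor is never contrasted with itself.
    logits = np.where(self_mask, -np.inf, z @ z.T / temperature)

    # Numerically stable log-softmax over all other samples in the batch.
    row_max = logits.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom

    # Positives: other texts written by the same author as the anchor.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    has_pos = pos_counts > 0

    # Mean log-probability of the positives per anchor, then mean over anchors.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / pos_counts[has_pos]).mean())


def attribute(query, support, support_labels):
    """Hypothetical nearest-centroid attribution against a support base."""
    support = _normalize(np.asarray(support, dtype=float))
    support_labels = np.asarray(support_labels)
    authors = np.unique(support_labels)
    # One centroid per candidate author, built from that author's support texts.
    centroids = _normalize(
        np.stack([support[support_labels == a].mean(axis=0) for a in authors])
    )
    q = _normalize(np.asarray(query, dtype=float))
    # Pick the author whose centroid is most cosine-similar to the query.
    return authors[int(np.argmax(centroids @ q))]
```

Labeling a batch so that same-author texts are close yields a lower loss than labeling it so that distant texts are treated as positives. In practice the embeddings would come from the transformer encoder, and the loss would be computed in an autodiff framework. NumPy is used here only to make the arithmetic explicit.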

Title-and-Tag Contrastive Vision-and-Language Transformer for Social Media Popularity Prediction

User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning

Tag2Text: Guiding Vision-Language Model via Image Tagging

Transformer with token attention and attribute prediction for image captioning

VAuLT: Augmenting the Vision-and-Language Transformer for Sentiment Classification on Social Media

Revisiting Vision-Language Features Adaptation and Inconsistency for Social Media Popularity Prediction

Vision Transformer with Super Token Sampling

A Survey of Visual Transformers

Contrastive Learning for Implicit Social Factors in Social Media Popularity Prediction

A Multimodal Transformer for Live Streaming Highlight Prediction

Visual contextual relationship augmented transformer for image captioning

SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge

On the Consensus of Synchronous Temporal and Spatial Views: A Novel Multimodal Deep Learning Method for Social Video Prediction

Neighborhood Contrastive Transformer for Change Captioning

SMART: Syntax-Calibrated Multi-Aspect Relation Transformer for Change Captioning

GPT-4V(ision) as A Social Media Analysis Engine

Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks

Vision Transformers: From Semantic Segmentation to Dense Prediction

Comparative study of Transformer and LSTM Network with attention mechanism on Image Captioning

Understanding writing style in social media with a supervised contrastively pre-trained transformer

Neural Visual Social Comment on Image-Text Content