Description

E-communication in general, and social networks in particular, play an increasingly crucial role in everyday life. As a result, the analysis of textual information from social networks has become a popular research topic in the computational linguistics community. Very effective methods have been developed for this purpose, leading to a better understanding of how to deal with problems inherent to the domain, such as shortness, slang, multilinguality, and multimodality, among others. This progress can largely be attributed to academic competitions and dedicated tasks that seek to advance the state of the art in a particular research topic of practical relevance (see, e.g., the series of events organized by IberEval and PAN).

Despite such progress, there are still open issues that deserve further research, either to solve them or at least to understand them better. Specifically, we organize a task on the analysis of tweets from Mexican users. This task comprises two tracks that focus on dimensions that have not been considered in related academic competitions. The aim is to advance the state of the art in the non-thematic analysis of short texts written in Mexican Spanish. Both tracks fall within digital text forensics: the first is a track on author profiling, whose aim is to develop methods for profiling users according to non-standard dimensions; the second is a track on aggressiveness detection in tweets.

These two tracks aim to push research in three directions of computational linguistics. First, the treatment of a variety of Spanish whose cultural traits make it significantly different from Peninsular Spanish. Because the task focuses on Twitter posts, participants will have to deal with a very specific variation of Spanish. Second, research on two dimensions of author profiling that have not been studied in depth by the community: occupation and place of residence. Most research so far has focused on age and gender; although those are useful, the dimensions considered here are more challenging and can have wider applicability. Third, the task includes a track on aggressive text detection in Twitter. This is a critical topic, as users of social networks are exposed to threats that can put their integrity at risk. With this track we aim to push research on a topic that has not received much attention from the community.