Data and Evaluation
For both tracks, the data will be split into training and test partitions. The former will be used by participants to develop their methods, and the latter will be used to determine the winners of the challenge. Participants will be ranked by the F1 measure: the macro-averaged F1 for the author profiling track and the F1 on the aggressive class for the aggressiveness detection track.
​
​
Training corpus
The training data file is password-protected; to obtain the password you first need to be registered as a participant.
Evaluation rules
The performance of your author profiling solution will be ranked by the average of the F1 measures for the residence (location) and occupation dimensions; the macro-averaged F1 will be used for both dimensions.
The performance of your aggressiveness detection solution will be ranked by the F1 measure on the aggressive class.
Runs for Track 1 will be received from 23 April, 0:01 until 30 April, 23:59 (UTC-6).
Runs for Track 2 will be received from 23 April, 0:01 until 30 April, 23:59 (UTC-6).
Participants are allowed to submit up to two runs for each track: one primary and one secondary. Each run must be clearly flagged as primary or secondary.
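For reference, the ranking measures can be reproduced with scikit-learn's f1_score; the sketch below is only a minimal illustration with hypothetical gold and predicted labels, not the official evaluation script.

# Minimal sketch of the ranking measures, assuming Python with scikit-learn.
# All label lists are hypothetical placeholders, not task data.
from sklearn.metrics import f1_score

# Author profiling: macro-averaged F1 per dimension, then averaged over the two dimensions.
gold_location = ["center", "northeast", "center"]
pred_location = ["center", "center", "northeast"]
gold_occupation = ["arts", "health", "student"]
pred_occupation = ["arts", "arts", "student"]
profiling_score = (f1_score(gold_location, pred_location, average="macro")
                   + f1_score(gold_occupation, pred_occupation, average="macro")) / 2.0

# Aggressiveness detection: F1 on the aggressive class (label 1).
gold_aggr = [1, 0, 0, 1, 0]
pred_aggr = [1, 0, 1, 1, 0]
aggressive_score = f1_score(gold_aggr, pred_aggr, pos_label=1, average="binary")

print(f"Author profiling score: {profiling_score:.4f}")
print(f"Aggressiveness detection score: {aggressive_score:.4f}")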
Output submission
Submissions must be formatted as described below and sent via email to: mex.a3t@gmail.com
Your software has to output a corresponding txt file for each task of the dataset. The file must contain one line per classified instance. Each line looks like this:
"TaskName"\t"IdentifierOfAnInstance"\t"Class"\n
It is important to respect the format, including the " character, \t (tab), and \n (Unix line ending). The naming of the output files is up to you; we recommend using the author name and a run identifier as the filename, with "txt" as the extension.
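As an illustration of this line format only (not an official script), a run file could be written along the following lines; the file name, identifiers, and labels are hypothetical placeholders.

# Minimal sketch of writing a run file in the required format.
# The file name, identifiers, and labels are hypothetical placeholders.
predictions = [("tweet-1", "1"), ("tweet-2", "0")]  # (IdentifierOfAnInstance, Class) pairs
task_name = "aggressiveness"

with open("myteam_run1.txt", "w", encoding="utf-8", newline="\n") as out:
    for instance_id, label in predictions:
        # Quoted fields separated by tabs, one classified instance per line.
        out.write(f'"{task_name}"\t"{instance_id}"\t"{label}"\n')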
For the aggressiveness track, the possible labels are:
- TaskName: aggressiveness
- IdentifierOfAnInstance: tweet-NumberOfLine, where NumberOfLine is the line number of the tweet in the test file.
- Class: {0, 1}
Output example:
​
"aggressiveness" "tweet-1" "1"
"aggressiveness" "tweet-2" "0"
"aggressiveness" "tweet-3" "0"
"aggressiveness" "tweet-4" "1"
"aggressiveness" "tweet-5" "0"
​
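Since each identifier encodes the line number of the tweet in the test file, the identifiers can be generated while reading that file; the minimal sketch below assumes Python, hypothetical file names, and a placeholder classify() function standing in for your own model.

# Minimal sketch: derive "tweet-N" identifiers from line numbers in the test file.
# The file names and classify() are hypothetical placeholders.
def classify(tweet_text):
    return "0"  # placeholder: replace with your own model's prediction

with open("aggressiveness_test.txt", encoding="utf-8") as test_file, \
     open("myteam_aggressiveness_run1.txt", "w", encoding="utf-8", newline="\n") as out:
    for line_number, tweet in enumerate(test_file, start=1):
        out.write(f'"aggressiveness"\t"tweet-{line_number}"\t"{classify(tweet.strip())}"\n')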
​​
For the author profiling track, we need two different files, and the possible labels are:
1. For the location identification task:
- TaskName: location
- IdentifierOfAnInstance: name of the classified file (including the extension)
- Class: {northwest, north, northeast, west, center, southeast}
Output example:
​
"location" "36ef4c1a63d30b5563502e305303ddcd.txt" "center"
"location" "c99417659c53cf9274a67b63e232a300.txt" "center"
"location" "42e78514662b69b682828ea292e937c1.txt" "northeast"
"location" "4afece969c3d4db458a1b07502c98c09.txt" "center"
"location" "d3ce2c105b723a76bc16d7fc2220c3ea.txt" "southeast"
2. For the occupation identification task:
- TaskName: occupation
- IdentifierOfAnInstance: name of the classified file (including the extension)
- Class: {administrative, arts, health, sciences, others, social, sports, student}
Output example:
​
"occupation" "0c387347349d18ecedfe438d0bfb50b1.txt" "social"
"occupation" "3715c379cdbb936b42531c47b8b72d30.txt" "arts"
"occupation" "8c3f774931fb10ad8eca0c9d241136a2.txt" "administrative"
"occupation" "573325dfb5f374b27379262d017e66aa.txt" "arts"
"occupation" "8d36d5b62c3492a130e10cf267396e20.txt" "health"
​
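Before sending a run, it may be worth verifying that every line of the file follows the required pattern and uses one of the label sets listed above; the sketch below is one possible self-check (the run file name is a placeholder), not the official validation script.

# Minimal self-check sketch for a run file; not the official validator.
# The run file name is a hypothetical placeholder.
import re
import sys

ALLOWED_CLASSES = {
    "aggressiveness": {"0", "1"},
    "location": {"northwest", "north", "northeast", "west", "center", "southeast"},
    "occupation": {"administrative", "arts", "health", "sciences",
                   "others", "social", "sports", "student"},
}
LINE_PATTERN = re.compile(r'^"([^"\t]+)"\t"([^"\t]+)"\t"([^"\t]+)"$')

def check_run_file(path):
    ok = True
    with open(path, encoding="utf-8") as run_file:
        for number, line in enumerate(run_file, start=1):
            match = LINE_PATTERN.match(line.rstrip("\n"))
            if not match:
                print(f"line {number}: does not match the required format")
                ok = False
                continue
            task, _instance_id, label = match.groups()
            if task not in ALLOWED_CLASSES or label not in ALLOWED_CLASSES[task]:
                print(f"line {number}: unknown task name or class label")
                ok = False
    return ok

if __name__ == "__main__":
    sys.exit(0 if check_run_file("myteam_run1.txt") else 1)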
​
A submission that fails the format check will be considered null.

Paper submission
Participants of the tasks will be given the opportunity to write a paper describing their system, the resources used, their results, and their analysis; these papers will be part of the official IberEval-2018 proceedings. Papers are to be four pages long, plus at most two pages for references, and must be formatted in the Springer LNCS style (see http://www.springer.de/comp/lncs/authors.html).
Papers must be written in English.