The squinting linguist meets hyperdimensional computing: Explicitly encoded high-dimensional semantic spaces used for authorship profiling

Upcoming talk by Jussi Karlgren, KTH Royal Institute of Technology and University of Helsinki.
Jussi Karlgren

May 09, 2018
from 11:00 AM to 12:30 PM

273 Kerr Hall

High-dimensional distributed semantic spaces have proven useful and effective for aggregating and processing visual and lexical information for many tasks related to human-generated data. Human language is a good example of heterogeneous data: it has a large and varying number of features, both lexical items and constructions, which interact to represent various aspects of communicative information. A hyperdimensional model can represent a broad range of linguistic features in a common integral framework that serves as an encoding scheme for symbolic information and is suitable as a bridge between symbolic and continuous representations. This talk will give an overview of how high-dimensional semantic spaces have been used to aggregate and process lexical information for many language processing tasks, and how that approach can be extended to situational data. The examples given are primarily on textual data and are a straightforward extension of models previously used to handle word semantics. The talk will present the framework and an example of its use in an experimental study on authorship profiling, which poses some very task-specific challenges for the practically minded linguist.
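
For readers unfamiliar with the idea of encoding symbolic features in a shared high-dimensional space, the following is a minimal, generic sketch and not the speaker's implementation. It assumes sparse ternary random index vectors, a rotation (permutation) to mark a feature's role, and summation to bundle features into one text vector. The dimensionality, the sparsity, the role names, and all function names are hypothetical choices made for illustration only.

```python
import math
import random

DIM = 2000      # vector dimensionality (assumed for illustration)
NONZERO = 10    # non-zero entries per index vector (assumed)

# Hypothetical feature roles; each role gets its own rotation offset.
ROLES = {"word": 0, "construction": 1}


def index_vector(label, dim=DIM, nonzero=NONZERO):
    """Return a sparse ternary random vector, deterministic per label."""
    rng = random.Random(label)            # seeding with the label string
    vec = [0] * dim
    positions = rng.sample(range(dim), nonzero)
    for i, pos in enumerate(positions):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def permute(vec, shift):
    """Rotate a vector by `shift` positions to mark a feature's role."""
    shift %= len(vec)
    return vec[-shift:] + vec[:-shift]


def encode(features):
    """Bundle (role, value) pairs into one vector by element-wise sum."""
    total = [0] * DIM
    for role, value in features:
        bound = permute(index_vector(value), ROLES[role])
        total = [a + b for a, b in zip(total, bound)]
    return total


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Two toy "texts" sharing two of three features, and one sharing none.
text_a = encode([("word", "really"), ("word", "nice"), ("construction", "negation")])
text_b = encode([("word", "really"), ("word", "awful"), ("construction", "negation")])
text_c = encode([("word", "train"), ("word", "station"), ("construction", "question")])

print(cosine(text_a, text_b), cosine(text_a, text_c))
```

In this sketch, lexical items and constructions end up in the same vector, so texts that share features of either kind score as more similar than texts that share none.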

Jussi Karlgren is an adjunct professor of language technology at KTH, an adjunct professor of language technology at Helsinki University, and a founding partner of the text analysis company Gavagai, based in Stockholm, Sweden. Currently, for the academic year 2017‐18, he is a visiting scholar in linguistics at Stanford. He has worked in research and development in information access‐related language technology since 1987 at various research labs, mostly in industrial settings.