Faceted Text Segmentation via Multitask Learning

Text segmentation is a fundamental step in natural language processing (NLP) and information retrieval (IR) tasks. Most existing approaches do not explicitly take document facet information into account during segmentation; text segmentation and facet annotation are typically addressed as separate problems even though they operate in a common input space. This article proposes FTS, a novel model for faceted text segmentation via multitask learning (MTL). FTS casts faceted text segmentation as an MTL problem that combines text segmentation and facet annotation. The model employs a bidirectional long short-term memory (Bi-LSTM) network to learn feature representations of the sentences within a document. These representations are shared between the two tasks and adjusted through common parameters, which helps the model learn a more robust shared representation across text segmentation and facet annotation. Moreover, text segmentation is modeled as a sequence tagging task using an LSTM with a conditional random field (CRF) classification layer. Extensive experiments are conducted on five data sets from five domains: data structure, data mining, computer network, solid mechanics, and crystallography. The results show that FTS outperforms several highly cited and state-of-the-art approaches to text segmentation and facet annotation.
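The abstract describes a shared Bi-LSTM sentence encoder whose representation feeds two task-specific heads, one for segment-boundary tagging and one for facet annotation, trained jointly. The PyTorch sketch below is an illustrative reconstruction of that multitask set-up rather than the authors' implementation: the sentence-embedding dimension, hidden size, tag sets, and loss weight `alpha` are assumed for illustration, and the CRF output layer is simplified to a per-sentence softmax to keep the example short.

```python
# Minimal multitask sketch (assumed architecture, not the authors' code):
# a shared Bi-LSTM over sentence embeddings with two heads, one for
# segment-boundary tagging and one for facet annotation.
import torch
import torch.nn as nn

class FTSSketch(nn.Module):
    def __init__(self, sent_dim=300, hidden=128, num_seg_tags=2, num_facets=5):
        super().__init__()
        # Shared Bi-LSTM over the sequence of sentence embeddings in a document.
        self.encoder = nn.LSTM(sent_dim, hidden, batch_first=True,
                               bidirectional=True)
        # Task-specific heads: segment-boundary tagging and facet annotation.
        self.seg_head = nn.Linear(2 * hidden, num_seg_tags)
        self.facet_head = nn.Linear(2 * hidden, num_facets)

    def forward(self, sent_embs):
        # sent_embs: (batch, num_sentences, sent_dim)
        shared, _ = self.encoder(sent_embs)  # (batch, num_sentences, 2*hidden)
        return self.seg_head(shared), self.facet_head(shared)

def joint_loss(model, sent_embs, seg_tags, facet_labels, alpha=0.5):
    """Weighted sum of the two task losses; both update the shared encoder."""
    seg_logits, facet_logits = model(sent_embs)
    ce = nn.CrossEntropyLoss()
    seg_loss = ce(seg_logits.flatten(0, 1), seg_tags.flatten())
    facet_loss = ce(facet_logits.flatten(0, 1), facet_labels.flatten())
    return alpha * seg_loss + (1 - alpha) * facet_loss

# Toy usage: 2 documents, 10 sentences each, 300-d sentence embeddings.
model = FTSSketch()
x = torch.randn(2, 10, 300)
seg = torch.randint(0, 2, (2, 10))
fac = torch.randint(0, 5, (2, 10))
loss = joint_loss(model, x, seg, fac)
loss.backward()
```

In the paper's full model the per-sentence softmax over segmentation tags would be replaced by a CRF layer so that boundary decisions for neighboring sentences are made jointly; the shared encoder and weighted joint loss shown here are the parts that realize the MTL parameter sharing described above.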