
Summary

We proposed a joint extraction model that can handle overlapping and nested problems. It is a single-stage model with no gap between training and inference, which makes it immune to exposure bias.

Background

Relation extraction is a technique to extract entities and relations from unstructured texts. It plays an essential role in many natural language understanding tasks, such as text understanding, question answering (QA), and information retrieval (IR). In short, it is a typical task to extract knowledge of relations in the form of (subject, predicate, object) triples.

For example:

{
    'text': 'Gone with the Wind is a novel by American writer Margaret Mitchell',
    'relation_list': [
        {
            'subject': 'Gone with the Wind',
            'object': 'Margaret Mitchell',
            'predicate': 'author'
        }
    ]
}

Early works address this task in a pipelined manner. They first identify all candidate entities and then classify the relations between each pair of entities. These methods ignore the interaction between named entity recognition and relation classification and suffer from error propagation.

In order to utilize the information shared between the two tasks, researchers proposed many models to extract entities and relations jointly. Joint learning helped improve performance, but some early joint models, such as NovelTagging, cannot handle the relation overlapping problems (see the table below). When an entity occurs in multiple relations, naive sequence labeling does not work well and usually misses some relations.

Single Entity Overlapping
  Text: Two of them, Jeff Francoeur and Brian McCann, are from Atlanta.
  Triplets: (Jeff Francoeur, live in, Atlanta), (Brian McCann, live in, Atlanta)
Entity Pair Overlapping
  Text: Sacramento is the capital city of the U.S. state of California.
  Triplets: (California, contains, Sacramento), (California, capital, Sacramento)

Some models have been proposed to tackle these problems, such as CopyRE, CopyMTL, and CasRel (HBT), but they suffer from exposure bias. During training they use the ground truth as input to guide the intermediate steps, while at inference they use predicted results instead, leading to a gap between training and inference.

For the decoder-based method (Figure 1), the ground-truth tokens are used as context at training time, while at inference the entire sequence is generated by the trained model on its own, so the previously generated tokens are fed as context. As a result, the predicted tokens at training and inference are drawn from different distributions: the data distribution as opposed to the model distribution.
Figure 1. Decoder-based Model
Similarly, the decomposition-based method (Figure 2) uses the gold subject entity as input to guide the model to extract object entities and relations during training, while at inference the input head entity is produced by the trained model.
Figure 2. Decomposition-based Model

These models can jointly extract entities and relations with a single model, yet in a way they regress to pipelined methods, since they decode in multiple interdependent steps. This is the essence of why they suffer from exposure bias.

The remaining sections introduce a model that eliminates exposure bias and guarantees the consistency of training and inference.

The tagging schema

Figure 3. Matrix tagging schema

In a matrix, we tag the link between every two tokens in a sentence. The purple tag means that the two corresponding positions are the start and end tokens of an entity. The red tag means that the two positions are the start tokens of a paired subject and object entity. The blue tag means that the two positions are respectively the end tokens of a paired subject and object entity.

Figure 4. Handshaking tagging schema

Because an entity tail can never appear before its head, to save resources we map the relation tags (red and blue) in the lower triangular region to the upper one and drop the whole lower region.
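As a small sketch of this mapping (function names are my own, and the tag-2 "reversal" bookkeeping described in the decoding section is omitted here), the kept upper-triangular pairs can be enumerated and lower-region tags mirrored like this:

```python
def upper_indices(seq_len):
    """All (i, j) token pairs with j >= i: the upper triangular
    region kept by the handshaking scheme."""
    return [(i, j) for i in range(seq_len) for j in range(i, seq_len)]

def mirror_to_upper(i, j):
    """Map a relation tag at (i, j) in the lower triangular region
    (i > j) onto the upper region; such a mirrored tag is then
    marked as a reversed link."""
    return (j, i) if i > j else (i, j)

# A sentence of N tokens yields N * (N + 1) / 2 pairs instead of N^2.
pairs = upper_indices(4)
print(len(pairs))  # 10
```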

Model

Figure 5. Framework of the model

Token Pair Representation

Given a sentence, we first map each token into a low-dimensional contextual vector by a basic encoder. Then we generate a representation for the token pair by Equation 1: \[h_{i,j} = tanh(W_h \cdot [h_i; h_j] + b_h), j \geq i \tag{1}\]
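A minimal sketch of Equation 1 (the sizes, random encoder outputs, and random parameters are all stand-ins; a real implementation would use a trained contextual encoder and learned weights):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # encoder hidden size (illustrative)
seq_len = 5
H = rng.standard_normal((seq_len, d))    # stand-in token vectors h_1..h_n
W_h = rng.standard_normal((d, 2 * d))    # parameters of Equation 1
b_h = np.zeros(d)

def pair_repr(i, j):
    """h_{i,j} = tanh(W_h · [h_i; h_j] + b_h), for j >= i."""
    return np.tanh(W_h @ np.concatenate([H[i], H[j]]) + b_h)

h_02 = pair_repr(0, 2)
print(h_02.shape)  # (8,)
```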

Handshaking Tagger

Given the token pair representation from Equation 1, the link label of a token pair is predicted by Equations 2 and 3: \[ P(y_{i,j}) = Softmax(W_o \cdot h_{i,j} + b_o) \tag{2}\] \[ link(w_i, w_j) = argmax_l P(y_{i,j} = l) \tag{3}\]
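A matching sketch of the tagger head (Equations 2 and 3), again with random stand-in parameters and three link labels (0 for no link, 1 and 2 as described in the decoding section):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_labels = 8, 3                 # 3 link labels: 0 (none), 1, 2
W_o = rng.standard_normal((n_labels, d))
b_o = np.zeros(n_labels)

def softmax(z):
    z = z - z.max()                # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def link(h_ij):
    """Equation 2: label distribution; Equation 3: argmax decision."""
    p = softmax(W_o @ h_ij + b_o)
    return int(np.argmax(p)), p

label, p = link(rng.standard_normal(d))
assert abs(p.sum() - 1.0) < 1e-9   # a valid probability distribution
```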

Decoding

In the case of Figure 5, (“New”, “York”), (“New”, “City”) and (“De”, “Blasio”) are tagged as 1 in the EH-to-ET sequence, which means “New York”, “New York City”, and “De Blasio” are three entities. For relation “mayor”, (“New”, “De”) is tagged as 1 in the SH-to-OH sequence, which means the mayor of the subject starting with “New” is the object starting with “De”. (“City”, “Blasio”) is tagged as 1 in the ST-to-OT sequence, which means that the subject and object are the entities ending with “City” and “Blasio”, respectively. Based on the information represented by these three sequences, a triplet can be decoded: (“New York City”, mayor, “De Blasio”).

The same logic goes for other relations, but note that the tag 2 has an opposite meaning to the tag 1, which represents a reversal link between tokens. For example, (“York”, “Blasio”) is tagged as 2 in the ST-to-OT sequence of relation “born in”, which means “York” and “Blasio” are respectively the tail of a paired object and subject. Associated with the other two sequences, the decoded triplet should be (“De Blasio”, born in, “New York”).
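The decoding logic above can be sketched as follows. The tag dictionaries are hand-built here to reproduce the Figure 5 "mayor" example (in the real model they come from the predicted sequences, one SH-to-OH and ST-to-OT map per relation):

```python
# Hand-built tags reproducing the Figure 5 example (illustrative only).
tokens = ["New", "York", "City", "mayor", "De", "Blasio"]
eh_to_et = {(0, 1), (0, 2), (4, 5)}    # entity head -> entity tail pairs
sh_to_oh = {"mayor": {(0, 4): 1}}      # subject head -> object head, per relation
st_to_ot = {"mayor": {(2, 5): 1}}      # subject tail -> object tail, per relation

def decode(eh_to_et, sh_to_oh, st_to_ot):
    heads = {}                          # entities keyed by head token index
    for h, t in eh_to_et:
        heads.setdefault(h, []).append(t)
    triplets = []
    for rel, head_pairs in sh_to_oh.items():
        tail_pairs = st_to_ot.get(rel, {})
        for (sh, oh), tag1 in head_pairs.items():
            for (st, ot), tag2 in tail_pairs.items():
                # Tag 2 marks a reversed link: swap subject and object roles.
                if tag1 == 2:
                    sh, oh = oh, sh
                if tag2 == 2:
                    st, ot = ot, st
                # A triplet exists if both heads pair with matching tails.
                if st in heads.get(sh, []) and ot in heads.get(oh, []):
                    subj = " ".join(tokens[sh:st + 1])
                    obj = " ".join(tokens[oh:ot + 1])
                    triplets.append((subj, rel, obj))
    return triplets

print(decode(eh_to_et, sh_to_oh, st_to_ot))
# [('New York City', 'mayor', 'De Blasio')]
```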

Experimental results

Main results
Results on different subsets of the test data

Future work

Some points to further improve the performance:

  1. We use concatenated vectors to represent the relation between two tokens, which may not be the best way to reach peak performance.

  2. We use the same representation to classify entities and relations. This may lead to interference between the two tasks instead of mutual improvement. Two recent works demonstrated that using different representations may achieve better performance: A Frustratingly Easy Approach, Two are Better than One.

  3. The model extends the original sequence from \(O(N)\) to \(O(N^2)\), which increases the cost significantly and makes long sequences expensive to handle.

More Info

If you want to know more details about this work, please see our paper or source code. Do not hesitate to email me or open an issue if you have any questions about the code or the paper.

Logistic regression makes predictions by probability, defined as: \[p = \sigma(Wx)\]
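A minimal sketch of this prediction (the weights and input are made-up numbers for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(W, x):
    """p = sigma(W·x): predicted probability of the positive class."""
    return sigmoid(W @ x)

W = np.array([0.5, -0.25])
x = np.array([2.0, 4.0])
p = predict_proba(W, x)   # W·x = 1 - 1 = 0, so p = sigma(0) = 0.5
print(p)  # 0.5
```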

Why sigmoid?

  1. The sigmoid function outputs probabilities between 0 and 1.
  2. It makes back propagation easier, since the derivative of sigmoid is easy to calculate: \(\sigma'(x) = \sigma(x)(1 - \sigma(x))\)

Objective Function

As the output \(\sigma(Wx)\) is the predicted probability of a class, given a dataset, one way to get the best parameters for the LR model is to maximize the probability of the whole dataset. So we can use Maximum Likelihood Estimation to do the job.

The likelihood can be defined as: \[\begin{equation} L(W) = \prod_{i=1}^n \sigma(Wx_i)^{y_i}(1-\sigma(Wx_i))^{1-y_i} \end{equation}\]

We take the logarithm for easier computation: \[\begin{equation} ln(L(W)) = \sum_{i=1}^n {y_i}ln(\sigma(Wx_i))+({1-y_i})ln(1-\sigma(Wx_i)) \end{equation}\] What we want is to maximize \(ln(L(W))\) and get the best \(W\): \[\begin{equation} W = argmax_W(ln(L(W))) \end{equation}\] Maximizing \(ln(L(W))\) is equivalent to minimizing \(-ln(L(W))\), so we define our loss function \(J(W)\) as: \[\begin{equation} J(W) = -ln(L(W)) \end{equation}\]
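The loss \(J(W)\) can be computed directly from this definition; a small sketch (the data and dimensions are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll_loss(W, X, y):
    """J(W) = -ln L(W) = -sum_i [y_i ln p_i + (1 - y_i) ln(1 - p_i)]."""
    p = sigmoid(X @ W)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 0.0])
W = np.zeros(2)
# With W = 0, every p_i = 0.5, so J(W) = -2 ln 0.5 = 2 ln 2.
print(nll_loss(W, X, y))  # 1.3862943611198906
```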

Derivation

\[\begin{align} J'(W) &= \frac{\partial J}{\partial p} \cdot \frac{\partial p}{\partial W} \\ &= \frac{\partial -ln(L(W))}{\partial \sigma(Wx_i)} \cdot \frac{\partial \sigma(Wx_i)}{\partial W} \\ &= -\sum_{i=1}^n(\frac{y_i}{\sigma(Wx_i)} - \frac{1 - y_i}{1-\sigma(Wx_i)})\sigma(Wx_i)(1-\sigma(Wx_i))x_i \\ &=-\sum_{i=1}^n(y_i(1-\sigma(Wx_i)) - (1-y_i)\sigma(Wx_i))x_i \\ &=\sum_{i=1}^n(\sigma(Wx_i)-y_i)x_i \end{align}\]
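The closed form \(J'(W) = \sum_{i=1}^n(\sigma(Wx_i)-y_i)x_i\) is easy to verify with a finite-difference check; a sketch with random data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W, X, y):
    p = sigmoid(X @ W)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad(W, X, y):
    """J'(W) = sum_i (sigma(W·x_i) - y_i) x_i, the closed form derived above."""
    return X.T @ (sigmoid(X @ W) - y)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
y = (rng.random(20) > 0.5).astype(float)
W = np.zeros(3)

# Central finite differences should match the analytic gradient.
eps = 1e-6
num = np.array([(loss(W + eps * e, X, y) - loss(W - eps * e, X, y)) / (2 * eps)
                for e in np.eye(3)])
print(np.allclose(num, grad(W, X, y), atol=1e-4))  # True
```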

Sigmoid and tanh are the most commonly used non-linear functions. Their derivatives can be expressed in terms of the functions themselves, which simplifies back propagation. This blog will show how to calculate their derivatives and the connection between them.

\[\begin{align} sigmoid(x) &= \frac{e^x}{e^x+1} \tag{1} \\ tanh(x) &= \frac{e^{2x} - 1}{e^{2x} + 1} \tag{2} \\ \end{align}\]

Sigmoid

\[\begin{align} \sigma'(x) &= \frac{e^x(e^x+1) - e^{2x}}{(e^x+1)^2}\\ &= \frac{e^x}{(e^x+1)} \cdot \frac{1}{(e^x+1)} \\ &= \sigma(x) \cdot (1 - \sigma(x)) \\ \end{align}\]

Tanh

\[\begin{align} \tanh'(x) &= \frac{2e^{2x}(e^{2x} + 1) - 2e^{2x}(e^{2x} - 1)}{(e^{2x}+1)^2}\\ &= \frac{4e^{2x}}{(e^{2x}+1)^2} \\ &= \frac{(e^{2x}+1)^2 - (e^{2x}-1)^2}{(e^{2x}+1)^2} \\ &= 1 - \frac{(e^{2x}-1)^2}{(e^{2x}+1)^2} \\ &= 1 - tanh^2(x) \\ \end{align}\]

Connection

The conclusion goes first: they have a linear relationship. \[\begin{align} 1 - 2\sigma(x) &= \frac{1 - e^x}{e^x + 1} \\ &= - \frac{e^x - 1}{e^x + 1} \\ &= - tanh(\frac{x}{2}) \end{align}\]
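This identity (equivalently, \(tanh(\frac{x}{2}) = 2\sigma(x) - 1\)) is easy to confirm numerically:

```python
import numpy as np

x = np.linspace(-5, 5, 101)
sigma = 1.0 / (1.0 + np.exp(-x))
# tanh(x/2) = 2*sigma(x) - 1, i.e. 1 - 2*sigma(x) = -tanh(x/2)
print(np.allclose(np.tanh(x / 2), 2 * sigma - 1))  # True
```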

Symbol Reference for POS Tags, Syntactic Parsing, and Dependency Relations

POS Tags

CC: conjunction, coordinating
CD: numeral, cardinal
DT: determiner
EX: existential there
FW: foreign word
IN: preposition or conjunction, subordinating
JJ: adjective or numeral, ordinal
JJR: adjective, comparative
JJS: adjective, superlative
LS: list item marker
MD: modal auxiliary
NN: noun, common, singular or mass
NNS: noun, common, plural
NNP: noun, proper, singular
NNPS: noun, proper, plural
PDT: pre-determiner
POS: genitive marker
PRP: pronoun, personal
PRP$: pronoun, possessive
RB: adverb
RBR: adverb, comparative
RBS: adverb, superlative
RP: particle
SYM: symbol
TO: "to" as preposition or infinitive marker
UH: interjection
VB: verb, base form
VBD: verb, past tense
VBG: verb, present participle or gerund
VBN: verb, past participle
VBP: verb, present tense, not 3rd person singular
VBZ: verb, present tense, 3rd person singular
WDT: WH-determiner
WP: WH-pronoun
WP$: WH-pronoun, possessive
WRB: WH-adverb

Chinese POS tagging standard: the ICTPOS 3.0 tag set

Syntactic Parsing (Parse Trees)

ROOT: the sentence to be parsed
IP: simple clause
NP: noun phrase
VP: verb phrase
PU: punctuation, usually periods, question marks, exclamation marks, etc.
LCP: localizer (locative) phrase
PP: prepositional phrase
CP: clause formed with 的 expressing a modifying relation
DNP: phrase formed with 的 expressing a possessive relation
ADVP: adverbial phrase
ADJP: adjectival phrase
DP: determiner phrase
QP: quantifier phrase
NN: common noun
NR: proper noun (a noun that applies only to a specific entity: place names, personal names, country names, book titles, organization names, names of events, etc.)
NT: temporal noun
PN: pronoun
VV: verb
VC: copula (是)
CC: coordinating conjunction
VE: existential verb (有)
VA: predicative adjective
AS: aspect marker (e.g. 了)
VRD: verb-resultative compound
CD: numeral, cardinal
DT: determiner
EX: existential there
FW: foreign word
IN: preposition or conjunction, subordinating
JJ: adjective or numeral, ordinal
JJR: adjective, comparative
JJS: adjective, superlative
LS: list item marker
MD: modal auxiliary
PDT: pre-determiner
POS: genitive marker
PRP: pronoun, personal
RB: adverb
RBR: adverb, comparative
RBS: adverb, superlative
RP: particle
SYM: symbol
TO: "to" as preposition or infinitive marker
WDT: WH-determiner
WP: WH-pronoun
WP$: WH-pronoun, possessive
WRB: WH-adverb

Dependency Relations

abbrev: abbreviation modifier
acomp: adjectival complement
advcl: adverbial clause modifier
advmod: adverbial modifier
agent: agent (usually appears with "by")
amod: adjectival modifier
appos: appositional modifier
attr: attributive
aux: auxiliary (non-main verbs and auxiliaries, such as BE, HAVE, SHOULD/COULD)
auxpass: passive auxiliary
cc: coordination (usually attached to the first conjunct)
ccomp: clausal complement
complm: complementizer (the word introducing a clausal complement)
conj: conjunct (links two coordinated words)
cop: copula (linking verbs such as be, seem, appear)
csubj: clausal subject
csubjpass: clausal passive subject
dep: dependent (unspecified dependency)
det: determiner (e.g. articles)
dobj: direct object
expl: expletive (mainly captures "there")
infmod: infinitival modifier
iobj: indirect object
mark: marker (mainly appears with "that", "whether", "because", "when")
mwe: multi-word expression
neg: negation modifier
nn: noun compound modifier
npadvmod: noun phrase as adverbial modifier
nsubj: nominal subject
nsubjpass: passive nominal subject
num: numeric modifier
number: element of compound number
parataxis: parataxis
partmod: participial modifier
pcomp: prepositional complement
pobj: object of a preposition
poss: possession modifier
possessive: possessive modifier (the relation between the possessor and 's)
preconj: preconjunct (often appears with "either", "both", "neither")
predet: predeterminer
prep: prepositional modifier
prepc: prepositional clausal modifier
prt: phrasal verb particle
punct: punctuation (rare; usually not included in the output)
purpcl: purpose clause modifier
quantmod: quantifier phrase modifier
rcmod: relative clause modifier
ref: referent
rel: relative
root: root (the head of the sentence; parsing starts from it)
tmod: temporal modifier
xcomp: open clausal complement
xsubj: controlling subject

pip is the well-known package management system Python users rely on to install and manage packages. You can install a third-party package with the following command:

pip install PACKAGE

It is convenient to use packages published by others. If you want to publish your own code, here is an easy way to do it:

Sign Up for an Account on PyPI

It is not difficult to register on PyPI. Remember your username and password; you will need to type them in the console when you upload your project.

Clean Your Project and Code

Make sure you have removed all irrelevant and redundant lines of code, e.g. debugging print("...") statements. Use logging to emit any necessary info instead.

Move large files (e.g. data files) out of the package and only give a link to your audience in README.md.

Create the Necessary Files for PyPI

These files should be in the same directory where the package (the one you want to upload) is:

setup.py
setup.cfg
LICENSE.txt
README.md

setup.py

You need to give metadata about your project in setup.py. The easiest way to do this is to copy a template and only replace a few strings. I really recommend this template; the author has left comments about the strings you need to change.
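As an illustration only (every name, URL, and field value below is a placeholder, not the contents of the recommended template), a minimal setup.py might look like:

```python
# setup.py -- minimal illustrative sketch; all values are placeholders.
from setuptools import setup, find_packages

setup(
    name="PACKAGE",                        # the name shown on PyPI
    version="0.1.0",
    description="One-line summary of the package",
    long_description=open("README.md").read(),
    long_description_content_type="text/markdown",
    author="YOUR NAME",
    url="https://github.com/you/PACKAGE",  # placeholder project URL
    packages=find_packages(),
    install_requires=[],                   # list runtime dependencies here
    python_requires=">=3.6",
)
```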

setup.cfg

Create a new file called “setup.cfg”. You can specify your description file if you have one:

[metadata]
description-file = README.md

LICENSE.txt

Use this file to define all license details.

MIT License
Copyright (c) 2018 YOUR NAME
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Give your audience all the info they should know, including: 1. how to install; 2. what the features are; 3. usage examples; 4. anything else. You can easily find a template on GitHub.

Upload Your Project to PyPI

Upload it to GitHub first. Then install setuptools and twine:

pip install --upgrade setuptools
pip install twine
Then you can use these commands to upload it to PyPI:

python setup.py sdist bdist_wheel
twine upload dist/*

If your setup.py defines a custom upload command (some templates do), there is an even easier way:

python setup.py upload

Then we are all set. You can check it:

pip install PACKAGE

If you have a new version, do not forget to change VERSION and REQUIRED in setup.py before re-uploading it.

People can upgrade it this way:

pip install PACKAGE --upgrade
