File Name: link_USapp_forpri_CNpub_forpri
Observations: 749,636
Description:
This file links records in file USapp_for_priority to records in file CNpub_for_priority. That is, we take the foreign priority claims in the US applications, each of which lists a claim date, a claim country, and a patent application number, and we take the foreign priority claims in the Chinese published patents, which also list a date, a country, and an application number.
We first attempt to match on (1) priority date, (2) country, and (3) application number (after some cleaning of the application number text field for both countries). In cases where we did not get a match, we also look at the title of the patent and the list of inventors. Specifically, we also include cases where there is a match on (1) priority date, (2) country, (3) title (the first 40 characters match except for minor discrepancies), and (4) name of the first inventor.
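The two-stage rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used to build the file: the cleaning rule in `clean_app_num`, the tolerance for "minor discrepancies" in `normalize_title`, the dictionary keys, and the choice to try the fallback separately for each Chinese claim are all hypothetical.

```python
import re


def clean_app_num(raw):
    # Hypothetical cleaning rule: uppercase, then keep only letters and digits.
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def normalize_title(title, length=40):
    # Hypothetical tolerance for "minor discrepancies": take the first `length`
    # characters, lowercase them, and drop anything that is not a letter or digit.
    return re.sub(r"[^a-z0-9]", "", title[:length].lower())


def link_claims(us_claims, cn_claims):
    """Return (us_id, cn_id, stage) tuples for two lists of claim dicts."""
    links = []
    for cn in cn_claims:
        # Both stages require agreement on priority date and country.
        candidates = [us for us in us_claims
                      if us["date"] == cn["date"] and us["country"] == cn["country"]]
        # Stage 1: cleaned application numbers agree.
        stage1 = [us for us in candidates
                  if clean_app_num(us["app_num"]) == clean_app_num(cn["app_num"])]
        if stage1:
            links.extend((us["id"], cn["id"], "app_num") for us in stage1)
            continue
        # Stage 2 (assumed to run only when stage 1 found nothing for this claim):
        # title prefix and first inventor agree.
        for us in candidates:
            same_title = normalize_title(us["title"]) == normalize_title(cn["title"])
            same_inventor = (us["first_inventor"].strip().upper()
                             == cn["first_inventor"].strip().upper())
            if same_title and same_inventor:
                links.append((us["id"], cn["id"], "title_inventor"))
    return links
```

Because every qualifying pair is kept, a single claim on either side can appear in several links.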
Note that a given Chinese patent might match to multiple US applications, if the Chinese patent and multiple US applications cite the same foreign priority claim (or overlapping foreign priority claims). Analogously, a given US application might link to multiple Chinese patents.
The table at the bottom of this file shows that for Chinese patents claiming foreign priority (outside the US) with publication years 2005 and after, we obtain a match to a patent application in the US file, with the same foreign priority claim, in about 80 percent of cases. In the paper we focus only on Chinese patents published in 2005 and after. Not every patent filed in China claiming foreign priority outside the US will be filed in the US as well, so we expect the match rate to be less than 100 percent, even apart from measurement error where we miss true matches.
Note that the match rates in the early 2000s are low, beginning with 10 percent in 2000. One issue is that the USPTO only began publishing patent applications in 2000, and in the early years not every patent application was published.
Variables
Variable | Type | Len | Columns in Ascii File | Description |
app_numfix | Char | 12 | 1-12 | link variable to Chinese patent (unique index of CNpub_basedat) |
Country2 | Char | 2 | 13-14 | Country code of foreign priority claim |
for_pri_rec_trun | Char | 40 | 15-54 | truncated record of foreign priority claim in Chinese patent data used for matching |
for_pri_index_CN | Num | 8 | 55-62 | Corresponds to for_pri_index in file CNpub_for_priority |
app_publication_num | Char | 20 | 63-82 | link to US application (along with app_publication_kind). app_publication_num*app_publication_kind is unique index of USapp_basedate |
app_publication_kind | Char | 3 | 83-85 | see app_publication_num |
priority_app_num | Char | 30 | 86-115 | priority application number as listed in the US file |
priority_index | Num | 8 | 116-123 | priority_index variable in USapp_for_priority |
for_priority_date | Char | 10 | 124-133 | date of foreign priority claim |
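The column positions in the variable table are enough to read the ASCII file with plain string slicing. The sketch below assumes one record per line, that the two Num fields are stored as integer text within their columns, and that the file is ASCII encoded. None of these is confirmed by this documentation, and `parse_record` and `read_links` are illustrative names.

```python
# (name, first column, last column), 1-based and inclusive, copied from the
# variable table above.
LAYOUT = [
    ("app_numfix", 1, 12),
    ("Country2", 13, 14),
    ("for_pri_rec_trun", 15, 54),
    ("for_pri_index_CN", 55, 62),
    ("app_publication_num", 63, 82),
    ("app_publication_kind", 83, 85),
    ("priority_app_num", 86, 115),
    ("priority_index", 116, 123),
    ("for_priority_date", 124, 133),
]

# Fields listed with Type "Num"; assumed here to hold integer text.
NUMERIC = {"for_pri_index_CN", "priority_index"}


def parse_record(line):
    """Split one fixed-width line into a dict keyed by variable name."""
    record = {}
    for name, first, last in LAYOUT:
        value = line[first - 1:last].strip()
        if name in NUMERIC:
            value = int(value) if value else None
        record[name] = value
    return record


def read_links(path, encoding="ascii"):
    """Yield one parsed record per line of the file at `path` (hypothetical path)."""
    with open(path, encoding=encoding) as handle:
        for line in handle:
            yield parse_record(line.rstrip("\r\n"))
```

Slicing by position rather than splitting on whitespace keeps embedded blanks in fields such as for_pri_rec_trun intact.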
Table Showing Distribution of Matches by Year of Chinese Patent Publication
We start with the 1,080,629 foreign priority claims in CNpub_for_priority and select the 757,902 cases where the foreign priority claim is not from the US. We do this because we expect that matches to the US application file for patents claiming US priority will be found in file link_USapp_appnum_CNpub_forpri (i.e., the link will go to the application number of the US application, rather than to the foreign priority file). We next select the 608,180 unique values of app_numfix. For each app_numfix, we ask whether it has at least one match in this file (link_USapp_forpri_CNpub_forpri), and the table below reports the results.
pub_year | Not Matched (Count) | Matched (Count) | Not Matched (Percent) | Matched (Percent) |
All | 221,198 | 386,982 | 36.37 | 63.63 |
1999 and earlier | 88,671 | 3,084 | 96.64 | 3.36 |
2000 | 14,985 | 1,743 | 89.58 | 10.42 |
2001 | 15,483 | 5,797 | 72.76 | 27.24 |
2002 | 10,501 | 13,116 | 44.46 | 55.54 |
2003 | 7,695 | 23,625 | 24.57 | 75.43 |
2004 | 8,285 | 30,254 | 21.50 | 78.50 |
2005 | 11,389 | 49,287 | 18.77 | 81.23 |
2006 | 11,815 | 51,140 | 18.77 | 81.23 |
2007 | 13,157 | 56,385 | 18.92 | 81.08 |
2008 | 12,887 | 50,876 | 20.21 | 79.79 |
2009 | 14,073 | 54,740 | 20.45 | 79.55 |
2010 | 12,257 | 46,935 | 20.71 | 79.29 |
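As a check on the roughly 80 percent figure quoted in the description, the counts in the table can be pooled over publication years 2005 through 2010. The sketch below only re-does the arithmetic on the published counts. The names `COUNTS`, `match_rate`, and `pooled_match_rate` are illustrative and are not part of the data files.

```python
# (not matched, matched) counts of unique app_numfix values, by pub_year,
# copied from the table above.
COUNTS = {
    "1999 and earlier": (88671, 3084),
    "2000": (14985, 1743),
    "2001": (15483, 5797),
    "2002": (10501, 13116),
    "2003": (7695, 23625),
    "2004": (8285, 30254),
    "2005": (11389, 49287),
    "2006": (11815, 51140),
    "2007": (13157, 56385),
    "2008": (12887, 50876),
    "2009": (14073, 54740),
    "2010": (12257, 46935),
}


def match_rate(not_matched, matched):
    """Percent of cases with at least one match."""
    return 100.0 * matched / (not_matched + matched)


def pooled_match_rate(years):
    """Match rate over several publication years combined."""
    not_matched = sum(COUNTS[year][0] for year in years)
    matched = sum(COUNTS[year][1] for year in years)
    return match_rate(not_matched, matched)


recent_years = [str(year) for year in range(2005, 2011)]
print(f"2005-2010 pooled match rate: {pooled_match_rate(recent_years):.2f} percent")
```

Pooling weights each year by its number of Chinese patents, so the result can differ slightly from a simple average of the yearly percentages.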