Product-Item, Economic Complexity, Country Competitiveness, Hidalgo
Data From
Curated version,
Below we create a model that represents a country’s economy based on the products it produces, and the complexity of those products. “Complexity” of both the country and products it produces will essentially be based on the number of different product types, but the chicken-egg aspect of the deeper model will be teased out of the product export raw data using the method below.
Model
The complexity of an economy is proportional to the average complexity of its products, and, vice versa, the complexity of a product is proportional to the average complexity of its producers.
We could say $m_{ij}=1$ if country $i$ makes product $j$ , and $m_{ij}=0$, otherwise, but we need a preprocessing stage first. Let’s say $X_{ij}$ is exports (in dollar amounts) of product $j$ of country $i$. The Revealed Comparative Advantage of that country $i$ is
\[RCA_{ij} = \frac{X_{ij}}{\sum_i X_{ij}} / \frac{\sum_j X_{ij}}{\sum_{i,j} X_{i,j}}\]Then if $RCA_{ij} > 1.0$ we set $m_{ij}=1$, 0 otherwise.
The weights are $v_{ij} = m_{ij} / d_i$, $w_{ij}=m_{ij}/u_j$ where the diversification of country $i$ and the ubiquity of product $j$ are simply $d_i = \sum_j m_{ij}$, $u_j = \sum_i m_{ij}$. So if $c_i$ is the complexity of country $i$ and $p_j$ is the complexity of product $j$
\[c_i = \alpha \sum_j v_{ij}p_j\] \[p_j = \beta \sum_i w_{ij} c_i\]where $\alpha,\beta>0$. We see the chicken-egg problem here. We collect variables inside matrices $c$, $p$, $V=[v_{ij}]$ and $W=[w_{ij}]$, then $c = \alpha V p$ and $p = \beta W c$. If we subtitute second the latter in the former, $c = \alpha \beta (V^T W) c$, or the former in the latter, $p = \alpha \beta (V W^T) p$. This means the complexities of countries and products are given by an eigenvector of $V^T W$ and $V W^T$ respectively.
Code
Looking only at 2014 trade data.
import pandas as pd, zipfile
with zipfile.ZipFile('hidalgo.zip', 'r') as z:
df = pd.read_csv(z.open('hidalgo.csv'),sep='\t')
gdp = pd.read_csv(z.open('gdp1416.csv'),sep=',',index_col=0)
hs = pd.read_csv(z.open('hs.csv'),sep='|')
hs2 = pd.read_csv(z.open('hs2.csv'),sep=',',index_col='ProductCode_x')
print len(df)
print df.tail(10)
726013
year origin hs92 export_val import_val export_rca import_rca
726003 2014 ven 961610 39395.0 2026297.0 0.011 0.947
726004 2014 ven 961620 NaN 1084958.0 NaN 2.413
726005 2014 ven 961700 29666.0 1701096.0 0.005 0.495
726006 2014 ven 961800 2066.0 113839.0 0.001 0.074
726007 2014 ven 970110 210867.0 385141.0 0.004 0.014
726008 2014 ven 970190 179993.0 118881.0 0.136 0.155
726009 2014 ven 970200 976805.0 NaN 0.563 NaN
726010 2014 ven 970300 717009.0 277338.0 0.068 0.045
726011 2014 ven 970500 12723.0 NaN 0.004 NaN
726012 2014 ven 970600 NaN 2484.0 NaN 0.000
cp = df.pivot_table('export_val', index='origin', columns='hs92')
print cp.shape
print len(np.unique(df.hs92)), 'products'
(220, 4858)
4858 products
denom = cp.sum(axis=1) / cp.sum().sum()
denom = cp.sum(axis=1) / cp.sum().sum()
cp2 = cp.div(cp.sum(axis=0).T)
cp2 = cp2.div(denom,axis=0)
cp2 = cp2.fillna(0)
cp2[cp2 > 1.0] = 1.0
cp2[cp2 != 1.0] = 0.0
cp3 = cp2
cp4 = cp3.div(cp3.sum(axis=1),axis=0)
cp5 = cp3.div(cp3.sum(axis=0),axis=1)
print cp4.shape, cp5.shape
(220, 4858) (220, 4858)
Country, Product Complexity Method using Eigenanalysis
Country ECI
import scipy.linalg as lin
print cp4.shape
uc,vc = lin.eig(np.dot(cp4,cp5.T))
print vc.shape
eci = np.array(vc)[:,1]
print len(eci)
print np.argmax(eci)
top_countries = cp.index[np.argsort(eci)[:10]]
print top_countries
(220, 4858)
(220, 220)
220
181
Index([u'jpn', u'che', u'deu', u'kor', u'swe', u'xxb', u'usa', u'sgp', u'cze',
u'fin'],
dtype='object', name=u'origin')
Look at simple product sum, is the list the same?
Product PCI
Utilize sparsity,
import scipy.sparse.linalg as lin
import scipy.sparse as sps
scp4 = sps.lil_matrix(cp4)
scp5 = sps.lil_matrix(cp5)
A = scp4.T.dot(scp5)
up,vp = lin.eigs(A,k=2)
pci = np.array(vp)[:,1]
top_prods = cp.columns[np.argsort(pci)[:10]]
print top_prods
pd.set_option('expand_frame_repr', False)
top_prods2 = [str(x) for x in list(top_prods)]
print hs2.ix[top_prods2][['Product Description_y','Product Description_x']]
Int64Index([848590, 870810, 848390, 841221, 852610, 840999, 847790, 847990,
390940, 851410],
dtype='int64', name=u'hs92')
Product Description_y Product Description_x
ProductCode_x
848590 (-2006) Machinery parts not specified or inclu... (-2006) - Other
870810 Parts and accessories of the motor vehicles of... - Bumpers and parts thereof
848390 Transmission shafts (including cam shafts and ... -Toothed wheels, chain sprockets and other tra...
841221 Other engines and motors. -- Linear acting (cylinders)
852610 Radar apparatus, radio navigational aid appara... - Radar apparatus
840999 Parts suitable for use solely or principally w... -- Other
847790 Machinery for working rubber or plastics or fo... - Parts
847990 Machines and mechanical appliances having indi... - Parts
390940 Amino-resins, phenolic resins and polyurethane... - Phenolic resins
851410 Industrial or laboratory electric furnaces and... - Resistance heated furnaces and ovens
Simple regression
cindex = [x.upper() for x in cp.index]
ecigdp = pd.DataFrame(eci,index=cindex)
ecigdp = ecigdp.join(gdp)
print ecigdp.shape
ecigdp.columns = ['eci', u'gdp2014', u'gdp2016']
ecigdp['prods'] = np.array(cp3.sum(axis=1))
ecigdp = ecigdp.dropna()
print ecigdp.tail()
import statsmodels.formula.api as smf
results = smf.ols('np.log(gdp2014) ~ eci', data=ecigdp).fit()
print results.rsquared_adj
results = smf.ols('np.log(gdp2014) ~ prods', data=ecigdp).fit()
print results.rsquared_adj
(220, 3)
eci gdp2014 gdp2016 prods
WSM 0.025062 3761.912686 3524.649880 209.0
YEM 0.075479 679.667360 1101.117444 147.0
ZAF 0.008537 7504.295250 7627.851926 742.0
ZMB 0.048409 1622.409958 1620.823290 182.0
ZWE 0.063000 908.829980 932.548383 275.0
0.55503440264
0.230701679034
plt.plot(ecigdp.eci,np.log(ecigdp.gdp2014),'.')
plt.savefig('eci_01.png')
HS conversion
hs2['code4'] = hs.ProductCode.str.slice(0,4)
hs3 = hs2[(hs2.ProductCode.str.len()==4)]
hs4 = hs2.merge(hs3, how='left', left_on='code4', right_on='ProductCode')
hs4 = hs4[['ProductCode_x','Product Description_x','Product Description_y']]
GDP download
import pandas as pd
from pandas_datareader import data, wb
countries = ['ABW', 'AFG', 'AGO', 'AIA', 'ALB', 'AND', 'ARE', 'ARG', 'ARM', 'ASM', 'ATF', 'ATG', 'AUS', 'AUT', 'AZE', 'BDI', 'BEN', 'BES', 'BFA', 'BGD', 'BGR', 'BHR', 'BHS', 'BIH', 'BLR', 'BLX', 'BLZ', 'BMU', 'BOL', 'BRA', 'BRB', 'BRN', 'BTN', 'CAF', 'CAN', 'CCK', 'CHE', 'CHL', 'CHN', 'CIV', 'CMR', 'COD', 'COG', 'COK', 'COL', 'COM', 'CPV', 'CRI', 'CUB', 'CUW', 'CXR', 'CYM', 'CYP', 'CZE', 'DEU', 'DJI', 'DMA', 'DNK', 'DOM', 'DZA', 'ECU', 'EGY', 'ERI', 'ESP', 'EST', 'ETH', 'FIN', 'FJI', 'FLK', 'FRA', 'FSM', 'GAB', 'GBR', 'GEO', 'GHA', 'GIB', 'GIN', 'GMB', 'GNB', 'GNQ', 'GRC', 'GRD', 'GRL', 'GTM', 'GUM', 'GUY', 'HKG', 'HND', 'HRV', 'HTI', 'HUN', 'IDN', 'IND', 'IOT', 'IRL', 'IRN', 'IRQ', 'ISL', 'ISR', 'ITA', 'JAM', 'JOR', 'JPN', 'KAZ', 'KEN', 'KGZ', 'KHM', 'KIR', 'KNA', 'KOR', 'KWT', 'LAO', 'LBN', 'LBR', 'LBY', 'LCA', 'LKA', 'LTU', 'LVA', 'MAC', 'MAF', 'MAR', 'MDA', 'MDG', 'MDV', 'MEX', 'MHL', 'MKD', 'MLI', 'MLT', 'MMR', 'MNE', 'MNG', 'MNP', 'MOZ', 'MRT', 'MSR', 'MUS', 'MWI', 'MYS', 'NCL', 'NER', 'NFK', 'NGA', 'NIC', 'NIU', 'NLD', 'NOR', 'NPL', 'NRU', 'NZL', 'OMN', 'PAK', 'PAN', 'PCN', 'PER', 'PHL', 'PLW', 'PNG', 'POL', 'PRK', 'PRT', 'PRY', 'PSE', 'PYF', 'QAT', 'ROU', 'RUS', 'RWA', 'SAU', 'SDN', 'SEN', 'SGP', 'SHN', 'SLB', 'SLE', 'SLV', 'SMR', 'SOM', 'SPM', 'SRB', 'SSD', 'STP', 'SUR', 'SVK', 'SVN', 'SWE', 'SYC', 'SYR', 'TCA', 'TCD', 'TGO', 'THA', 'TJK', 'TKL', 'TKM', 'TLS', 'TON', 'TTO', 'TUN', 'TUR', 'TUV', 'TZA', 'UGA', 'UKR', 'URY', 'USA', 'UZB', 'VCT', 'VEN', 'VGB', 'VNM', 'VUT', 'WLF', 'WSM', 'XXB', 'YEM', 'ZAF', 'ZMB', 'ZWE']
res = []
for i,c in enumerate(countries):
try:
print i,c
dat = wb.download(indicator='NY.GDP.PCAP.KD', country=[c], start=2014, end=2016)
gdp = list(dat['NY.GDP.PCAP.KD'])
res.append([c, gdp[0], gdp[2]])
#if len(res) > 5: break
except:
print 'error'
df = pd.DataFrame(res)
df.columns = ['code','gdp2014','gdp2016']
df.to_csv('gdp1416.csv',index=None)
References
Inoua, Simple Measure of Economic Complexity
Hidalgo The Atlas of Economic Complexity