
From Zero to Practice: The First-Place init Team's Solution to Zhihu's "Kanshan Cup" (PyTorch)

程序员文章站 (Programmer Article Station), 2024-02-28 19:20:40
...

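I am a Java developer and know little Python. Recently my work required analyzing and organizing a large amount of text, so I started looking for material online. Through Zhihu I learned about the "Kanshan Cup" competition it organized, found the solution published by the champion team, init, and started trying to reproduce it. My approach may well be wrong.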

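As it turned out, this kind of machine learning needs a Linux machine with a GPU and plenty of memory; the Linux system I installed in a virtual machine could not run the computation.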

The Linux system must be 64-bit. I installed Linux in a virtual machine:

VMware version: VMware-workstation-full-14.1.1-7528167.exe (VMware Workstation 14 Pro)

Linux version: CentOS-7-x86_64-DVD-1708.iso, about 4 GB, downloaded from the Alibaba Cloud mirror site
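Because the solution needs a 64-bit system and more memory than this VM had, it helps to check both from inside the guest before going further. A minimal sketch (not part of the init solution), assuming a Linux guest, where total RAM is reported as MemTotal in /proc/meminfo:

```python
from __future__ import division, print_function

import os
import platform


def parse_mem_total_kb(meminfo_text):
    """Return MemTotal in kB from the text of /proc/meminfo, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:        2031912 kB".
            return int(line.split()[1])
    return None


def describe_machine(meminfo_path="/proc/meminfo"):
    """Return a one-line summary of CPU architecture and total RAM (Linux only)."""
    mem_kb = None
    if os.path.exists(meminfo_path):
        with open(meminfo_path) as f:
            mem_kb = parse_mem_total_kb(f.read())
    mem = "unknown" if mem_kb is None else "%.1f GiB" % (mem_kb / (1024 * 1024))
    return "architecture=%s, MemTotal=%s" % (platform.machine(), mem)


if __name__ == "__main__":
    # Prints the architecture (x86_64 on a 64-bit CentOS install) and total RAM.
    print(describe_machine())
```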

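I am writing this post to record the steps I followed and how I solved the problems along the way. Because I wrote it while working, the layout is rough; I will tidy it up once I have finished (I am something of a perfectionist).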

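1. After installing the system, I gave the VM 2 GB of memory and a 40 GB disk. My computer is low-spec, so this falls short of the init solution's minimum requirements, but I tried anyway; I could not yet tell whether it would run to completion. The VM uses a NAT network adapter by default. After booting, running ifconfig inside the VM (a minimal install does not include this command; it comes from the net-tools package, and ip addr works instead) showed no IP address, and baidu.com could not be pinged. The first task is to get the VM onto the host's network, so do the following: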

[root@localhost ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens33 

Open the file and set ONBOOT to yes (it was initially no). The edited file:

TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=763f71d6-ec83-4bee-8322-4c903f6b78ed
DEVICE=ens33
ONBOOT=yes

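After editing, save and exit, then restart the network service (systemctl restart network) or reboot the system. Setting a static IP is more involved, so I skipped that step.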
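The ifcfg file is a list of KEY=value lines. A minimal sketch, assuming that simple format (it is not a full shell parser), that reads the file and checks the two settings this step relies on, ONBOOT=yes and BOOTPROTO=dhcp:

```python
from __future__ import print_function

import os


def parse_ifcfg(text):
    """Parse the KEY=value lines of an ifcfg-* file into a dict.

    Blank lines and '#' comments are skipped, and one layer of matching
    quotes is stripped from values. A simplification, not a shell parser.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


def dhcp_on_boot(settings):
    """True if the interface is brought up at boot and configured via DHCP."""
    return (settings.get("ONBOOT", "").lower() == "yes"
            and settings.get("BOOTPROTO", "").lower() == "dhcp")


if __name__ == "__main__":
    path = "/etc/sysconfig/network-scripts/ifcfg-ens33"
    if os.path.exists(path):
        with open(path) as f:
            ok = dhcp_on_boot(parse_ifcfg(f.read()))
        print(path, "OK" if ok else "check ONBOOT and BOOTPROTO")
```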

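After the restart I connected with Xshell, which has a clear interface and makes copying easy; the VM console does not support copy and paste, which is inconvenient. Once connected, run the following command; the reply shows the VM can now reach the Internet.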

[root@localhost ~]# ping baidu.com
PING baidu.com (123.125.115.110) 56(84) bytes of data.
64 bytes from 123.125.115.110 (123.125.115.110): icmp_seq=1 ttl=128 time=130 ms   
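ping only tests ICMP; yum and pip also need DNS resolution and outbound TCP. A minimal sketch, using only Python's standard socket module, that checks both for a given host and port:

```python
import socket


def can_connect(host, port=80, timeout=3.0):
    """Return True if `host` resolves and a TCP connection to `port` succeeds."""
    try:
        # create_connection resolves the name and tries each address it gets.
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Covers DNS failures (gaierror), refused connections and timeouts.
        return False
    sock.close()
    return True
```

For example, can_connect("pypi.org", 443) should return True once the VM can reach PyPI, which the pip steps below depend on.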

Next I will need to install software, so check that yum works. The following output shows yum is usable:

[root@localhost ~]# yum install unzip
Loaded plugins: fastestmirror
base   

CentOS 7 ships with Python 2.7, and the solution uses Python 2.7, so there is no need to install another Python:

[root@localhost ~]# python
Python 2.7.5 (default, Aug  4 2017, 00:39:18) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
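Prebuilt wheels only install on a matching interpreter; the pip logs later in this post pick files tagged cp27, cp27mu and manylinux1_x86_64. A minimal sketch, assuming CPython, that prints the corresponding properties of the running interpreter:

```python
from __future__ import print_function

import platform
import struct
import sys


def wheel_facts():
    """Summarise the interpreter properties that decide which binary wheels fit.

    Assumes CPython; the ABI tag logic only applies to Python 2 builds.
    """
    major, minor = sys.version_info[:2]
    facts = {
        "python_tag": "cp%d%d" % (major, minor),
        "pointer_bits": struct.calcsize("P") * 8,
        "machine": platform.machine(),
    }
    if major == 2:
        # Standard CPython 2 builds use pymalloc ("m"); a wide-unicode (UCS4)
        # build adds "u", which gives the cp27mu tag.
        wide = sys.maxunicode > 0xFFFF
        facts["abi_tag"] = facts["python_tag"] + ("mu" if wide else "m")
    return facts


if __name__ == "__main__":
    for key, value in sorted(wheel_facts().items()):
        print("%-12s %s" % (key, value))
```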

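Install pip, setuptools and wheel. (The get-pip.py URL below now serves a pip that no longer supports Python 2.7; for Python 2.7 the bootstrap script is at https://bootstrap.pypa.io/pip/2.7/get-pip.py.)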

[root@localhost /]# mkdir weblogic
[root@localhost /]# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var  weblogic
[root@localhost /]# cd weblogic/
[root@localhost weblogic]# curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1603k  100 1603k    0     0   465k      0  0:00:03  0:00:03 --:--:--  465k
[root@localhost weblogic]# python get-pip.py
Collecting pip
  Downloading https://files.pythonhosted.org/packages/0f/74/ecd13431bcc456ed390b44c8a6e917c1820365cbebcb6a8974d1cd045ab4/pip-10.0.1-py2.py3-none-any.whl (1.3MB)
    100% |████████████████████████████████| 1.3MB 294kB/s 
Collecting setuptools
  Downloading https://files.pythonhosted.org/packages/7f/e1/820d941153923aac1d49d7fc37e17b6e73bfbd2904959fffbad77900cf92/setuptools-39.2.0-py2.py3-none-any.whl (567kB)
    100% |████████████████████████████████| 573kB 406kB/s 
Collecting wheel
  Downloading https://files.pythonhosted.org/packages/81/30/e935244ca6165187ae8be876b6316ae201b71485538ffac1d718843025a9/wheel-0.31.1-py2.py3-none-any.whl (41kB)
    100% |████████████████████████████████| 51kB 729kB/s 
Installing collected packages: pip, setuptools, wheel
Successfully installed pip-10.0.1 setuptools-39.2.0 wheel-0.31.1
[root@localhost weblogic]# ls
get-pip.py  ipdb-0.11.tar.gz  pip-10.0.1.tar.gz  setuptools-39.2.0  setuptools-39.2.0.zip  torch-0.1.12.post2-cp27-none-linux_x86_64.whl  wheel-0.31.1  wheel-0.31.1.tar.gz
[root@localhost weblogic]# cd wheel-0.31.1
[root@localhost wheel-0.31.1]# python setup.py install
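To confirm that pip, setuptools and wheel are importable by the same python that will run the project, a minimal sketch that imports each module and reports its __version__ attribute (all three define one; the sketch falls back to "?" for modules that do not):

```python
from __future__ import print_function

import importlib


def module_versions(names):
    """Map each module name to its __version__ string.

    The value is "?" when the module imports but has no __version__, and
    None when it cannot be imported at all.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "?")
    return versions


if __name__ == "__main__":
    for name, version in sorted(module_versions(["pip", "setuptools", "wheel"]).items()):
        print(name, "NOT INSTALLED" if version is None else version)
```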

Install PyTorch (the wheel had already been downloaded into /weblogic):

[root@localhost weblogic]# pip install torch-0.1.12.post2-cp27-none-linux_x86_64.whl 
Processing ./torch-0.1.12.post2-cp27-none-linux_x86_64.whl
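The wheel's file name says where it can be installed. Under the wheel naming convention (PEP 427) it is {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl, so torch-0.1.12.post2-cp27-none-linux_x86_64.whl targets CPython 2.7 on 64-bit Linux. A minimal sketch of splitting such a name, assuming it is well formed:

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 components.

    Assumes a well-formed name: real wheel names already escape '-' inside
    the distribution name and version as '_', so '-' only separates fields.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel name: %s" % filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }
```

If the tags do not match the interpreter, pip refuses the file with an error saying it "is not a supported wheel on this platform".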

Install Git. First install the build dependencies:

[root@localhost PyTorchText-master]# yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel gcc perl-ExtUtils-MakeMaker
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile

Download the Git source tarball:

    wget https://www.kernel.org/pub/software/scm/git/git-2.8.3.tar.gz

Extract it:

    tar -zxvf git-2.8.3.tar.gz

    cd git-2.8.3

[root@localhost git-2.8.3]# pwd
/weblogic/git-2.8.3
[root@localhost git-2.8.3]# gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
Copyright © 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[root@localhost git-2.8.3]# ./configure --prefix=/usr/local/git/
configure: Setting lib to 'lib' (the default)
./check_bindir "z$bindir" "z$execdir" "$bindir/git-add"
[root@localhost git-2.8.3]# git --version
git version 1.8.3.1
[root@localhost git-2.8.3]# 
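Note that git --version still reports 1.8.3.1, the version CentOS 7 ships in /usr/bin, not 2.8.3. The shell runs the first git it finds on PATH, and /usr/local/git/bin, where a build configured with this prefix installs, is not on root's default PATH; prepending it (for example export PATH=/usr/local/git/bin:$PATH in ~/.bashrc) is the usual fix. A minimal sketch of that lookup, for checking which git a given PATH would pick:

```python
from __future__ import print_function

import os


def which_all(command, path=None):
    """Return every executable named `command` on PATH, in lookup order.

    The first entry is the one the shell would run. An empty PATH entry
    means the current directory, as in POSIX shells.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory or ".", command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    print(which_all("git"))
```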
[root@localhost PyTorchText-master]# pip install Cython
Collecting Cython
  Downloading https://files.pythonhosted.org/packages/f6/23/ef5521e077e9e7ef8e4603e27713ae95fee69e9c19c7cd036b4299c7ced5/Cython-0.28.3-cp27-cp27mu-manylinux1_x86_64.whl (3.3MB)
    100% |████████████████████████████████| 3.3MB 486kB/s 
Installing collected packages: Cython
Successfully installed Cython-0.28.3
[root@localhost PyTorchText-master]# 

Installing fasttext with pip fails with:

ImportError: No module named Cython.Build

The fix:

pip install Cython

pip install fasttext   --- this install still failed, with the following output:

[root@localhost PyTorchText-master]# pip install fasttext
Collecting fasttext
  Downloading https://files.pythonhosted.org/packages/a4/86/ff826211bc9e28d4c371668b30b4b2c38a09127e5e73017b1c0cd52f9dfa/fasttext-0.8.3.tar.gz (73kB)
    100% |████████████████████████████████| 81kB 315kB/s 
Collecting numpy>=1 (from fasttext)
  Downloading https://files.pythonhosted.org/packages/c0/e7/08f059a00367fd613e4f2875a16c70b6237268a1d6d166c6d36acada8301/numpy-1.14.3-cp27-cp27mu-manylinux1_x86_64.whl (12.1MB)
    100% |████████████████████████████████| 12.1MB 392kB/s 
Collecting future (from fasttext)
  Downloading https://files.pythonhosted.org/packages/00/2b/8d082ddfed935f3608cc61140df6dcbf0edea1bc3ab52fb6c29ae3e81e85/future-0.16.0.tar.gz (824kB)
    100% |████████████████████████████████| 829kB 441kB/s 
Building wheels for collected packages: fasttext, future
  Running setup.py bdist_wheel for fasttext ... error
  Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-DZjW32/fasttext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-KodiTL --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  creating build/lib.linux-x86_64-2.7/fasttext
  copying fasttext/__init__.py -> build/lib.linux-x86_64-2.7/fasttext
  copying fasttext/model.py -> build/lib.linux-x86_64-2.7/fasttext
  running build_ext
  building 'fasttext.fasttext' extension
  creating build/temp.linux-x86_64-2.7
  creating build/temp.linux-x86_64-2.7/fasttext
  creating build/temp.linux-x86_64-2.7/fasttext/cpp
  creating build/temp.linux-x86_64-2.7/fasttext/cpp/src
  gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I./fasttext -I/usr/include/python2.7 -c fasttext/fasttext.cpp -o build/temp.linux-x86_64-2.7/fasttext/fasttext.o -O3 -pthread -funroll-loops -std=c++0x
  gcc: error trying to exec 'cc1plus': execvp: No such file or directory
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for fasttext
  Running setup.py clean for fasttext
  Running setup.py bdist_wheel for future ... done
  Stored in directory: /root/.cache/pip/wheels/bf/c9/a3/c538d90ef17cf7823fa51fc701a7a7a910a80f6a405bf15b1a
Successfully built future
Failed to build fasttext
Installing collected packages: numpy, future, fasttext
  Running setup.py install for fasttext ... error
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-DZjW32/fasttext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-VyEfve/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    creating build/lib.linux-x86_64-2.7/fasttext
    copying fasttext/__init__.py -> build/lib.linux-x86_64-2.7/fasttext
    copying fasttext/model.py -> build/lib.linux-x86_64-2.7/fasttext
    running build_ext
    building 'fasttext.fasttext' extension
    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/fasttext
    creating build/temp.linux-x86_64-2.7/fasttext/cpp
    creating build/temp.linux-x86_64-2.7/fasttext/cpp/src
    gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I./fasttext -I/usr/include/python2.7 -c fasttext/fasttext.cpp -o build/temp.linux-x86_64-2.7/fasttext/fasttext.o -O3 -pthread -funroll-loops -std=c++0x
    gcc: error trying to exec 'cc1plus': execvp: No such file or directory
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-DZjW32/fasttext/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-VyEfve/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-DZjW32/fasttext/
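cc1plus is GCC's C++ compiler proper. The gcc driver prints this error when asked to compile a .cpp file without the C++ front end installed; on CentOS 7 that comes from the gcc-c++ package (yum install gcc-c++), which the Git build-dependency list above does not include. A minimal sketch that asks gcc where cc1plus is, using gcc's -print-prog-name option:

```python
from __future__ import print_function

import os
import subprocess


def has_cc1plus(gcc="gcc"):
    """Return True if `gcc -print-prog-name=cc1plus` names an existing file.

    gcc prints the full path when it finds the program and just the bare
    name "cc1plus" when it does not.
    """
    try:
        out = subprocess.check_output([gcc, "-print-prog-name=cc1plus"])
    except (OSError, subprocess.CalledProcessError):
        return False  # gcc itself is missing or failed to run
    path = out.decode("utf-8", "replace").strip()
    return os.path.isabs(path) and os.path.isfile(path)


if __name__ == "__main__":
    print("cc1plus found" if has_cc1plus() else "cc1plus missing: try yum install gcc-c++")
```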

Install the project's dependencies listed in requirements.txt:

pip install -r requirements.txt

[root@localhost PyTorchText-master]# pip install -r requirements.txt 
Collecting git+https://github.com/pytorch/tnt.git@master (from -r requirements.txt (line 5))
  Cloning https://github.com/pytorch/tnt.git (to revision master) to /tmp/pip-req-build-E_05vl
Collecting ipdb (from -r requirements.txt (line 1))
Collecting fire (from -r requirements.txt (line 2))
Collecting tqdm (from -r requirements.txt (line 3))
  Using cached https://files.pythonhosted.org/packages/93/24/6ab1df969db228aed36a648a8959d1027099ce45fad67532b9673d533318/tqdm-4.23.4-py2.py3-none-any.whl
Collecting visdom (from -r requirements.txt (line 4))
Collecting word2vec (from -r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/5b/33/8e1cf93216342f0fe8aa4484ef1a833a12c4f6d6bf8e8b46ecc0feb5e5e8/word2vec-0.9.2.tar.gz
Requirement already satisfied: torch in /usr/lib64/python2.7/site-packages (from torchnet==0.0.2->-r requirements.txt (line 5)) (0.1.12.post2)
Requirement already satisfied: six in /usr/lib/python2.7/site-packages (from torchnet==0.0.2->-r requirements.txt (line 5)) (1.11.0)
Collecting ipython<6.0.0,>=5.0.0; python_version == "2.7" (from ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/52/19/aadde98d6bde1667d0bf431fb2d22451f880aaa373e0a241c7e7cb5815a0/ipython-5.7.0-py2-none-any.whl
Requirement already satisfied: setuptools in /usr/lib/python2.7/site-packages (from ipdb->-r requirements.txt (line 1)) (39.2.0)
Collecting torchfile (from visdom->-r requirements.txt (line 4))
Collecting pyzmq (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/5d/b0/3aea046f5519e2e059a225e8c924f897846b608793f890be987d07858b7c/pyzmq-17.0.0-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied: numpy>=1.8 in /usr/lib64/python2.7/site-packages (from visdom->-r requirements.txt (line 4)) (1.14.3)
Collecting tornado (from visdom->-r requirements.txt (line 4))
Collecting websocket-client (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/8a/a1/72ef9aa26cfe1a75cee09fc1957e4723add9de098c15719416a1ee89386b/websocket_client-0.48.0-py2.py3-none-any.whl
Collecting pillow (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/00/49/a0483e7308b4b04b5a898789911dbb876d9fea54e7df0453915e47744cfd/Pillow-5.1.0-cp27-cp27mu-manylinux1_x86_64.whl
Collecting scipy (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/2a/f3/de9c1bd16311982711209edaa8c6caa962db30ebb6a8cc6f1dcd2d3ef616/scipy-1.1.0-cp27-cp27mu-manylinux1_x86_64.whl
Collecting requests (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/49/df/50aa1999ab9bde74656c2919d9c0c085fd2b3775fd3eca826012bef76d8c/requests-2.18.4-py2.py3-none-any.whl
Requirement already satisfied: cython in /usr/lib64/python2.7/site-packages (from word2vec->-r requirements.txt (line 6)) (0.28.3)
Requirement already satisfied: pyyaml in /usr/lib64/python2.7/site-packages (from torch->torchnet==0.0.2->-r requirements.txt (line 5)) (3.12)
Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.4 in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.0.15)
Requirement already satisfied: decorator in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (3.4.0)
Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (4.6.0)
Requirement already satisfied: backports.shutil-get-terminal-size; python_version == "2.7" in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.0.0)
Requirement already satisfied: pygments in /usr/lib64/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (2.2.0)
Collecting pathlib2; python_version == "2.7" or python_version == "3.3" (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/66/a7/9f8d84f31728d78beade9b1271ccbfb290c41c1e4dc13dbd4997ad594dcd/pathlib2-2.3.2-py2.py3-none-any.whl
Collecting traitlets>=4.2 (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/93/d6/abcb22de61d78e2fc3959c964628a5771e47e7cc60d53e9342e21ed6cc9a/traitlets-4.3.2-py2.py3-none-any.whl
Collecting simplegeneric>0.8 (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
Collecting pickleshare (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/9f/17/daa142fc9be6b76f26f24eeeb9a138940671490b91cb5587393f297c8317/pickleshare-0.7.4-py2.py3-none-any.whl
Collecting backports-abc>=0.4 (from tornado->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/7d/56/6f3ac1b816d0cd8994e83d0c4e55bc64567532f7dc543378bd87f81cebc7/backports_abc-0.5-py2.py3-none-any.whl
Requirement already satisfied: futures in /usr/lib/python2.7/site-packages (from tornado->visdom->-r requirements.txt (line 4)) (3.2.0)
Collecting singledispatch (from tornado->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/c5/10/369f50bcd4621b263927b0a1519987a04383d4a98fb10438042ad410cf88/singledispatch-3.4.0.3-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/7c/e6/92ad559b7192d846975fc916b65f667c7b8c3a32bea7372340bfe9a15fa5/certifi-2018.4.16-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting idna<2.7,>=2.5 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/27/cc/6dd9a3869f15c2edfab863b992838277279ce92663d334df9ecf5106f5c6/idna-2.6-py2.py3-none-any.whl
Collecting urllib3<1.23,>=1.21.1 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/63/cb/6965947c13a94236f6d4b8223e21beb4d576dc72e8130bd7880f600839b8/urllib3-1.22-py2.py3-none-any.whl
Requirement already satisfied: wcwidth in /usr/lib/python2.7/site-packages (from prompt-toolkit<2.0.0,>=1.0.4->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (0.1.7)
Requirement already satisfied: ptyprocess>=0.5 in /usr/lib/python2.7/site-packages (from pexpect; sys_platform != "win32"->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (0.5.2)
Collecting scandir; python_version < "3.5" (from pathlib2; python_version == "2.7" or python_version == "3.3"->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/13/bb/e541b74230bbf7a20a3949a2ee6631be299378a784f5445aa5d0047c192b/scandir-1.7.tar.gz
Collecting ipython-genutils (from traitlets>=4.2->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/fa/bc/9bd3b5c2b4774d5f33b2d544f1460be9df7df2fe42f352135381c347c69a/ipython_genutils-0.2.0-py2.py3-none-any.whl
Requirement already satisfied: enum34; python_version == "2.7" in /usr/lib/python2.7/site-packages (from traitlets>=4.2->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.1.6)
Building wheels for collected packages: word2vec, torchnet, scandir
  Running setup.py bdist_wheel for word2vec ... error
  Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-aaHbvs/word2vec/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-0AZ5iL --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  creating build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/__init__.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/_version.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/io.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/scripts_interface.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/utils.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/wordclusters.py -> build/lib.linux-x86_64-2.7/word2vec
  copying word2vec/wordvectors.py -> build/lib.linux-x86_64-2.7/word2vec
  creating build/lib.linux-x86_64-2.7/word2vec/tests
  copying word2vec/tests/__init__.py -> build/lib.linux-x86_64-2.7/word2vec/tests
  copying word2vec/tests/test_word2vec.py -> build/lib.linux-x86_64-2.7/word2vec/tests
  UPDATING build/lib.linux-x86_64-2.7/word2vec/_version.py
  set build/lib.linux-x86_64-2.7/word2vec/_version.py to '0.9.2'
  running build_ext
  building 'word2vec.word2vec_noop' extension
  creating build/temp.linux-x86_64-2.7
  creating build/temp.linux-x86_64-2.7/word2vec
  gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python2.7 -c word2vec/word2vec_noop.c -o build/temp.linux-x86_64-2.7/word2vec/word2vec_noop.o
  word2vec/word2vec_noop.c:16:20: fatal error: Python.h: No such file or directory
   #include "Python.h"
                      ^
  compilation terminated.
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for word2vec
  Running setup.py clean for word2vec
  Running setup.py bdist_wheel for torchnet ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-nmBFRj/wheels/17/05/ec/d05d051a225871af52bf504f5e8daf57704811b3c1850d0012
  Running setup.py bdist_wheel for scandir ... error
  Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-aaHbvs/scandir/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-YBhBvd --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-2.7
  copying scandir.py -> build/lib.linux-x86_64-2.7
  running build_ext
  building '_scandir' extension
  creating build/temp.linux-x86_64-2.7
  gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python2.7 -c _scandir.c -o build/temp.linux-x86_64-2.7/_scandir.o
  _scandir.c:14:20: fatal error: Python.h: No such file or directory
   #include <Python.h>
                      ^
  compilation terminated.
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for scandir
  Running setup.py clean for scandir
Successfully built torchnet
Failed to build word2vec scandir
Installing collected packages: scandir, pathlib2, ipython-genutils, traitlets, simplegeneric, pickleshare, ipython, ipdb, fire, tqdm, torchfile, pyzmq, backports-abc, singledispatch, tornado, websocket-client, pillow, scipy, certifi, chardet, idna, urllib3, requests, visdom, word2vec, torchnet
  Running setup.py install for scandir ... error
    Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-aaHbvs/scandir/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-GKZVrW/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    copying scandir.py -> build/lib.linux-x86_64-2.7
    running build_ext
    building '_scandir' extension
    creating build/temp.linux-x86_64-2.7
    gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python2.7 -c _scandir.c -o build/temp.linux-x86_64-2.7/_scandir.o
    _scandir.c:14:20: fatal error: Python.h: No such file or directory
     #include <Python.h>
                        ^
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-aaHbvs/scandir/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-GKZVrW/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-aaHbvs/scandir/
[root@localhost PyTorchText-master]# 
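In a log this long, the lines that matter are pip's per-package build results. A minimal sketch (the regular expression is an assumption fitted to the pip 10 output above) that collects the result of each "Running setup.py bdist_wheel for X ... done/error" line. Applied to this log it reports word2vec and scandir as error and torchnet as done, and both failures are a missing Python.h, which points at missing headers rather than at pip itself:

```python
import re

# Matches pip 10-style lines such as "  Running setup.py bdist_wheel for scandir ... error".
_BUILD_LINE = re.compile(r"Running setup\.py bdist_wheel for (\S+) \.\.\. (\w+)")


def wheel_build_results(log_text):
    """Map each package pip tried to build into a wheel to 'done' or 'error'."""
    return dict(_BUILD_LINE.findall(log_text))
```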

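Finding the cause: both builds stop because Python.h is missing, so install the Python development package on CentOS 7. There it is called python-devel, not python-dev (the Debian/Ubuntu name), as the first two attempts show: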

[root@localhost PyTorchText-master]# yum install python-dev
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.163.com
 * extras: mirrors.163.com
 * updates: mirrors.cn99.com
No package python-dev available.
Error: Nothing to do
[root@localhost PyTorchText-master]# yum install Python-devel
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.163.com
 * extras: mirrors.163.com
 * updates: mirrors.cn99.com
No package Python-devel available.
  * Maybe you meant: python-devel
Error: Nothing to do
[root@localhost PyTorchText-master]# yum install python-devel
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.163.com
 * extras: mirrors.163.com
 * updates: mirrors.cn99.com
Resolving Dependencies
--> Running transaction check
---> Package python-devel.x86_64 0:2.7.5-68.el7 will be installed
--> Processing Dependency: python(x86-64) = 2.7.5-68.el7 for package: python-devel-2.7.5-68.el7.x86_64
--> Running transaction check
---> Package python.x86_64 0:2.7.5-58.el7 will be updated
---> Package python.x86_64 0:2.7.5-68.el7 will be an update
--> Processing Dependency: python-libs(x86-64) = 2.7.5-68.el7 for package: python-2.7.5-68.el7.x86_64
--> Running transaction check
---> Package python-libs.x86_64 0:2.7.5-58.el7 will be updated
---> Package python-libs.x86_64 0:2.7.5-68.el7 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package              Arch          Version               Repository       Size
================================================================================
Installing:
 python-devel         x86_64        2.7.5-68.el7          base            397 k
Updating for dependencies:
 python               x86_64        2.7.5-68.el7          base             93 k
 python-libs          x86_64        2.7.5-68.el7          base            5.6 M

Transaction Summary
================================================================================
Install  1 Package
Upgrade       ( 2 Dependent packages)

Total download size: 6.1 M
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
(1/3): python-2.7.5-68.el7.x86_64.rpm                    |  93 kB  00:00:00
(2/3): python-devel-2.7.5-68.el7.x86_64.rpm              | 397 kB  00:00:03
(3/3): python-libs-2.7.5-68.el7.x86_64.rpm               | 5.6 MB  00:00:38
--------------------------------------------------------------------------------
Total                                          160 kB/s | 6.1 MB  00:00:39
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : python-libs-2.7.5-68.el7.x86_64                              1/5
  Updating   : python-2.7.5-68.el7.x86_64                                   2/5
  Installing : python-devel-2.7.5-68.el7.x86_64                             3/5
  Cleanup    : python-2.7.5-58.el7.x86_64                                   4/5
  Cleanup    : python-libs-2.7.5-58.el7.x86_64                              5/5
  Verifying  : python-libs-2.7.5-68.el7.x86_64                              1/5
  Verifying  : python-devel-2.7.5-68.el7.x86_64                             2/5
  Verifying  : python-2.7.5-68.el7.x86_64                                   3/5
  Verifying  : python-libs-2.7.5-58.el7.x86_64                              4/5
  Verifying  : python-2.7.5-58.el7.x86_64                                   5/5

Installed:
  python-devel.x86_64 0:2.7.5-68.el7

Dependency Updated:
  python.x86_64 0:2.7.5-68.el7             python-libs.x86_64 0:2.7.5-68.el7

Complete!
[root@localhost PyTorchText-master]# 
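Python.h is part of python-devel, which is why word2vec and scandir could not compile their C extensions before it was installed. A minimal sketch that asks sysconfig for the interpreter's header directory (/usr/include/python2.7 for the system Python here, matching the -I flag in the gcc commands above) and checks that Python.h is present:

```python
from __future__ import print_function

import os
import sysconfig


def python_header():
    """Return (path to Python.h for this interpreter, whether that file exists)."""
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return header, os.path.isfile(header)


if __name__ == "__main__":
    header, present = python_header()
    print(header, "found" if present else "missing (install python-devel)")
```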

After that, the install succeeded:

[root@localhost PyTorchText-master]# pip install -r requirements.txt 
Collecting git+https://github.com/pytorch/tnt.git@master (from -r requirements.txt (line 5))
  Cloning https://github.com/pytorch/tnt.git (to revision master) to /tmp/pip-req-build-kyfk8D
Collecting ipdb (from -r requirements.txt (line 1))
Collecting fire (from -r requirements.txt (line 2))
Collecting tqdm (from -r requirements.txt (line 3))
  Using cached https://files.pythonhosted.org/packages/93/24/6ab1df969db228aed36a648a8959d1027099ce45fad67532b9673d533318/tqdm-4.23.4-py2.py3-none-any.whl
Collecting visdom (from -r requirements.txt (line 4))
Collecting word2vec (from -r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/5b/33/8e1cf93216342f0fe8aa4484ef1a833a12c4f6d6bf8e8b46ecc0feb5e5e8/word2vec-0.9.2.tar.gz
Requirement already satisfied: torch in /usr/lib64/python2.7/site-packages (from torchnet==0.0.2->-r requirements.txt (line 5)) (0.1.12.post2)
Requirement already satisfied: six in /usr/lib/python2.7/site-packages (from torchnet==0.0.2->-r requirements.txt (line 5)) (1.11.0)
Collecting ipython<6.0.0,>=5.0.0; python_version == "2.7" (from ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/52/19/aadde98d6bde1667d0bf431fb2d22451f880aaa373e0a241c7e7cb5815a0/ipython-5.7.0-py2-none-any.whl
Requirement already satisfied: setuptools in /usr/lib/python2.7/site-packages (from ipdb->-r requirements.txt (line 1)) (39.2.0)
Collecting torchfile (from visdom->-r requirements.txt (line 4))
Collecting tornado (from visdom->-r requirements.txt (line 4))
Collecting scipy (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/2a/f3/de9c1bd16311982711209edaa8c6caa962db30ebb6a8cc6f1dcd2d3ef616/scipy-1.1.0-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied: numpy>=1.8 in /usr/lib64/python2.7/site-packages (from visdom->-r requirements.txt (line 4)) (1.14.3)
Collecting pillow (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/00/49/a0483e7308b4b04b5a898789911dbb876d9fea54e7df0453915e47744cfd/Pillow-5.1.0-cp27-cp27mu-manylinux1_x86_64.whl
Collecting pyzmq (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/5d/b0/3aea046f5519e2e059a225e8c924f897846b608793f890be987d07858b7c/pyzmq-17.0.0-cp27-cp27mu-manylinux1_x86_64.whl
Collecting websocket-client (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/8a/a1/72ef9aa26cfe1a75cee09fc1957e4723add9de098c15719416a1ee89386b/websocket_client-0.48.0-py2.py3-none-any.whl
Collecting requests (from visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/49/df/50aa1999ab9bde74656c2919d9c0c085fd2b3775fd3eca826012bef76d8c/requests-2.18.4-py2.py3-none-any.whl
Requirement already satisfied: cython in /usr/lib64/python2.7/site-packages (from word2vec->-r requirements.txt (line 6)) (0.28.3)
Requirement already satisfied: pyyaml in /usr/lib64/python2.7/site-packages (from torch->torchnet==0.0.2->-r requirements.txt (line 5)) (3.12)
Collecting pathlib2; python_version == "2.7" or python_version == "3.3" (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/66/a7/9f8d84f31728d78beade9b1271ccbfb290c41c1e4dc13dbd4997ad594dcd/pathlib2-2.3.2-py2.py3-none-any.whl
Requirement already satisfied: backports.shutil-get-terminal-size; python_version == "2.7" in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.0.0)
Collecting simplegeneric>0.8 (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
Requirement already satisfied: pygments in /usr/lib64/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (2.2.0)
Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (4.6.0)
Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.4 in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.0.15)
Collecting pickleshare (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/9f/17/daa142fc9be6b76f26f24eeeb9a138940671490b91cb5587393f297c8317/pickleshare-0.7.4-py2.py3-none-any.whl
Requirement already satisfied: decorator in /usr/lib/python2.7/site-packages (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (3.4.0)
Collecting traitlets>=4.2 (from ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/93/d6/abcb22de61d78e2fc3959c964628a5771e47e7cc60d53e9342e21ed6cc9a/traitlets-4.3.2-py2.py3-none-any.whl
Requirement already satisfied: futures in /usr/lib/python2.7/site-packages (from tornado->visdom->-r requirements.txt (line 4)) (3.2.0)
Collecting singledispatch (from tornado->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/c5/10/369f50bcd4621b263927b0a1519987a04383d4a98fb10438042ad410cf88/singledispatch-3.4.0.3-py2.py3-none-any.whl
Collecting backports-abc>=0.4 (from tornado->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/7d/56/6f3ac1b816d0cd8994e83d0c4e55bc64567532f7dc543378bd87f81cebc7/backports_abc-0.5-py2.py3-none-any.whl
Collecting urllib3<1.23,>=1.21.1 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/63/cb/6965947c13a94236f6d4b8223e21beb4d576dc72e8130bd7880f600839b8/urllib3-1.22-py2.py3-none-any.whl
Collecting idna<2.7,>=2.5 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/27/cc/6dd9a3869f15c2edfab863b992838277279ce92663d334df9ecf5106f5c6/idna-2.6-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests->visdom->-r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/7c/e6/92ad559b7192d846975fc916b65f667c7b8c3a32bea7372340bfe9a15fa5/certifi-2018.4.16-py2.py3-none-any.whl
Collecting scandir; python_version < "3.5" (from pathlib2; python_version == "2.7" or python_version == "3.3"->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/13/bb/e541b74230bbf7a20a3949a2ee6631be299378a784f5445aa5d0047c192b/scandir-1.7.tar.gz
Requirement already satisfied: ptyprocess>=0.5 in /usr/lib/python2.7/site-packages (from pexpect; sys_platform != "win32"->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (0.5.2)
Requirement already satisfied: wcwidth in /usr/lib/python2.7/site-packages (from prompt-toolkit<2.0.0,>=1.0.4->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (0.1.7)
Requirement already satisfied: enum34; python_version == "2.7" in /usr/lib/python2.7/site-packages (from traitlets>=4.2->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1)) (1.1.6)
Collecting ipython-genutils (from traitlets>=4.2->ipython<6.0.0,>=5.0.0; python_version == "2.7"->ipdb->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/fa/bc/9bd3b5c2b4774d5f33b2d544f1460be9df7df2fe42f352135381c347c69a/ipython_genutils-0.2.0-py2.py3-none-any.whl
Building wheels for collected packages: word2vec, torchnet, scandir
  Running setup.py bdist_wheel for word2vec ... done
  Stored in directory: /root/.cache/pip/wheels/89/a1/cb/417bcc7143a3e2befcc82da185ce8ad4a340eb82c0bf48969c
  Running setup.py bdist_wheel for torchnet ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-oQlzp4/wheels/17/05/ec/d05d051a225871af52bf504f5e8daf57704811b3c1850d0012
  Running setup.py bdist_wheel for scandir ... done
  Stored in directory: /root/.cache/pip/wheels/4a/ca/d7/26c3620234732f2d5b3ca86d7ccb0f59a21bd7712bffbbedc2
Successfully built word2vec torchnet scandir
Installing collected packages: scandir, pathlib2, simplegeneric, pickleshare, ipython-genutils, traitlets, ipython, ipdb, fire, tqdm, torchfile, singledispatch, backports-abc, tornado, scipy, pillow, pyzmq, websocket-client, urllib3, idna, chardet, certifi, requests, visdom, word2vec, torchnet
Successfully installed backports-abc-0.5 certifi-2018.4.16 chardet-3.0.4 fire-0.1.3 idna-2.6 ipdb-0.11 ipython-5.7.0 ipython-genutils-0.2.0 pathlib2-2.3.2 pickleshare-0.7.4 pillow-5.1.0 pyzmq-17.0.0 requests-2.18.4 scandir-1.7 scipy-1.1.0 simplegeneric-0.8.1 singledispatch-3.4.0.3 torchfile-0.1.0 torchnet-0.0.2 tornado-5.0.2 tqdm-4.23.4 traitlets-4.3.2 urllib3-1.22 visdom-0.1.8.3 websocket-client-0.48.0 word2vec-0.9.2
[[email protected] PyTorchText-master]# 
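
The pip log tags each package with the requirements.txt line it came from: ipdb (line 1), fire (2), tqdm (3), visdom (4), the tnt git URL (5) and word2vec (6). Before moving on, it's worth a quick check that all of them actually import under the same `python`. The script below is my own sketch, not part of the init repo. It assumes each package imports under the names listed; the log shows that tnt installs as `torchnet`. It only uses syntax that works on both Python 2.7 and 3.

```python
# Import names for the packages in requirements.txt (tnt installs as torchnet).
REQUIRED_MODULES = ["ipdb", "fire", "tqdm", "visdom", "torchnet", "word2vec"]


def missing_modules(names):
    """Return the names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gone = missing_modules(REQUIRED_MODULES)
    print("missing: " + ", ".join(gone) if gone else "all required modules import")
```

An empty result means the environment matches requirements.txt. If a name is listed as missing, rerun `pip install` for that package.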

With the dependencies above installed, start the visdom visualization server:
```sh
python -m visdom.server
```
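
By default, visdom serves its web UI on port 8097. The server runs inside the VM, so a plain TCP connect from the VM is a quick way to confirm it is really listening. This is a sketch under assumptions: `localhost` and 8097 are the defaults I expect, so change them if you started the server on another host or port. To open the page from the host machine's browser, you may also need to allow that port through the VM's firewall.

```python
import socket

VISDOM_HOST = "localhost"  # assumption: run this inside the VM
VISDOM_PORT = 8097         # visdom's default port


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return True
    except socket.error:  # socket.timeout is a subclass of socket.error
        return False
    finally:
        sock.close()
```

Once the server has finished starting, `port_is_open(VISDOM_HOST, VISDOM_PORT)` should return `True`.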

Further reading: "PyTorch study notes (8): the PyTorch visualization tool visdom"

The environment is now ready. The next step is to prepare the init team's source code and the data files:

[[email protected] PyTorchText-master]# ll *.txt
-rw-r--r--. 1 root root   29200241 Jun  5 16:55 char_embedding.txt
-rw-r--r--. 1 root root  239862273 Jun  5 16:53 question_eval_set.txt
-rw-r--r--. 1 root root  204459814 Jun  5 16:52 question_topic_train_set.txt
-rw-r--r--. 1 root root 3317236306 Jun  5 16:57 question_train_set.txt
-rw-r--r--. 1 root root         77 Jun  5 11:45 requirements.txt
-rw-r--r--. 1 root root    1072551 Jun  5 16:53 topic_info.txt
-rw-r--r--. 1 root root 1005008916 Jun  5 16:55 word_embedding.txt
[[email protected] PyTorchText-master]# 
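
Before preprocessing, it helps to confirm that every input file is where the scripts expect it. Below is a minimal check using the file names from the listing above. The data directory argument is an assumption; in this walkthrough everything sits in the repo root.

```python
import os

# File names taken from the `ll *.txt` listing above.
EXPECTED_FILES = [
    "char_embedding.txt",
    "word_embedding.txt",
    "question_train_set.txt",
    "question_eval_set.txt",
    "question_topic_train_set.txt",
    "topic_info.txt",
]


def check_data_files(data_dir, names=EXPECTED_FILES):
    """Return (missing, sizes): names absent from data_dir, and sizes in bytes of the rest."""
    missing, sizes = [], {}
    for name in names:
        path = os.path.join(data_dir, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
        else:
            missing.append(name)
    return missing, sizes


if __name__ == "__main__":
    missing, sizes = check_data_files(os.getcwd())
    for name in sorted(sizes):
        print("%-30s %12d bytes" % (name, sizes[name]))
    if missing:
        print("missing: " + ", ".join(missing))
```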


## 2. Data preprocessing

### 2.1 Convert the word vectors to numpy arrays


[[email protected] PyTorchText-master]# python scripts/data_process/embedding2matrix.py main char_embedding.txt char_embedding.npz 
[[email protected] PyTorchText-master]# ls
char_embedding.npz  data                main-all.1.py  models                  question_topic_train_set.txt  rep.py            test.3.py
char_embedding.txt  del                 main-all.py    notebooks               question_train_set.txt        requirements.txt  topic_info.txt
checkpoints         ??ɽ??init?????.pdf  main.py        ??ɽ??-??ʿ????????.pptx  readme.md                     scripts           utils
config.py           LICENSE             ˵??.md         question_eval_set.txt   readme-zh.md                  test.1.py         word_embedding.txt
[[email protected] PyTorchText-master]# python scripts/data_process/embedding2matrix.py main word_embedding.txt word_embedding.npz 
[[email protected] PyTorchText-master]# ls
char_embedding.npz  data                main-all.1.py  models                  question_topic_train_set.txt  rep.py            test.3.py           word_embedding.txt
char_embedding.txt  del                 main-all.py    notebooks               question_train_set.txt        requirements.txt  topic_info.txt
checkpoints         ??ɽ??init?????.pdf  main.py        ??ɽ??-??ʿ????????.pptx  readme.md                     scripts           utils
config.py           LICENSE             ˵??.md         question_eval_set.txt   readme-zh.md                  test.1.py         word_embedding.npz
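
Roughly, `embedding2matrix.py` turns a text embedding file into a token-to-row mapping plus a matrix. The sketch below is my own pure-Python approximation, not the repo's code. It assumes the word2vec text format: an optional `count dim` header, then one token and its values per line. The only detail I can confirm about the real output is that the `.npz` holds a `word2id` entry, because `question2array.py` reads it back (see the traceback in 2.2).

```python
import io


def load_embedding_text(path):
    """Read a word2vec-style text embedding file into (word2id, vectors).

    Assumed layout: an optional header "<count> <dim>", then one line per token:
    "<token> <v1> ... <v_dim>" separated by whitespace.
    """
    word2id, vectors = {}, []
    with io.open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f):
            parts = line.split()
            if line_no == 0 and len(parts) == 2:
                continue  # header: vocabulary size and vector dimension
            if len(parts) < 2:
                continue  # blank or malformed line
            word2id[parts[0]] = len(vectors)  # row index of this token
            vectors.append([float(x) for x in parts[1:]])
    return word2id, vectors
```

With numpy available, `numpy.array(vectors)` turns the list into the matrix. The `word2id` dict is what the later scripts look tokens up in.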

### 2.2 Convert the questions to numpy arrays

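This step is very memory-hungry, so make sure the machine has more than 32 GB of RAM. I only ran it on the smaller file, question_eval_set.txt.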
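
For context, this step presumably maps each question's tokens to embedding row ids and stores fixed-length arrays. That is why memory use grows with the 3.3 GB training file. Below is a rough sketch of the per-line work. It assumes each line is `question_id<TAB>title_chars<TAB>title_words<TAB>desc_chars<TAB>desc_words` with comma-separated tokens. The lengths `title_len`/`desc_len` and `pad_id=0` are hypothetical values of mine, not the ones `question2array.py` uses.

```python
def tokens_to_ids(field, token2id, max_len, pad_id=0):
    """Map a comma-separated token field to exactly max_len ids.

    Unknown tokens are skipped; the list is truncated or right-padded with pad_id.
    """
    ids = [token2id[t] for t in field.split(",") if t in token2id]
    ids = ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))


def parse_question_line(line, char2id, word2id, title_len=50, desc_len=250):
    """Return (question_id, title_char_ids, title_word_ids, desc_char_ids, desc_word_ids)."""
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (5 - len(fields))  # tolerate a missing description
    qid, t_char, t_word, d_char, d_word = fields[:5]
    return (qid,
            tokens_to_ids(t_char, char2id, title_len),
            tokens_to_ids(t_word, word2id, title_len),
            tokens_to_ids(d_char, char2id, desc_len),
            tokens_to_ids(d_word, word2id, desc_len))
```

In a real pipeline you would reserve a dedicated padding row, so that padding is not confused with the token stored at row 0.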

[[email protected] PyTorchText-master]# python scripts/data_process/question2array.py main question_eval_set.txt test.npz
Traceback (most recent call last):
  File "scripts/data_process/question2array.py", line 85, in <module>
    fire.Fire()
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "scripts/data_process/question2array.py", line 19, in main
    char2id = np.load('/mnt/7/zhihu/ieee_zhihu_cup/data/char_embedding.npz')['word2id'].item()
  File "/usr/lib64/python2.7/site-packages/numpy/lib/npyio.py", line 372, in load
    fid = open(file, "rb")
IOError: [Errno 2] No such file or directory: '/mnt/7/zhihu/ieee_zhihu_cup/data/char_embedding.npz'
[[email protected] PyTorchText-master]# 

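It fails because the script uses a hard-coded data path, which has to be changed in the file. After editing it, the command runs: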

[[email protected] PyTorchText-master]# python scripts/data_process/question2array.py main question_eval_set.txt test.npz
217360it [00:34, 6317.30it/s]
a
b
c
d
[[email protected] PyTorchText-master]# 
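
The failure comes from line 19 of `question2array.py`, which loads `char_embedding.npz` from the hard-coded directory `/mnt/7/zhihu/ieee_zhihu_cup/data/`. Rather than editing absolute paths in every script, one option is to resolve data files against a configurable directory. The `ZHIHU_DATA_DIR` variable and the `data_path` helper below are hypothetical names of mine, not part of the repo.

```python
import os

# Hypothetical: take the data directory from an environment variable,
# defaulting to the current directory (the repo root in this walkthrough).
DATA_DIR = os.environ.get("ZHIHU_DATA_DIR", os.getcwd())


def data_path(name, data_dir=None):
    """Return the full path of a data file inside data_dir (or DATA_DIR)."""
    return os.path.join(data_dir or DATA_DIR, name)


# The hard-coded line 19 of question2array.py could then read:
# char2id = np.load(data_path('char_embedding.npz'))['word2id'].item()
```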

### 2.3 Convert the labels to JSON

[[email protected] PyTorchText-master]# python scripts/data_process/label2id.py main question_topic_train_set.txt labels.json
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(17)main()
     16     import ipdb;ipdb.set_trace()
---> 17     all_labels = { _ for ii,jj in results for _ in jj }
     18     sorted_labels = sorted(all_labels)

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(18)main()
     17     all_labels = { _ for ii,jj in results for _ in jj }
---> 18     sorted_labels = sorted(all_labels)
     19     label2id = {l_:ii for ii,l_ in enumerate(sorted_labels)}#-3239204820424->1

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(19)main()
     18     sorted_labels = sorted(all_labels)
---> 19     label2id = {l_:ii for ii,l_ in enumerate(sorted_labels)}#-3239204820424->1
     20     id2label = {ii:l_ for ii,l_ in enumerate(sorted_labels)}

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(20)main()
     19     label2id = {l_:ii for ii,l_ in enumerate(sorted_labels)}#-3239204820424->1
---> 20     id2label = {ii:l_ for ii,l_ in enumerate(sorted_labels)}
     21 

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(22)main()
     21 
---> 22     d = {ii:[label2id[jj] for jj in labels ]  for ii,labels in results}
     23 

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(24)main()
     23 
---> 24     data = dict(d=d,label2id=label2id,id2label=id2label)
     25     import json

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(25)main()
     24     data = dict(d=d,label2id=label2id,id2label=id2label)
---> 25     import json
     26     with open(outfile,'w') as f:

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(26)main()
     25     import json
---> 26     with open(outfile,'w') as f:
     27         json.dump(data,f)

ipdb> n
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(27)main()
     26     with open(outfile,'w') as f:
---> 27         json.dump(data,f)
     28 

ipdb> n




--Return--
None
> /weblogic/PyTorchText-master/scripts/data_process/label2id.py(27)main()
     26     with open(outfile,'w') as f:
---> 27         json.dump(data,f)
     28 

ipdb> 
> /usr/lib/python2.7/site-packages/fire/core.py(543)_CallCallable()
    542   result = fn(*varargs, **kwargs)
--> 543   return result, consumed_args, remaining_args, capacity
    544 

ipdb> c
[1]+  Killed                  python scripts/data_process/label2id.py main question_topic_train_set.txt labels.json
[[email protected] PyTorchText-master]# 
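
The debugger prompt appears because `label2id.py` still contains `import ipdb;ipdb.set_trace()` on line 16, so each `n` steps through one line. Deleting that line lets the script run straight through. The trace also shows the whole mapping logic, which can be written out as below. The body of `build_label_maps` follows lines 17 to 24 of the trace. How `results` is read is not shown, so `read_topic_file` and its `question_id<TAB>topic,topic,...` line format are my assumption.

```python
import io
import json


def read_topic_file(path):
    """Assumed format: '<question_id>\t<topic_id>,<topic_id>,...' on each line."""
    results = []
    with io.open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            qid, topics = line.rstrip("\n").split("\t", 1)
            results.append((qid, topics.split(",")))
    return results


def build_label_maps(results):
    """Same mapping logic as label2id.py lines 17-24 in the ipdb trace above."""
    all_labels = {label for _, labels in results for label in labels}
    sorted_labels = sorted(all_labels)
    label2id = {l: i for i, l in enumerate(sorted_labels)}
    id2label = {i: l for i, l in enumerate(sorted_labels)}
    d = {qid: [label2id[l] for l in labels] for qid, labels in results}
    return dict(d=d, label2id=label2id, id2label=id2label)


def main(infile, outfile):
    data = build_label_maps(read_topic_file(infile))
    with open(outfile, "w") as f:
        json.dump(data, f)
```

JSON object keys are always strings, so after `json.load` the keys of `id2label` come back as `"0"`, `"1"`, ... rather than integers.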

The step the documentation warns is memory-hungry also ran on my 2 GB of RAM, but no train.npz was produced. It most likely failed for lack of memory:

[[email protected] PyTorchText-master]# python scripts/data_process/question2array.py main question_train_set.txt train.npz
Killed
[[email protected] PyTorchText-master]# 
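
`Killed` with no traceback means the process received SIGKILL, which on a 2 GB VM is most likely the kernel's out-of-memory killer. Before launching a step like this, a look at `/proc/meminfo` shows how much memory is actually available. Below is a small sketch of my own (Linux only). It falls back to `MemFree` in case the kernel does not report `MemAvailable`.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field_name: first_number} (most values are in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def available_gib(meminfo_text):
    """Available memory in GiB, using MemAvailable and falling back to MemFree."""
    info = parse_meminfo(meminfo_text)
    kib = info.get("MemAvailable", info.get("MemFree", 0))
    return kib / (1024.0 * 1024.0)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print("%.2f GiB available" % available_gib(f.read()))
```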

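Next, take part of the training set to build a validation set. This code was backed up from an IPython session, so __remember to change the data paths in the code__.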

[[email protected] PyTorchText-master]# python scripts/data_process/get_val.py 
[[email protected] PyTorchText-master]# 
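
The contents of the repo's `get_val.py` are not shown here, so the following is only an illustration. Holding out a validation set usually means shuffling the question ids with a fixed seed and taking the first `val_size` of them. `split_train_val` and `val_size` are hypothetical names, not the script's.

```python
import random


def split_train_val(question_ids, val_size, seed=0):
    """Randomly hold out val_size ids for validation; return (train_ids, val_ids)."""
    ids = list(question_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    return ids[val_size:], ids[:val_size]
```

Keeping the seed fixed means later runs evaluate on the same held-out questions.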

## 3. Training the model

Here I hit a fatal error:

[[email protected] PyTorchText-master]# python main.py main --max_epoch=5 --plot_every=100 --env='MultiCNNText' --weight=1 --model='MultiCNNTextBNDeep' --batch-size=64 --lr=0.001 --lr2=0.000 --lr_decay=0.8 --decay_every=10000 --title-dim=250 --content-dim=250 --weight-decay=0 --type_='word' --debug-file='/tmp/debug' --linear-hidden-size=2000 --zhuge=True --augument=False
Traceback (most recent call last):
  File "main.py", line 158, in <module>
    fire.Fire()  
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/usr/lib/python2.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "main.py", line 74, in main
    model = getattr(models,opt.model)(opt).cuda()
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 147, in cuda
    return self._apply(lambda t: t.cuda(device_id))
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 118, in _apply
    module._apply(fn)
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 124, in _apply
    param.data = fn(param.data)
  File "/usr/lib64/python2.7/site-packages/torch/nn/modules/module.py", line 147, in <lambda>
    return self._apply(lambda t: t.cuda(device_id))
  File "/usr/lib64/python2.7/site-packages/torch/_utils.py", line 65, in _cuda
    return new_type(self.size()).copy_(self, async)
  File "/usr/lib64/python2.7/site-packages/torch/cuda/__init__.py", line 272, in __new__
    _lazy_init()
  File "/usr/lib64/python2.7/site-packages/torch/cuda/__init__.py", line 84, in _lazy_init
    _check_driver()
  File "/usr/lib64/python2.7/site-packages/torch/cuda/__init__.py", line 58, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
[[email protected] PyTorchText-master]#
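
The assertion is raised from line 74 of `main.py`, which calls `.cuda()` on the model unconditionally. The VM has no NVIDIA GPU or driver, so CUDA initialization fails. A guarded version could look like the sketch below; `maybe_cuda` is a hypothetical helper of mine. It only removes this first crash. Other `.cuda()` calls in the training code would need the same guard, and training on the CPU with 2 GB of RAM is unlikely to be practical anyway.

```python
def maybe_cuda(obj, use_cuda):
    """Return obj.cuda() when use_cuda is True, otherwise obj unchanged.

    Works for anything with a .cuda() method, e.g. an nn.Module or a tensor.
    """
    return obj.cuda() if use_cuda else obj


# Hypothetical use at main.py line 74 (requires `import torch` there):
# model = maybe_cuda(getattr(models, opt.model)(opt), torch.cuda.is_available())
```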