
Python3 NLTK: Natural Language Processing

程序员文章站 (Programmer Article Station), 2024-01-11 09:21:40
This article briefly introduces natural language processing with Python's NLTK library.

NLTK

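Load all items from NLTK's book module

  • The book module provides the example texts (text1 to text9) and sentences (sent1 to sent9) used in the NLTK book. Download their data once with nltk.download('book') before running the import.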
from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
text1
<Text: Moby Dick by Herman Melville 1851>
text2
<Text: Sense and Sensibility by Jane Austen 1811>

Searching text

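  1. concordance finds every occurrence of a word in the text and prints each one with its surrounding context.
  2. similar lists words that appear in the same contexts as the search word. This kind of similarity could support relevance ranking in a search engine.
  3. common_contexts lists the contexts shared by two (or more) words.
  4. dispersion_plot draws a lexical dispersion plot showing where each word occurs across the text.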
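text1.concordance('monstrous')  # look up the word 'monstrous' in text1
# concordance
# UK [kən'kɔːd(ə)ns]  US [kən'kɔrdns]
# n. agreement, harmony; an index of terms; an index of the important words in a work or in an author's complete works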
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us , 
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But 
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
text2.concordance('affection')
Displaying 25 of 79 matches:
, however , and , as a mark of his affection for the three girls , he left them
t . It was very well known that no affection was ever supposed to exist between
deration of politeness or maternal affection on the side of the former , the tw
d the suspicion -- the hope of his affection for me may warrant , without impru
hich forbade the indulgence of his affection . She knew that his mother neither
rd she gave one with still greater affection . Though her late conversation wit
 can never hope to feel or inspire affection again , and if her home be uncomfo
m of the sense , elegance , mutual affection , and domestic comfort of the fami
, and which recommended him to her affection beyond every thing else . His soci
ween the parties might forward the affection of Mr . Willoughby , an equally st
 the most pointed assurance of her affection . Elinor could not be surprised at
he natural consequence of a strong affection in a young and ardent mind . This 
 opinion . But by an appeal to her affection for her mother , by representing t
 every alteration of a place which affection had established as perfect with hi
e will always have one claim of my affection , which no other can possibly shar
f the evening declared at once his affection and happiness . " Shall we see you
ause he took leave of us with less affection than his usual behaviour has shewn
ness ." " I want no proof of their affection ," said Elinor ; " but of their en
onths , without telling her of his affection ;-- that they should part without 
ould be the natural result of your affection for her . She used to be all unres
distinguished Elinor by no mark of affection . Marianne saw and listened with i
th no inclination for expense , no affection for strangers , no profession , an
till distinguished her by the same affection which once she had felt no doubt o
al of her confidence in Edward ' s affection , to the remembrance of every mark
 was made ? Had he never owned his affection to yourself ?" " Oh , no ; but if 
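Each line of concordance output is a keyword-in-context (KWIC) line. The search word sits in the middle, padded with a fixed amount of text on either side. The following is a minimal pure-Python sketch of that idea, not NLTK's implementation. The function name concordance_lines, the 10-token window and the character width are assumptions chosen for illustration. Matching is case-insensitive, as in the output above, where 'Monstrous' also matches.

```python
def concordance_lines(tokens, word, width=30):
    """Return one KWIC line per occurrence of `word` (case-insensitive match).

    Hypothetical helper: context is taken from up to 10 tokens on each side
    and then cut to `width` characters.
    """
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - 10):i])[-width:]  # keep the end nearest the hit
            right = " ".join(tokens[i + 1:i + 11])[:width]      # keep the start nearest the hit
            lines.append(f"{left:>{width}} {tok} {right}")
    return lines

tokens = "the whale was a monstrous size and the Monstrous tale grew".split()
for line in concordance_lines(tokens, "monstrous", width=15):
    print(line)
```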
text1.similar('monstrous')
true contemptible christian abundant few part mean careful puzzled
mystifying passing curious loving wise doleful gamesome singular
delightfully perilous fearless
text2.similar('monstrous')
very so exceedingly heartily a as good great extremely remarkably
sweet vast amazingly
text2.common_contexts(['monstrous','very'])
a_pretty am_glad a_lucky is_pretty be_glad
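Both similar and common_contexts are built on word contexts: the words immediately to the left and right of each occurrence. common_contexts prints shared contexts as left_right pairs. For example, a_pretty means that both "a monstrous pretty" and "a very pretty" occur. similar ranks other words by how many contexts they share with the search word. The sketch below illustrates this in plain Python on a toy sentence. It is an approximation in the same spirit as NLTK, not its actual code, and the helper names are hypothetical.

```python
from collections import Counter, defaultdict

def context_index(tokens):
    """Map each lower-cased word to a Counter of its (left neighbour, right neighbour) pairs."""
    words = [t.lower() for t in tokens]
    index = defaultdict(Counter)
    for i in range(1, len(words) - 1):
        index[words[i]][(words[i - 1], words[i + 1])] += 1
    return index

def common_contexts(index, w1, w2):
    """Contexts shared by both words, written left_right as NLTK prints them."""
    shared = set(index.get(w1, {})) & set(index.get(w2, {}))
    return sorted(f"{left}_{right}" for left, right in shared)

def similar_words(index, word, num=5):
    """Rank other words by how many of `word`'s contexts they also occur in."""
    target = set(index.get(word, {}))
    scores = Counter()
    for other, ctx in index.items():
        overlap = len(target & set(ctx))
        if other != word and overlap:
            scores[other] = overlap
    return [w for w, _ in scores.most_common(num)]

tokens = ("a very pretty girl . a monstrous pretty girl . "
          "be very glad . be monstrous glad . is very good").split()
index = context_index(tokens)
print(common_contexts(index, "monstrous", "very"))  # ['a_pretty', 'be_glad']
print(similar_words(index, "monstrous"))            # ['very']
```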
# Show where each word occurs across the text (a lexical dispersion plot; requires matplotlib).
# Each stripe represents an instance of a word,
# and each row represents the entire text.
text4.dispersion_plot(['citizens','democracy','freedom','duties','America','liberty'])
# dispersion
# UK [dɪ'spɜːʃ(ə)n]  US [dɪ'spɝʒn]
# n. spreading, scattering; [statistics, mathematics] dispersion (deviation); dispersal
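A dispersion plot places one mark per occurrence at the word's token offset in the text. The sketch below computes those offsets and draws a rough text-only version, which can help when matplotlib is not installed. dispersion and ascii_dispersion are hypothetical helper names. Matching is exact and case-sensitive by assumption.

```python
def dispersion(tokens, targets):
    """Map each target word to the token offsets where it occurs (exact, case-sensitive match)."""
    return {w: [i for i, tok in enumerate(tokens) if tok == w] for w in targets}

def ascii_dispersion(tokens, targets, width=40):
    """Render one text row per word, marking each occurrence with '|' at its scaled offset."""
    offsets = dispersion(tokens, targets)
    rows = []
    for w in targets:
        row = [" "] * width
        for i in offsets[w]:
            row[i * width // len(tokens)] = "|"  # scale offset 0..len-1 into column 0..width-1
        rows.append(f"{w:>10} {''.join(row)}")
    return rows

tokens = "liberty and the duties of citizens secure liberty".split()
for line in ascii_dispersion(tokens, ["citizens", "liberty", "freedom"], width=16):
    print(line)
```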


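print(text3.generate('monstrous'))
None
Note: generate() is meant to print random text in the style of the source text. In the NLTK 3 release used here it was only a placeholder that returned None, which is all this call produced. Later releases reinstate it; there its first parameter is the output length, not a seed word, so call text3.generate() instead of passing a word.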

Counting vocabulary

len(text3)
44764
sorted(set(text3))
['!',
 "'",
 '(',
 ')',
 ',',
 ',)',
 '.',
 '.)',
 ':',
 ';',
 ';)',
 '?',
 '?)',
 'A',
 'Abel',
 'Abelmizraim',
 'Abidah',
 'Abide',
 'Abimael',
 'Abimelech',
 'Abr',
 'Abrah',
 'Abraham',
 'Abram',
 'Accad',
 'Achbor',
 'Adah',
 'Adam',
 'Adbeel',
 'Admah',
 'Adullamite',
 'After',
 'Aholibamah',
 'Ahuzzath',
 'Ajah',
 'Akan',
 'All',
 'Allonbachuth',
 'Almighty',
 'Almodad',
 'Also',
 'Alvah',
 'Alvan',
 'Am',
 'Amal',
 'Amalek',
 'Amalekites',
 'Ammon',
 'Amorite',
 'Amorites',
 'Amraphel',
 'An',
 'Anah',
 'Anamim',
 'And',
 'Aner',
 'Angel',
 'Appoint',
 'Aram',
 'Aran',
 'Ararat',
 'Arbah',
 'Ard',
 'Are',
 'Areli',
 'Arioch',
 'Arise',
 'Arkite',
 'Arodi',
 'Arphaxad',
 'Art',
 'Arvadite',
 'As',
 'Asenath',
 'Ashbel',
 'Asher',
 'Ashkenaz',
 'Ashteroth',
 'Ask',
 'Asshur',
 'Asshurim',
 'Assyr',
 'Assyria',
 'At',
 'Atad',
 'Avith',
 'Baalhanan',
 'Babel',
 'Bashemath',
 'Be',
 'Because',
 'Becher',
 'Bedad',
 'Beeri',
 'Beerlahairoi',
 'Beersheba',
 'Behold',
 'Bela',
 'Belah',
 'Benam',
 'Benjamin',
 'Beno',
 'Beor',
 'Bera',
 'Bered',
 'Beriah',
 'Bethel',
 'Bethlehem',
 'Bethuel',
 'Beware',
 'Bilhah',
 'Bilhan',
 'Binding',
 'Birsha',
 'Bless',
 'Blessed',
 'Both',
 'Bow',
 'Bozrah',
 'Bring',
 'But',
 'Buz',
 'By',
 'Cain',
 'Cainan',
 'Calah',
 'Calneh',
 'Can',
 'Cana',
 'Canaan',
 'Canaanite',
 'Canaanites',
 'Canaanitish',
 'Caphtorim',
 'Carmi',
 'Casluhim',
 'Cast',
 'Cause',
 'Chaldees',
 'Chedorlaomer',
 'Cheran',
 'Cherubims',
 'Chesed',
 'Chezib',
 'Come',
 'Cursed',
 'Cush',
 'Damascus',
 'Dan',
 'Day',
 'Deborah',
 'Dedan',
 'Deliver',
 'Diklah',
 'Din',
 'Dinah',
 'Dinhabah',
 'Discern',
 'Dishan',
 'Dishon',
 'Do',
 'Dodanim',
 'Dothan',
 'Drink',
 'Duke',
 'Dumah',
 'Earth',
 'Ebal',
 'Eber',
 'Edar',
 'Eden',
 'Edom',
 'Edomites',
 'Egy',
 'Egypt',
 'Egyptia',
 'Egyptian',
 'Egyptians',
 'Ehi',
 'Elah',
 'Elam',
 'Elbethel',
 'Eldaah',
 'EleloheIsrael',
 'Eliezer',
 'Eliphaz',
 'Elishah',
 'Ellasar',
 'Elon',
 'Elparan',
 'Emins',
 'En',
 'Enmishpat',
 'Eno',
 'Enoch',
 'Enos',
 'Ephah',
 'Epher',
 'Ephra',
 'Ephraim',
 'Ephrath',
 'Ephron',
 'Er',
 'Erech',
 'Eri',
 'Es',
 'Esau',
 'Escape',
 'Esek',
 'Eshban',
 'Eshcol',
 'Ethiopia',
 'Euphrat',
 'Euphrates',
 'Eve',
 'Even',
 'Every',
 'Except',
 'Ezbon',
 'Ezer',
 'Fear',
 'Feed',
 'Fifteen',
 'Fill',
 'For',
 'Forasmuch',
 'Forgive',
 'From',
 'Fulfil',
 'G',
 'Gad',
 'Gaham',
 'Galeed',
 'Gatam',
 'Gather',
 'Gaza',
 'Gentiles',
 'Gera',
 'Gerar',
 'Gershon',
 'Get',
 'Gether',
 'Gihon',
 'Gilead',
 'Girgashites',
 'Girgasite',
 'Give',
 'Go',
 'God',
 'Gomer',
 'Gomorrah',
 'Goshen',
 'Guni',
 'Hadad',
 'Hadar',
 'Hadoram',
 'Hagar',
 'Haggi',
 'Hai',
 'Ham',
 'Hamathite',
 'Hamor',
 'Hamul',
 'Hanoch',
 'Happy',
 'Haran',
 'Hast',
 'Haste',
 'Have',
 'Havilah',
 'Hazarmaveth',
 'Hazezontamar',
 'Hazo',
 'He',
 'Hear',
 'Heaven',
 'Heber',
 'Hebrew',
 'Hebrews',
 'Hebron',
 'Hemam',
 'Hemdan',
 'Here',
 'Hereby',
 'Heth',
 'Hezron',
 'Hiddekel',
 'Hinder',
 'Hirah',
 'His',
 'Hitti',
 'Hittite',
 'Hittites',
 'Hivite',
 'Hobah',
 'Hori',
 'Horite',
 'Horites',
 'How',
 'Hul',
 'Huppim',
 'Husham',
 'Hushim',
 'Huz',
 'I',
 'If',
 'In',
 'Irad',
 'Iram',
 'Is',
 'Isa',
 'Isaac',
 'Iscah',
 'Ishbak',
 'Ishmael',
 'Ishmeelites',
 'Ishuah',
 'Isra',
 'Israel',
 'Issachar',
 'Isui',
 'It',
 'Ithran',
 'Jaalam',
 'Jabal',
 'Jabbok',
 'Jac',
 'Jachin',
 'Jacob',
 'Jahleel',
 'Jahzeel',
 'Jamin',
 'Japhe',
 'Japheth',
 'Jared',
 'Javan',
 'Jebusite',
 'Jebusites',
 'Jegarsahadutha',
 'Jehovahjireh',
 'Jemuel',
 'Jerah',
 'Jetheth',
 'Jetur',
 'Jeush',
 'Jezer',
 'Jidlaph',
 'Jimnah',
 'Job',
 'Jobab',
 'Jokshan',
 'Joktan',
 'Jordan',
 'Joseph',
 'Jubal',
 'Judah',
 'Judge',
 'Judith',
 'Kadesh',
 'Kadmonites',
 'Karnaim',
 'Kedar',
 'Kedemah',
 'Kemuel',
 'Kenaz',
 'Kenites',
 'Kenizzites',
 'Keturah',
 'Kiriathaim',
 'Kirjatharba',
 'Kittim',
 'Know',
 'Kohath',
 'Kor',
 'Korah',
 'LO',
 'LORD',
 'Laban',
 'Lahairoi',
 'Lamech',
 'Lasha',
 'Lay',
 'Leah',
 'Lehabim',
 'Lest',
 'Let',
 'Letushim',
 'Leummim',
 'Levi',
 'Lie',
 'Lift',
 'Lo',
 'Look',
 'Lot',
 'Lotan',
 'Lud',
 'Ludim',
 'Luz',
 'Maachah',
 'Machir',
 'Machpelah',
 'Madai',
 'Magdiel',
 'Magog',
 'Mahalaleel',
 'Mahalath',
 'Mahanaim',
 'Make',
 'Malchiel',
 'Male',
 'Mam',
 'Mamre',
 'Man',
 'Manahath',
 'Manass',
 'Manasseh',
 'Mash',
 'Masrekah',
 'Massa',
 'Matred',
 'Me',
 'Medan',
 'Mehetabel',
 'Mehujael',
 'Melchizedek',
 'Merari',
 'Mesha',
 'Meshech',
 'Mesopotamia',
 'Methusa',
 'Methusael',
 'Methuselah',
 'Mezahab',
 'Mibsam',
 'Mibzar',
 'Midian',
 'Midianites',
 'Milcah',
 'Mishma',
 'Mizpah',
 'Mizraim',
 'Mizz',
 'Moab',
 'Moabites',
 'Moreh',
 'Moreover',
 'Moriah',
 'Muppim',
 'My',
 'Naamah',
 'Naaman',
 'Nahath',
 'Nahor',
 'Naphish',
 'Naphtali',
 'Naphtuhim',
 'Nay',
 'Nebajoth',
 'Neither',
 'Night',
 'Nimrod',
 'Nineveh',
 'Noah',
 'Nod',
 'Not',
 'Now',
 'O',
 'Obal',
 'Of',
 'Oh',
 'Ohad',
 'Omar',
 'On',
 'Onam',
 'Onan',
 'Only',
 'Ophir',
 'Our',
 'Out',
 'Padan',
 'Padanaram',
 'Paran',
 'Pass',
 'Pathrusim',
 'Pau',
 'Peace',
 'Peleg',
 'Peniel',
 'Penuel',
 'Peradventure',
 'Perizzit',
 'Perizzite',
 'Perizzites',
 'Phallu',
 'Phara',
 'Pharaoh',
 'Pharez',
 'Phichol',
 'Philistim',
 'Philistines',
 'Phut',
 'Phuvah',
 'Pildash',
 'Pinon',
 'Pison',
 'Potiphar',
 'Potipherah',
 'Put',
 'Raamah',
 'Rachel',
 'Rameses',
 'Rebek',
 'Rebekah',
 'Rehoboth',
 'Remain',
 'Rephaims',
 'Resen',
 'Return',
 'Reu',
 'Reub',
 'Reuben',
 'Reuel',
 'Reumah',
 'Riphath',
 'Rosh',
 'Sabtah',
 'Sabtech',
 'Said',
 'Salah',
 'Salem',
 'Samlah',
 'Sarah',
 'Sarai',
 'Saul',
 'Save',
 'Say',
 'Se',
 'Seba',
 'See',
 'Seeing',
 'Seir',
 'Sell',
 'Send',
 'Sephar',
 'Serah',
 'Sered',
 'Serug',
 'Set',
 'Seth',
 'Shalem',
 'Shall',
 'Shalt',
 'Shammah',
 'Shaul',
 'Shaveh',
 'She',
 'Sheba',
 'Shebah',
 'Shechem',
 'Shed',
 'Shel',
 'Shelah',
 'Sheleph',
 'Shem',
 'Shemeber',
 'Shepho',
 'Shillem',
 'Shiloh',
 'Shimron',
 'Shinab',
 'Shinar',
 'Shobal',
 'Should',
 'Shuah',
 'Shuni',
 'Shur',
 'Sichem',
 'Siddim',
 'Sidon',
 'Simeon',
 'Sinite',
 'Sitnah',
 'Slay',
 'So',
 'Sod',
 'Sodom',
 'Sojourn',
 'Some',
 'Spake',
 'Speak',
 'Spirit',
 'Stand',
 'Succoth',
 'Surely',
 'Swear',
 'Syrian',
 'Take',
 'Tamar',
 'Tarshish',
 'Tebah',
 'Tell',
 'Tema',
 'Teman',
 'Temani',
 'Terah',
 'Thahash',
 'That',
 'The',
 'Then',
 'There',
 'Therefore',
 'These',
 'They',
 'Thirty',
 'This',
 'Thorns',
 'Thou',
 'Thus',
 'Thy',
 'Tidal',
 'Timna',
 'Timnah',
 'Timnath',
 'Tiras',
 'To',
 'Togarmah',
 'Tola',
 'Tubal',
 'Tubalcain',
 'Twelve',
 'Two',
 'Unstable',
 'Until',
 'Unto',
 'Up',
 'Upon',
 'Ur',
 'Uz',
 'Uzal',
 'We',
 'What',
 'When',
 'Whence',
 'Where',
 'Whereas',
 'Wherefore',
 'Which',
 'While',
 'Who',
 'Whose',
 'Whoso',
 'Why',
 'Wilt',
 'With',
 'Woman',
 'Ye',
 'Yea',
 'Yet',
 'Zaavan',
 'Zaphnathpaaneah',
 'Zar',
 'Zarah',
 'Zeboiim',
 'Zeboim',
 'Zebul',
 'Zebulun',
 'Zemarite',
 'Zepho',
 'Zerah',
 'Zibeon',
 'Zidon',
 'Zillah',
 'Zilpah',
 'Zimran',
 'Ziphion',
 'Zo',
 'Zoar',
 'Zohar',
 'Zuzims',
 'a',
 'abated',
 'abide',
 'able',
 'abode',
 'abomination',
 'about',
 'above',
 'abroad',
 'absent',
 'abundantly',
 'accept',
 'accepted',
 'according',
 'acknowledged',
 'activity',
 'add',
 'adder',
 'afar',
 'afflict',
 'affliction',
 'afraid',
 'after',
 'afterward',
 'afterwards',
 'aga',
 'again',
 'against',
 'age',
 'aileth',
 'air',
 'al',
 'alive',
 'all',
 'almon',
 'alo',
 'alone',
 'aloud',
 'also',
 'altar',
 'altogether',
 'always',
 'am',
 'among',
 'amongst',
 'an',
 'and',
 'angel',
 'angels',
 'anger',
 'angry',
 'anguish',
 'anointedst',
 'anoth',
 'another',
 'answer',
 'answered',
 'any',
 'anything',
 'appe',
 'appear',
 'appeared',
 'appease',
 'appoint',
 'appointed',
 'aprons',
 'archer',
 'archers',
 'are',
 'arise',
 'ark',
 'armed',
 'arms',
 'army',
 'arose',
 'arrayed',
 'art',
 'artificer',
 'as',
 'ascending',
 'ash',
 'ashamed',
 'ask',
 'asked',
 'asketh',
 'ass',
 'assembly',
 'asses',
 'assigned',
 'asswaged',
 'at',
 'attained',
 'audience',
 'avenged',
 'aw',
 'awaked',
 'away',
 'awoke',
 'back',
 'backward',
 'bad',
 'bade',
 'badest',
 'badne',
 'bak',
 'bake',
 'bakemeats',
 'baker',
 'bakers',
 'balm',
 'bands',
 'bank',
 'bare',
 'barr',
 'barren',
 'basket',
 'baskets',
 'battle',
 'bdellium',
 'be',
 'bear',
 'beari',
 'bearing',
 'beast',
 'beasts',
 'beautiful',
 'became',
 'because',
 'become',
 'bed',
 'been',
 'befall',
 'befell',
 'before',
 'began',
 'begat',
 'beget',
 'begettest',
 'begin',
 'beginning',
 'begotten',
 'beguiled',
 'beheld',
 'behind',
 'behold',
 'being',
 'believed',
 'belly',
 'belong',
 'beneath',
 'bereaved',
 'beside',
 'besides',
 'besought',
 'best',
 'betimes',
 'better',
 'between',
 'betwixt',
 'beyond',
 'binding',
 'bird',
 'birds',
 'birthday',
 'birthright',
 'biteth',
 'bitter',
 'blame',
 'blameless',
 'blasted',
 'bless',
 'blessed',
 'blesseth',
 'blessi',
 'blessing',
 'blessings',
 'blindness',
 'blood',
 'blossoms',
 'bodies',
 'boldly',
 'bondman',
 'bondmen',
 'bondwoman',
 'bone',
 'bones',
 'book',
 'booths',
 'border',
 'borders',
 'born',
 'bosom',
 'both',
 'bottle',
 'bou',
 'boug',
 'bough',
 'bought',
 'bound',
 'bow',
 'bowed',
 'bowels',
 'bowing',
 'boys',
 'bracelets',
 'branches',
 'brass',
 'bre',
 'breach',
 'bread',
 'breadth',
 'break',
 'breaketh',
 'breaking',
 'breasts',
 'breath',
 'breathed',
 'breed',
 'brethren',
 'brick',
 'brimstone',
 'bring',
 'brink',
 'broken',
 'broth',
 'brother',
 'brought',
 'brown',
 'bruise',
 'budded',
 'build',
 'builded',
 'built',
 'bulls',
 'bundle',
 'bundles',
 'burdens',
 'buried',
 'burn',
 'burning',
 'burnt',
 'bury',
 'buryingplace',
 'business',
 'but',
 'butler',
 'butlers',
 'butlership',
 'butter',
 'buy',
 'by',
 'cakes',
 'calf',
 'call',
 'called',
 'came',
 'camel',
 'camels',
 'camest',
 'can',
 'cannot',
 'canst',
 'captain',
 'captive',
 'captives',
 'carcases',
 'carried',
 'carry',
 'cast',
 'castles',
 'catt',
 'cattle',
 'caught',
 'cause',
 'caused',
 'cave',
 'cease',
 'ceased',
 'certain',
 'certainly',
 'chain',
 'chamber',
 'change',
 'changed',
 'changes',
 'charge',
 'charged',
 'chariot',
 'chariots',
 'chesnut',
 'chi',
 'chief',
 'child',
 'childless',
 'childr',
 'children',
 'chode',
 'choice',
 'chose',
 'circumcis',
 'circumcise',
 'circumcised',
 'citi',
 'cities',
 'city',
 'clave',
 'clean',
 'clear',
 'cleave',
 'clo',
 'closed',
 'clothed',
 'clothes',
 'cloud',
 'clusters',
 'co',
 'coat',
 'coats',
 'coffin',
 'cold',
 ...]
len(set(text3))
2789
len(text3)/len(set(text3))
16.050197203298673
text3.count('smote')
5
100*text4.count('a')/len(text4)
1.4643016433938312
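def lexical_diversity(text):
    # lexical UK ['leksɪk(ə)l]  US ['lɛksɪkl]
    # adj. relating to words or vocabulary; of a lexicon (dictionary)
    # diversity UK [daɪ'vɜːsɪtɪ; dɪ-]  US [dɪˈvəsɪti]
    # n. variety; difference
    # Average number of uses per distinct token: a higher value means a LESS varied vocabulary.
    return len(text)/len(set(text))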
def percentage(count, total):
    return 100*count/total

print('Lexical diversity of text3: {}'.format(lexical_diversity(text3)))
print("Percentage of text4 made up of the word 'a': {}".format(percentage(text4.count('a'),len(text4))))
Lexical diversity of text3: 16.050197203298673
Percentage of text4 made up of the word 'a': 1.4643016433938312
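As defined here, lexical_diversity is the average number of times each distinct token is used, so a higher value means a less varied vocabulary. The inverse ratio, distinct tokens divided by total tokens (often called the type-token ratio), rises with variety instead. For text3 it is 2789 / 44764 ≈ 0.0623, and the updated online edition of the NLTK book defines lexical_diversity this way. A small sketch comparing the two measures on a short token list (the function names are illustrative):

```python
def type_token_ratio(tokens):
    """Distinct tokens / total tokens: 0 < ratio <= 1, higher means more varied."""
    return len(set(tokens)) / len(tokens)

def mean_uses_per_type(tokens):
    """Total tokens / distinct tokens: the measure lexical_diversity computes above."""
    return len(tokens) / len(set(tokens))

saying = ['After', 'all', 'is', 'said', 'and', 'done', 'more', 'is', 'said', 'than', 'done']
print(type_token_ratio(saying))    # 8 distinct / 11 total
print(mean_uses_per_type(saying))  # 11 / 8 = 1.375
```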

Lists

sent1 = ['Call', 'me','Ishmael','.']
print('Contents of sent1: {}'.format(sent1))
print('Length of sent1: {}'.format(len(sent1)))
print('Lexical diversity of sent1: {}'.format(lexical_diversity(sent1)))
Contents of sent1: ['Call', 'me', 'Ishmael', '.']
Length of sent1: 4
Lexical diversity of sent1: 1.0
sent1,sent2,sent3,sent4  # these lists are predefined by nltk.book
(['Call', 'me', 'Ishmael', '.'],
 ['The',
  'family',
  'of',
  'Dashwood',
  'had',
  'long',
  'been',
  'settled',
  'in',
  'Sussex',
  '.'],
 ['In',
  'the',
  'beginning',
  'God',
  'created',
  'the',
  'heaven',
  'and',
  'the',
  'earth',
  '.'],
 ['Fellow',
  '-',
  'Citizens',
  'of',
  'the',
  'Senate',
  'and',
  'of',
  'the',
  'House',
  'of',
  'Representatives',
  ':'])
sent4+sent1
['Fellow',
 '-',
 'Citizens',
 'of',
 'the',
 'Senate',
 'and',
 'of',
 'the',
 'House',
 'of',
 'Representatives',
 ':',
 'Call',
 'me',
 'Ishmael',
 '.']
sent1.append('Some')  # modifies sent1 in place and returns None
sent1
['Call', 'me', 'Ishmael', '.', 'Some']
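Because append modifies the list in place and returns None, running the same notebook cell several times keeps adding items to the list. This plain-Python check contrasts append with +, which builds a new list:

```python
sent1 = ['Call', 'me', 'Ishmael', '.']

result = sent1.append('Some')     # append mutates sent1 in place...
print(result)                     # ...and returns None
print(sent1)                      # ['Call', 'me', 'Ishmael', '.', 'Some']

combined = sent1 + ['again']      # + builds a new list and leaves sent1 unchanged
print(len(sent1), len(combined))  # 5 6
```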

Indexing lists

type(text4)
nltk.text.Text
text4[173]
'awaken'
text4.index('awaken')
173
text5[16715:16735]
['U86',
 'thats',
 'why',
 'something',
 'like',
 'gamefly',
 'is',
 'so',
 'good',
 'because',
 'you',
 'can',
 'actually',
 'play',
 'a',
 'full',
 'game',
 'without',
 'buying',
 'it']
text6[1600:1625]
['We',
 "'",
 're',
 'an',
 'anarcho',
 '-',
 'syndicalist',
 'commune',
 '.',
 'We',
 'take',
 'it',
 'in',
 'turns',
 'to',
 'act',
 'as',
 'a',
 'sort',
 'of',
 'executive',
 'officer',
 'for',
 'the',
 'week']
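These slices follow ordinary Python list rules. The start index is included and the end index is excluded, so text5[16715:16735] returns the 20 tokens at offsets 16715 to 16734. index() reports the first position of a value. The same rules on a short list:

```python
words = ['We', "'", 're', 'an', 'anarcho', '-', 'syndicalist', 'commune', '.']

print(words[3])           # 'an': indexes start at 0
print(words.index('an'))  # 3: position of the first occurrence
print(words[4:7])         # items 4, 5 and 6; the end index is excluded
print(words[-1])          # '.': negative indexes count from the end
print(len(words[2:5]))    # 3, i.e. 5 - 2
```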

Variables

sent1 = ['Call','me','Ishmael','.']
my_sent = ['Bravely','bold','Sir','Robin',',','rode','forth','from','Camelot','.']
noun_phrase = my_sent[1:4]
print('Sliced list noun_phrase -> {}'.format(noun_phrase))
wOrDs = sorted(noun_phrase)
print('Sorted list wOrDs -> {}'.format(wOrDs))
Sliced list noun_phrase -> ['bold', 'Sir', 'Robin']
Sorted list wOrDs -> ['Robin', 'Sir', 'bold']
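sorted() compares strings by character code, so capitalized words come before lowercase ones. That is why 'bold' ends up last. For case-insensitive ordering, pass a key function. A quick sketch:

```python
noun_phrase = ['bold', 'Sir', 'Robin']

print(sorted(noun_phrase))                 # ['Robin', 'Sir', 'bold']: uppercase sorts first
print(sorted(noun_phrase, key=str.lower))  # ['bold', 'Robin', 'Sir']: case-insensitive order
```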

Strings

name = 'bright'
print('First letter of name: {}'.format(name[0]))
print(name[:4])
print(name*2)
print(name + '!')
First letter of name: b
brig
brightbright
bright!
' '.join(['Monty', 'Python'])
'Monty Python'
'Monty Python'.split()
['Monty', 'Python']
saying = ['After','all','is','said','and','done','more','is','said','than','done']
tokens = set(saying)
tokens = sorted(tokens)
tokens[-2:]
['said', 'than']
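split() with no argument splits on any run of whitespace and drops leading and trailing whitespace. As a result, joining the pieces with ' ' does not always give back the original string. A short illustration:

```python
text = 'Monty   Python\n'
tokens = text.split()        # no argument: split on runs of whitespace, ignore the ends
print(tokens)                # ['Monty', 'Python']
print(' '.join(tokens))      # 'Monty Python': single spaces, so not identical to text
print('a,b,,c'.split(','))   # ['a', 'b', '', 'c']: an explicit separator keeps empty fields
```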
Frequency distributions

fdist1 = FreqDist(text1)  # FreqDist is imported by nltk.book
vocabulary1 = fdist1.keys()
type(vocabulary1)
dict_keys
fdist1.plot(50, cumulative=True)
# Cumulative frequency plot for the 50 most frequently used words in Moby Dick,
# which account for nearly half of the tokens.
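In NLTK 3, FreqDist is a subclass of collections.Counter, so the numbers behind the cumulative plot can be reproduced with Counter alone. Note that keys() does not return words ranked by frequency; most_common(n) does. The following is a minimal sketch of the "share of all tokens covered by the top n words" calculation, where cumulative_share is a hypothetical helper:

```python
from collections import Counter

def cumulative_share(tokens, n):
    """Fraction of all tokens covered by the n most frequent token types."""
    counts = Counter(tokens)
    top = sum(count for _, count in counts.most_common(n))
    return top / len(tokens)

tokens = 'the cat and the dog and the bird'.split()
print(Counter(tokens).most_common(2))  # [('the', 3), ('and', 2)]
print(cumulative_share(tokens, 2))     # (3 + 2) / 8 = 0.625
```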


fdist1.hapaxes()  # the words that occur only once
['Herman',
 'Melville',
 ']',
 'ETYMOLOGY',
 'Late',
 'Consumptive',
 'School',
 'threadbare',
 'lexicons',
 'mockingly',
 'flags',
 'mortality',
 'signification',
 'HACKLUYT',
 'Sw',
 'HVAL',
 'roundness',
 'Dut',
 'Ger',
 'WALLEN',
 'WALW',
 'IAN',
 'RICHARDSON',
 'KETOS',
 'GREEK',
 'CETUS',
 'LATIN',
 'WHOEL',
 'ANGLO',
 'SAXON',
 'WAL',
 'HWAL',
 'SWEDISH',
 'ICELANDIC',
 'BALEINE',
 'BALLENA',
 'FEGEE',
 'ERROMANGOAN',
 'Librarian',
 'painstaking',
 'burrower',
 'grub',
 'Vaticans',
 'stalls',
 'higgledy',
 'piggledy',
 'gospel',
 'promiscuously',
 'commentator',
 'belongest',
 'sallow',
 'Pale',
 'Sherry',
 'loves',
 'bluntly',
 'Subs',
 'thankless',
 'Hampton',
 'Court',
 'hie',
 'refugees',
 'pampered',
 'Michael',
 'Raphael',
 'unsplinterable',
 'GENESIS',
 'JOB',
 'JONAH',
 'punish',
 'ISAIAH',
 'soever',
 'cometh',
 'incontinently',
 'perisheth',
 'PLUTARCH',
 'MORALS',
 'breedeth',
 'Whirlpooles',
 'Balaene',
 'arpens',
 'PLINY',
 'Scarcely',
 'TOOKE',
 'LUCIAN',
 'TRUE',
 'catched',
 'OCTHER',
 'VERBAL',
 'TAKEN',
 'MOUTH',
 'ALFRED',
 '890',
 'gudgeon',
 'retires',
 'MONTAIGNE',
 'APOLOGY',
 'RAIMOND',
 'SEBOND',
 'Nick',
 'RABELAIS',
 'cartloads',
 'STOWE',
 'ANNALS',
 'LORD',
 'BACON',
 'Touching',
 'ork',
 'DEATH',
 'sovereignest',
 'bruise',
 'HAMLET',
 'leach',
 'Mote',
 'availle',
 'returne',
 'againe',
 'worker',
 'Dinting',
 'paine',
 'thro',
 'maine',
 'FAERIE',
 'Immense',
 'til',
 'DAVENANT',
 'PREFACE',
 'GONDIBERT',
 'spermacetti',
 'Hosmannus',
 'Nescio',
 'VIDE',
 'Spencer',
 'Talus',
 'flail',
 'threatens',
 'jav',
 'lins',
 'WALLER',
 'SUMMER',
 'ISLANDS',
 'Commonwealth',
 'Civitas',
 'OPENING',
 'SENTENCE',
 'HOBBES',
 'LEVIATHAN',
 'Silly',
 'Mansoul',
 'chewing',
 'sprat',
 'PILGRIM',
 'PROGRESS',
 'Created',
 'PARADISE',
 'LOST',
 '---"',
 'Hugest',
 'Stretched',
 'Draws',
 'FULLLER',
 'PROFANE',
 'HOLY',
 'STATE',
 'DRYDEN',
 'ANNUS',
 'MIRABILIS',
 'aground',
 'EDGE',
 'TEN',
 'SPITZBERGEN',
 'PURCHAS',
 'wantonness',
 'fuzzing',
 'vents',
 'HERBERT',
 'INTO',
 'ASIA',
 'AFRICA',
 'SCHOUTEN',
 'SIXTH',
 'CIRCUMNAVIGATION',
 'Elbe',
 'ducat',
 'herrings',
 'GREENLAND',
 'Several',
 'Fife',
 'Anno',
 '1652',
 'Pitferren',
 'SIBBALD',
 'FIFE',
 'KINROSS',
 'Myself',
 'Sperma',
 'ceti',
 'fierceness',
 'RICHARD',
 'STRAFFORD',
 'LETTER',
 'BERMUDAS',
 'PHIL',
 'TRANS',
 '1668',
 'PRIMER',
 'COWLEY',
 '1729',
 '"...',
 'frequendy',
 'insupportable',
 'disorder',
 'ULLOA',
 'SOUTH',
 'AMERICA',
 'sylphs',
 'petticoat',
 'Oft',
 'Tho',
 'RAPE',
 'LOCK',
 'NAT',
 'wales',
 'JOHNSON',
 'COOK',
 'dung',
 'lime',
 'juniper',
 'UNO',
 'VON',
 'TROIL',
 'LETTERS',
 'BANKS',
 'SOLANDER',
 '1772',
 'Nantuckois',
 'JEFFERSON',
 'MEMORIAL',
 'MINISTER',
 'REFERENCE',
 'PARLIAMENT',
 'SOMEWHERE',
 'guarding',
 'protecting',
 'robbers',
 'BLACKSTONE',
 'Rodmond',
 'suspends',
 'attends',
 'FALCONER',
 'Bright',
 'roofs',
 'domes',
 'rockets',
 'Around',
 'unwieldy',
 'COWPER',
 'VISIT',
 'LONDON',
 'HUNTER',
 'DISSECTION',
 'SMALL',
 'SIZED',
 'aorta',
 'gushing',
 'PALEY',
 'THEOLOGY',
 'mammiferous',
 'hind',
 'BARON',
 'CUVIER',
 'COLNETT',
 'PURPOSE',
 'EXTENDING',
 'SPERMACETI',
 'Floundered',
 'chace',
 'peopling',
 'Gather',
 'Led',
 'instincts',
 'trackless',
 'Assaulted',
 'voracious',
 'spiral',
 'MONTGOMERY',
 'WORLD',
 'FLOOD',
 'Paean',
 'fatter',
 'Flounders',
 'CHARLES',
 'LAMB',
 'TRIUMPH',
 '1690',
 'OBED',
 'Susan',
 'HAWTHORNE',
 'TWICE',
 'bespeak',
 'raal',
 'COOPER',
 'PILOT',
 'Berlin',
 'Gazette',
 'ECKERMANN',
 'CONVERSATIONS',
 'GOETHE',
 'ESSEX',
 'WAS',
 'ATTACKED',
 'FINALLY',
 'DESTROYED',
 'OWEN',
 'CHACE',
 'FIRST',
 'SAID',
 'VESSEL',
 'YORK',
 '1821',
 'piping',
 'dimmed',
 'phospher',
 'ELIZABETH',
 'OAKES',
 'SMITH',
 'amounted',
 '440',
 'SCORESBY',
 'Mad',
 'agonies',
 'endures',
 'infuriated',
 'rears',
 'snaps',
 'propelled',
 'observers',
 'opportunities',
 'habitudes',
 'BEALE',
 'offensively',
 'artful',
 'mischievous',
 'FREDERICK',
 'DEBELL',
 '1840',
 'October',
 'Raise',
 'ay',
 'THAR',
 'bowes',
 'os',
 'ROSS',
 'ETCHINGS',
 'CRUIZE',
 '1846',
 'Globe',
 'transactions',
 'relate',
 'HUSSEY',
 'SURVIVORS',
 'parried',
 'MISSIONARY',
 'JOURNAL',
 'TYERMAN',
 'boldest',
 'persevering',
 'REPORT',
 'DANIEL',
 'SPEECH',
 'SENATE',
 'APPLICATION',
 'ERECTION',
 'BREAKWATER',
 'CAPTORS',
 'WHALEMAN',
 'ADVENTURES',
 'BIOGRAPHY',
 'GATHERED',
 'HOMEWARD',
 'COMMODORE',
 'PREBLE',
 'REV',
 'CHEEVER',
 'MUTINEER',
 'BROTHER',
 'ANOTHER',
 'MCCULLOCH',
 'COMMERCIAL',
 'reciprocal',
 'clews',
 'SOMETHING',
 'UNPUBLISHED',
 'CURRENTS',
 'Pedestrians',
 'recollect',
 'gateways',
 'VOYAGER',
 'ARCTIC',
 'NEWSPAPER',
 'TAKING',
 'RETAKING',
 'HOBOMACK',
 'MIRIAM',
 'FISHERMAN',
 'appliance',
 'RIBS',
 'TRUCKS',
 'Terra',
 'Del',
 'Fuego',
 'DARWIN',
 'NATURALIST',
 ";--'",
 '!\'"',
 'WHARTON',
 'Loomings',
 'spleen',
 'regulating',
 'circulation',
 'Whenever',
 'drizzly',
 'hypos',
 'philosophical',
 'Cato',
 'Manhattoes',
 'reefs',
 'downtown',
 'gazers',
 'Circumambulate',
 'Corlears',
 'Coenties',
 'Slip',
 'Whitehall',
 'Posted',
 'sentinels',
 'spiles',
 'pier',
 'lath',
 'counters',
 'desks',
 'loitering',
 'shady',
 'Inlanders',
 'lanes',
 'alleys',
 'attract',
 'dale',
 'dreamiest',
 'shadiest',
 'quietest',
 'enchanting',
 'Saco',
 'crucifix',
 'Deep',
 'mazy',
 'Tiger',
 'Tennessee',
 'Rockaway',
 'Persians',
 'deity',
 'Narcissus',
 'ungraspable',
 'hazy',
 'quarrelsome',
 'offices',
 'abominate',
 'toils',
 'trials',
 'barques',
 'schooners',
 'broiling',
 'buttered',
 'judgmatically',
 'peppered',
 'reverentially',
 'idolatrous',
 'dotings',
 'ibis',
 'roasted',
 'bake',
 'plumb',
 'Van',
 'Rensselaers',
 'Randolphs',
 'Hardicanutes',
 'lording',
 'tallest',
 'decoction',
 'Seneca',
 'Stoics',
 'Testament',
 'promptly',
 'rub',
 'infliction',
 'BEING',
 'PAID',
 'urbane',
 'ills',
 'monied',
 'consign',
 'prevalent',
 'violate',
 'Pythagorean',
 'commonalty',
 'police',
 'surveillance',
 'programme',
 'solo',
 'CONTESTED',
 'ELECTION',
 'PRESIDENCY',
 'UNITED',
 'STATES',
 'ISHMAEL',
 'BLOODY',
 'AFFGHANISTAN',
 'managers',
 'genteel',
 'comedies',
 'farces',
 'cunningly',
 'disguises',
 'cajoling',
 'unbiased',
 'freewill',
 'discriminating',
 'overwhelming',
 'undeliverable',
 'itch',
 'forbidden',
 'ignoring',
 'lodges',
 'Carpet',
 'Bag',
 'Manhatto',
 'candidates',
 'penalties',
 'Tyre',
 'Carthage',
 'imported',
 'cobblestones',
 'bitingly',
 'shouldering',
 'price',
 'fervent',
 'asphaltic',
 'pavement',
 'flinty',
 'projections',
 'soles',
 'Too',
 'cheapest',
 'cheeriest',
 'invitingly',
 'particles',
 'peer',
 'Angel',
 'Doom',
 'wailing',
 'gnashing',
 'Wretched',
 'entertainment',
 'Moving',
 'emigrant',
 'poverty',
 'creak',
 'lodgings',
 'zephyr',
 'hob',
 'toasting',
 'observest',
 'sashless',
 'glazier',
 'reasonest',
 'chinks',
 'crannies',
 'lint',
 'chattering',
 'shiverings',
 'cob',
 'redder',
 'Orion',
 'glitters',
 'conservatories',
 'president',
 'temperance',
 'blubbering',
 'straggling',
 'wainscots',
 'reminding',
 'oilpainting',
 'besmoked',
 'defaced',
 'unequal',
 'crosslights',
 'hags',
 'delineate',
 'bewitched',
 'ponderings',
 'boggy',
 'soggy',
 'squitchy',
 'froze',
 'heath',
 'icebound',
 'represents',
 'Horner',
 'foundered',
 'clubs',
 'harvesting',
 'hacking',
 'horrifying',
 'Mixed',
 'Nathan',
 'Swain',
 'corkscrew',
 'Blanco',
 'sojourning',
 'fireplaces',
 'duskier',
 'cockpits',
 'rarities',
 'Projecting',
 'Within',
 'shelves',
 'flasks',
 'bustles',
 'deliriums',
 'Abominable',
 'tumblers',
 'cylinders',
 'goggling',
 'deceitfully',
 'tapered',
 'Parallel',
 'pecked',
 'footpads',
 'Fill',
 'shilling',
 'examining',
 'SKRIMSHANDER',
 'accommodated',
 'unoccupied',
 'haint',
 'pose',
 'whalin',
 'decidedly',
 'objectionable',
 'wander',
 'Battery',
 'ruminating',
 'adorning',
 'potatoes',
 'sartainty',
 'diabolically',
 'steaks',
 'undress',
 'looker',
 'rioting',
 'Grampus',
 'seed',
 'Feegees',
 'tramping',
 'Enveloped',
 'bedarned',
 'eruption',
 'officiating',
 'brimmers',
 'complained',
 'potion',
 'colds',
 'catarrhs',
 'liquor',
 'arrantest',
 'topers',
 'obstreperously',
 'aloof',
 'desirous',
 'hilarity',
 'coffer',
 'Southerner',
 'mountaineers',
 'Alleghanian',
 'missed',
 'supernaturally',
 'congratulate',
 'multiply',
 'bachelor',
 'abominated',
 'tidiest',
 'bedwards',
 'shan',
 'tablecloth',
 'Skrimshander',
 'bump',
 'spraining',
 'eider',
 'yoking',
 'rickety',
 'whirlwinds',
 'knockings',
 'dismissed',
 'popped',
 'cherishing',
 'chuckled',
 'chuckle',
 'mightily',
 'catches',
 'bamboozingly',
 'overstocked',
 'toothpick',
 'rayther',
 'BROWN',
 'slanderin',
 'farrago',
 'BROKE',
 'Sartain',
 'Mt',
 'Hecla',
 'persist',
 'mystifying',
 'unsay',
 'criminal',
 'Wall',
 'purty',
 'sarmon',
 'rips',
 'tellin',
 'bought',
 'balmed',
 'curios',
 'sellin',
 'inions',
 'fooling',
 'idolators',
 'Depend',
 'reg',
 'lar',
 'spliced',
 'Johnny',
 'sprawling',
 'Arter',
 'glim',
 'jiffy',
 'irresolute',
 'vum',
 'WON',
 'Folding',
 'scrutiny',
 'porcupine',
 'moccasin',
 'ponchos',
 'parade',
 'rainy',
 'remembering',
 'commended',
 'cobs',
 'Nod',
 'footfall',
 'unlacing',
 'blackish',
 'plasters',
 'inkling',
 'Placing',
 'crammed',
 'scalp',
 'mildewed',
 'Ignorance',
 'parent',
 'nonplussed',
 'undressing',
 'checkered',
 'Thirty',
 'frogs',
 'quaked',
 'wrapall',
 'dreadnaught',
 'fumbled',
 'Remembering',
 'manikin',
 'tenpin',
 'andirons',
 'jambs',
 'bricks',
 'appropriate',
 'applying',
 'hastier',
 'withdrawals',
 'antics',
 'devotee',
 'extinguishing',
 'unceremoniously',
 'bagged',
 'sportsman',
 'woodcock',
 'uncomfortableness',
 'deliberating',
 'puffed',
 'sang',
 'Stammering',
 'conjured',
 'responses',
 'debel',
 'flourishing',
 'Angels',
 'flourishings',
 'peddlin',
 'sleepe',
 'grunted',
 'gettee',
 'motioning',
 'comely',
 'insured',
 'Counterpane',
 'parti',
 'triangles',
 'interminable',
 'caper',
 'supperless',
 '21st',
 'hemisphere',
 'sigh',
 'Sixteen',
 'ached',
 'coaches',
 'stockinged',
 'slippering',
 'misbehaviour',
 'unendurable',
 'stepmothers',
 'misfortunes',
 'steeped',
 'shudderingly',
 'confounding',
 'soberly',
 'recurred',
 'predicament',
 'unlock',
 'bridegroom',
 'clasp',
 'hugged',
 'rouse',
 'snore',
 'scratch',
 'Throwing',
 'expostulations',
 'unbecomingness',
 'matrimonial',
 'dawning',
 'overture',
 'innate',
 'compliment',
 'civility',
 'rudeness',
 'toilette',
 'dressing',
 'donning',
 'gaspings',
 'booting',
 'caterpillar',
 'outlandishness',
 'manners',
 'education',
 'undergraduate',
 'dreamt',
 'cowhide',
 'pinched',
 'curtains',
 'indecorous',
 'contented',
 'restricting',
 'donned',
 'lathering',
 'unsheathes',
 'whets',
 'Rogers',
 'cutlery',
 'Afterwards',
 'baton',
 'Breakfast',
 'pleasantly',
 'bountifully',
 'laughable',
 'bosky',
 'unshorn',
 'gowns',
 'toasted',
 'lingers',
 'tarried',
 'barred',
 'Grub',
 'Park',
 'assurance',
 'polish',
 'occasioned',
 'embarrassed',
 'bashfulness',
 'duelled',
 'winking',
 'tastes',
 'sheepishly',
 'bashful',
 'icicle',
 'admirer',
 'cordially',
 'grappling',
 'genteelly',
 'eschewed',
 'undivided',
 '6',
 'circulating',
 'nondescripts',
 'Chestnut',
 'jostle',
 'Regent',
 'Lascars',
 'Bombay',
 'Apollo',
 'Feegeeans',
 'Tongatobooarrs',
 'Erromanggoans',
 'Pannangians',
 'Brighggians',
 'weekly',
 'Vermonters',
 'stalwart',
 'frames',
 'felled',
 'strutting',
 'wester',
 'bombazine',
 'cloak',
 'mow',
 'gloves',
 'joins',
 'outfit',
 'waistcoats',
 'Hay',
 'Seed',
 'tract',
 'dearest',
 'pave',
 'eggs',
 'patrician',
 'parks',
 'scraggy',
 'scoria',
 'Herr',
 'dowers',
 'nieces',
 'reservoirs',
 'maples',
 'bountiful',
 'proffer',
 'passer',
 'cones',
 'blossoms',
 'superinduced',
 'carnation',
 'Salem',
 'sweethearts',
 'Puritanic',
 'Whaleman',
 'Wrapping',
 'Each',
 'quote',
 'TALBOT',
 'Near',
 'Desolation',
 '1st',
 'SISTER',
 'ROBERT',
 'WILLIS',
 'ELLERY',
 'NATHAN',
 'COLEMAN',
 'WALTER',
 'CANNY',
 'SETH',
 'GLEIG',
 'Forming',
 'ELIZA',
 '31st',
 'MARBLE',
 'SHIPMATES',
 'EZEKIEL',
 'HARDY',
 'AUGUST',
 '3d',
 '1833',
 'WIDOW',
 'Shaking',
 'glazed',
 'Affected',
 'relatives',
 'unhealing',
 'sympathetically',
 'wounds',
 'bleed',
 'blanks',
 ...]
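A hapax (hapax legomenon) is a word that occurs exactly once in a text. The same kind of list can be computed from plain counts. This is a sketch of the idea with collections.Counter, not FreqDist's own code:

```python
from collections import Counter

def hapaxes(tokens):
    """Tokens that occur exactly once, in order of first appearance."""
    counts = Counter(tokens)
    return [w for w in counts if counts[w] == 1]

tokens = 'call me Ishmael . call me later .'.split()
print(hapaxes(tokens))  # ['Ishmael', 'later']
```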

Fine-grained selection of words

  1. The set of all w such that w is an element of V (the vocabulary) and w has property P:
    \(\{w \mid w \in V \wedge P(w)\}\)
  2. The corresponding Python expression is:
    [w for w in V if P(w)]
V = set(text1)
long_words = [w for w in V if len(w)>15]
sorted(long_words)
['CIRCUMNAVIGATION',
 'Physiognomically',
 'apprehensiveness',
 'cannibalistically',
 'characteristically',
 'circumnavigating',
 'circumnavigation',
 'circumnavigations',
 'comprehensiveness',
 'hermaphroditical',
 'indiscriminately',
 'indispensableness',
 'irresistibleness',
 'physiognomically',
 'preternaturalness',
 'responsibilities',
 'simultaneousness',
 'subterraneousness',
 'supernaturalness',
 'superstitiousness',
 'uncomfortableness',
 'uncompromisedness',
 'undiscriminating',
 'uninterpenetratingly']
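The property P(w) can combine several tests. The sketch below keeps words that are both long and frequent, using collections.Counter for the counts. frequent_long_words and its thresholds are illustrative assumptions:

```python
from collections import Counter

def frequent_long_words(tokens, min_len=7, min_count=2):
    """Sorted {w | w in V and len(w) > min_len and count(w) > min_count}."""
    counts = Counter(tokens)  # V is the set of keys of counts
    return sorted(w for w in counts if len(w) > min_len and counts[w] > min_count)

tokens = "unbelievable unbelievable unbelievable tiny tiny tiny wonderful wonderful".split()
print(frequent_long_words(tokens))  # ['unbelievable']
```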

This article is based on the book Natural Language Processing with Python.