An introduction to commonly used Python crawler libraries, with brief notes

(Editor: jimmy. Date: 2024/9/24)

This list covers Python libraries for web scraping and data processing.

Python networking libraries

General

urllib
requests: https://github.com/kennethreitz/requests
grab: https://github.com/lorien/grab
pycurl: https://github.com/pycurl/pycurl (binding to libcurl: http://curl.haxx.se/libcurl/)

urllib3
httplib2: https://github.com/jcgregorio/httplib2
RoboBrowser: https://github.com/jmcarp/robobrowser
MechanicalSoup: https://github.com/hickford/MechanicalSoup
mechanize: https://github.com/jjlee/mechanize
socket: https://docs.python.org/3/library/socket.html
Unirest for Python: https://github.com/Mashape/unirest-python
hyper: https://github.com/Lukasa/hyper
PySocks: https://github.com/Anorov/PySocks
treq: https://github.com/dreid/treq
aiohttp: https://github.com/KeepSafe/aiohttp
grab: http://docs.grablib.org/en/latest/#grab-spider-user-manual
scrapy: http://scrapy.org/
pyspider: https://github.com/binux/pyspider
cola: https://github.com/chineking/cola
portia: https://github.com/scrapinghub/portia
restkit: https://github.com/benoitc/restkit
demiurge: https://github.com/matiasb/demiurge
lxml: http://lxml.de/
cssselect: https://pythonhosted.org/cssselect
pyquery: http://pythonhosted.org/pyquery/
BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/bs4/doc/
html5lib: http://html5lib.readthedocs.org/en/latest/
feedparser: http://pythonhosted.org/feedparser/
MarkupSafe: https://github.com/mitsuhiko/markupsafe
xmltodict: https://github.com/martinblech/xmltodict
xhtml2pdf: https://github.com/chrisglass/xhtml2pdf
untangle: https://github.com/stchris/untangle
Bleach: http://bleach.readthedocs.org/en/latest/
sanitize: https://github.com/Alir3z4/sanitize
difflib: https://docs.python.org/2/library/difflib.html
Levenshtein: https://github.com/ztane/python-Levenshtein/
fuzzywuzzy: https://github.com/seatgeek/fuzzywuzzy
esmre: https://code.google.com/p/esmre/
ftfy: https://github.com/LuminosoInsight/python-ftfy
unidecode: https://pypi.python.org/pypi/Unidecode
uniout: https://github.com/moskytw/uniout
chardet: https://github.com/chardet/chardet
xpinyin: https://github.com/lxneng/xpinyin
pangu.py: https://github.com/vinta/pangu.py
awesome-slugify: https://github.com/dimka665/awesome-slugify
python-slugify: https://github.com/un33k/python-slugify
unicode-slugify: https://github.com/mozilla/unicode-slugify
pytils: https://github.com/j2a/pytils
PLY: http://www.dabeaz.com/ply/
pyparsing: http://pyparsing.wikispaces.com/
python-nameparser: https://github.com/derek73/python-nameparser
phonenumbers: https://github.com/daviddrysdale/python-phonenumbers
python-user-agents: https://github.com/selwin/python-user-agents
HTTP Agent Parser: https://github.com/shon/httpagentparser
tablib: https://github.com/kennethreitz/tablib
textract: https://github.com/deanmalmgren/textract
messytables: https://github.com/okfn/messytables
rows: https://github.com/turicas/rows
python-docx: https://github.com/python-openxml/python-docx
xlwt: https://github.com/python-excel/xlwt
xlrd: https://github.com/python-excel/xlrd
XlsxWriter: https://xlsxwriter.readthedocs.org/
xlwings: http://xlwings.org/
openpyxl: https://openpyxl.readthedocs.org/en/latest/
Marmir: https://github.com/brianray/mm
PDFMiner: https://github.com/euske/pdfminer
PyPDF2: https://github.com/mstamy2/PyPDF2
ReportLab: http://www.reportlab.com/opensource/
pdftables: https://pypi.python.org/pypi/pdftables
Python-Markdown: https://github.com/waylan/Python-Markdown
Mistune: https://github.com/lepture/mistune
markdown2: https://pypi.python.org/pypi/markdown2
PyYAML: http://pyyaml.org/
cssutils: https://pypi.python.org/pypi/cssutils/
sqlparse: https://sqlparse.readthedocs.org/
http-parser: https://github.com/benoitc/http-parser
opengraph: https://github.com/erikriver/opengraph
pefile: https://github.com/erocarrera/pefile
psd-tools: https://github.com/kmike/psd-tools
NLTK: http://www.nltk.org/
Pattern: http://www.clips.ua.ac.be/pattern
TextBlob: http://textblob.readthedocs.org/
jieba: https://github.com/fxsjy/jieba
SnowNLP: https://github.com/isnowfy/snownlp
loso: https://github.com/victorlin/loso
genius: https://github.com/duanhongyi/genius
langid.py: https://github.com/saffsd/langid.py
Korean: https://korean.readthedocs.org/
pymorphy2: https://github.com/kmike/pymorphy2
PyPLN: https://github.com/NAMD/pypln.backend
selenium: http://selenium.googlecode.com/git/docs/api/py/api.html
Ghost.py: http://carrerasrodrigo.github.io/Ghost.py/
Spynner: https://github.com/makinacorpus/spynner
Splinter: https://github.com/cobrateam/splinter
threading: http://docs.python.org/2.7/library/threading.html
multiprocessing: http://docs.python.org/2.7/library/multiprocessing.html
celery: http://www.celeryproject.org/
concurrent-futures: https://docs.python.org/3/library/concurrent.futures.html
asyncio: https://docs.python.org/3/library/asyncio.html
Twisted: https://twistedmatrix.com/trac/
Tornado: http://www.tornadoweb.org/
pulsar: https://github.com/quantmind/pulsar
diesel: https://github.com/jamwt/diesel
gevent: http://www.gevent.org/
greenlet: https://github.com/python-greenlet/greenlet
eventlet: http://eventlet.net/
Tomorrow: https://github.com/madisonmay/Tomorrow
huey: https://github.com/coleifer/huey
mrq: https://github.com/pricingassistant/mrq
RQ: http://python-rq.org/docs/
simpleq: https://github.com/rdegges/simpleq
python-gearman: https://github.com/Yelp/python-gearman
picloud: http://docs.picloud.com/
dominoup.com: http://www.dominoup.com/
flanker: https://github.com/mailgun/flanker
Talon: https://github.com/mailgun/talon
furl: https://github.com/gruns/furl
purl: https://github.com/codeinthehole/purl
urllib.parse: https://docs.python.org/3/library/urllib.parse.html
tldextract: https://github.com/john-kurkowski/tldextract
netaddr: https://github.com/drkjam/netaddr
newspaper: https://github.com/codelucas/newspaper
html2text: https://github.com/Alir3z4/html2text
python-goose: https://github.com/grangier/python-goose
lassie: https://github.com/michaelhelmick/lassie
micawber: https://github.com/coleifer/micawber
sumy: https://github.com/miso-belica/sumy
Haul: https://github.com/vinta/Haul
python-readability: https://github.com/buriy/python-readability
scrapely: https://github.com/scrapy/scrapely
youtube-dl: http://rg3.github.io/youtube-dl/
you-get: http://www.soimort.org/you-get/
WikiTeam: https://github.com/WikiTeam/wikiteam
Crossbar: https://github.com/crossbario/crossbar/
AutobahnPython: https://github.com/tavendo/AutobahnPython
WebSocket-for-Python: https://github.com/Lawouach/WebSocket-for-Python
dnsyo: https://github.com/samarudge/dnsyo
pycares: https://github.com/saghul/pycares
OpenCV: https://github.com/Itseez/opencv
SimpleCV: https://github.com/sightmachine/SimpleCV
mahotas: https://github.com/luispedro/mahotas
shadowsocks: https://github.com/shadowsocks/shadowsocks
tproxy: https://github.com/benoitc/tproxy

awesome-python: https://github.com/vinta/awesome-python

pycrumbs

python-github-projects

python_reference

pythonidae
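
Two of the entries listed here, urllib.parse and difflib, ship with Python's standard library, so they can be tried without installing anything. The following is a minimal sketch, not part of the original list, of how they might be combined in a crawler: resolving relative links, keeping only same-host URLs, and scoring how similar two strings are. The base URL, the sample links, and the helper names normalize, same_host and similarity are all hypothetical and exist only for illustration.

```python
from difflib import SequenceMatcher
from urllib.parse import urldefrag, urljoin, urlsplit

# Hypothetical page and links, used only for illustration.
BASE_URL = "https://example.com/docs/index.html"
HREFS = ["intro.html#top", "/about", "https://other.example.org/x", "intro.html"]


def normalize(base, href):
    """Resolve href against base and drop any #fragment."""
    absolute, _fragment = urldefrag(urljoin(base, href))
    return absolute


def same_host(url, base=BASE_URL):
    """True when url points at the same host as base."""
    return urlsplit(url).netloc == urlsplit(base).netloc


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Resolve every link, keep same-host ones, and deduplicate via a set.
internal = sorted(
    {u for u in (normalize(BASE_URL, h) for h in HREFS) if same_host(u)}
)

if __name__ == "__main__":
    print(internal)
    print(round(similarity("scrapy", "scrapely"), 2))
```

Deduplicating on the normalized URL rather than the raw href is what lets "intro.html#top" and "intro.html" collapse into one entry. A real crawler would likely pair this with one of the HTTP and parsing libraries listed above.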
