A gentle introduction to Python for linguists
V4Py Summer School
David Lukeš
June 24th–28th, 2019
Introductions
About me
David Lukeš
david.lukes@ff.cuni.cz
https://dlukes.github.io
these slides at: https://trnka.korpus.cz/~lukes/slides/v4py/intro
Institute of the Czech National Corpus, Faculty of Arts, Charles University, Prague
academic interests: phonetics, corpus linguistics; language evolution and acquisition
started programming at uni, so it’s never too late :)
probably the most useful skill since learning to read
Python gives you wings!
Credits: Randall Munroe, XKCD, https://xkcd.com/353/
About you
Who has programmed before? In what language(s)? Python?
What’s your academic field? Linguistics, history, digital humanities…?
Who is reasonably familiar with working with language data on a computer (e.g. corpora etc.)?
Who knows what regular expressions are? Who uses them?
What are you hoping to learn this week?
About the course
Python: https://www.python.org/
a simple, fun and approachable programming language
FLOSS (Free/Libre and Open-Source Software), as opposed to e.g. Microsoft Word
created in 1991 by Guido van Rossum
why is it named Python?
Using Python
Python 2 vs. Python 3
if you want to install it on your own computer: the Anaconda Distribution
but we’ll be using https://jupyter.korpus.cz
NLTK Book: http://www.nltk.org/book/
The NLP pipeline
What we’ll cover
Python basics (functions, control flow, collections)
The NLTK package & book as a good starting point for people interested in language data
How text is represented inside computers
Regular expressions in Python
Accessing web services (“REST APIs”) from Python & automatic annotation of language data (tagging, parsing) – both courtesy of Rudolf Rosa
Getting data into Python (raw text & tabular data)
Some visualizations (dispersion plots, wordclouds)
Case studies: collocation strength, keyword analysis
Hackathon on Friday!
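A small taste of what this looks like in practice
To give a rough idea of how the first few topics fit together, here is a minimal sketch (not taken from the course materials) that uses a function, a loop, a collection and a regular expression to count words, and then peeks at how a string is stored as bytes. The names tokenize and frequencies and the sample sentence are made up for illustration; only the standard library is assumed.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words (runs of word characters)."""
    return re.findall(r"\w+", text.lower())


def frequencies(tokens):
    """Count how often each token occurs."""
    counts = Counter()
    for token in tokens:  # control flow: a plain for loop
        counts[token] += 1
    return counts


# A made-up sentence, used only for illustration.
text = "The cat sat on the mat. The mat was flat."
tokens = tokenize(text)
counts = frequencies(tokens)
print(counts.most_common(2))

# Text inside computers: characters are code points, stored as bytes.
word = "Luke\u0161"  # "Lukeš", with š written as an escape sequence
print(len(word), len(word.encode("utf-8")))
```

The loop in frequencies is spelled out on purpose; Counter(tokens) would give the same counts in one step. The last line prints two different numbers because š takes two bytes in UTF-8.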