How can I extract the first paragraph from a Wikipedia article, using Python?
For example, for Albert Einstein, that would be:
Albert Einstein (pronounced /ˈælbərt ˈaɪnstaɪn/; German: [ˈalbɐt ˈaɪnʃtaɪn]; 14 March 1879 – 18 April 1955) was a theoretical physicist, philosopher and author who is widely regarded as one of the most influential and iconic scientists and intellectuals of all time. A German-Swiss Nobel laureate, Einstein is often regarded as the father of modern physics. He received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect".
Some time ago I wrote two classes for getting Wikipedia articles as plain text. I know they aren’t the best solution, but you can adapt them to your needs:
You can use them like this:
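from wikipedia import Wikipedia
from wiki2plain import Wiki2Plain

lang = 'simple'
wiki = Wikipedia(lang)

# Fetch the raw article; fall back to None if anything goes wrong.
try:
    raw = wiki.article('Uruguay')
except Exception:
    raw = None

# Convert the raw article to plain text.
if raw:
    wiki2plain = Wiki2Plain(raw)
    content = wiki2plain.text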
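If you only need the first paragraph, a different route is to ask the MediaWiki API for the plain-text intro directly instead of converting the markup yourself. This is a rough sketch using only the standard library, not a replacement for the classes above. It assumes the wiki has the TextExtracts extension enabled (that is what provides `prop=extracts`), and the function names `build_intro_url`, `first_paragraph` and `fetch_first_paragraph`, as well as the User-Agent string, are made up for this example.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://{lang}.wikipedia.org/w/api.php"


def build_intro_url(title, lang="en"):
    """Build a query URL asking for the plain-text intro of one article."""
    params = {
        "action": "query",
        "prop": "extracts",   # provided by the TextExtracts extension
        "exintro": 1,         # only the section before the first heading
        "explaintext": 1,     # plain text instead of HTML
        "redirects": 1,       # follow redirects to the target article
        "format": "json",
        "titles": title,
    }
    return API_URL.format(lang=lang) + "?" + urllib.parse.urlencode(params)


def first_paragraph(api_response):
    """Return the first non-empty paragraph of the extract, or None."""
    pages = api_response.get("query", {}).get("pages", {})
    for page in pages.values():
        extract = page.get("extract", "")
        for paragraph in extract.split("\n"):
            if paragraph.strip():
                return paragraph.strip()
    return None


def fetch_first_paragraph(title, lang="en"):
    """Fetch the intro over the network and return its first paragraph."""
    request = urllib.request.Request(
        build_intro_url(title, lang),
        # Hypothetical User-Agent; use one that identifies your own script.
        headers={"User-Agent": "first-paragraph-example/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return first_paragraph(json.load(response))
```

Calling `fetch_first_paragraph("Albert Einstein")` should give you the opening paragraph as a string, or `None` if the page is missing or has no extract. The parsing is kept separate from the network call so you can try `first_paragraph` on a saved response without hitting the API.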