Wikipedia is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and using a wiki-based editing system called MediaWiki. Wikipedia is the largest and most-read reference work in history.
This notebook shows how to retrieve wiki pages from wikipedia.org into the Document format that is used downstream.

Integration details

Setup

To enable automated tracing of individual tools, set your LangSmith API key:
# import getpass
# import os
#
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Installation

The integration lives in the langchain-community package. We also need to install the wikipedia python package itself.
%pip install -qU langchain-community wikipedia

Instantiation

Now we can instantiate our retriever. WikipediaRetriever parameters include:
  • optional lang: default="en". Use it to search a specific language edition of Wikipedia.
  • optional load_max_docs: default=100. Use it to limit the number of downloaded documents. Downloading all 100 documents takes time, so use a small number for experiments. There is currently a hard limit of 300.
  • optional load_all_available_meta: default=False. By default, only the most important fields are downloaded: Published (the date the document was published or last updated), title, and Summary. If True, the other fields are downloaded as well.
get_relevant_documents() takes one argument, query: free text used to find documents in Wikipedia. The Usage section below calls the retriever through invoke() with the same kind of query string.
from langchain_community.retrievers import WikipediaRetriever

retriever = WikipediaRetriever()
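The parameters listed above can be passed as keyword arguments when creating the retriever. The following is a minimal sketch, not the library's own code: `retriever_kwargs`, `build_retriever`, and `HARD_LIMIT` are hypothetical names introduced here, and the parameter names are taken from the list above, so check them against the API reference for your installed version. The helper keeps `load_max_docs` within the hard limit of 300 mentioned in the text.

```python
# Sketch: build keyword arguments for WikipediaRetriever from the parameters
# described above. All helper names here are hypothetical.

HARD_LIMIT = 300  # hard limit on load_max_docs, as stated in the text


def retriever_kwargs(lang="en", load_max_docs=2, load_all_available_meta=False):
    """Return constructor kwargs, keeping load_max_docs within 1..HARD_LIMIT."""
    if load_max_docs < 1:
        raise ValueError("load_max_docs must be at least 1")
    return {
        "lang": lang,
        "load_max_docs": min(load_max_docs, HARD_LIMIT),
        "load_all_available_meta": load_all_available_meta,
    }


def build_retriever(**overrides):
    """Create a WikipediaRetriever from the kwargs above.

    The import is deferred so that retriever_kwargs() can be used and tested
    without langchain-community installed.
    """
    from langchain_community.retrievers import WikipediaRetriever

    return WikipediaRetriever(**retriever_kwargs(**overrides))


# Example (requires langchain-community, wikipedia, and network access):
# retriever = build_retriever(lang="de", load_max_docs=2)
```

Keeping `load_max_docs` small, as in the commented example, follows the advice above to use a small number while experimenting.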

Usage

docs = retriever.invoke("TOKYO GHOUL")
print(docs[0].page_content[:400])
Tokyo Ghoul (Japanese: 東京喰種(トーキョーグール), Hepburn: Tōkyō Gūru) is a Japanese dark fantasy manga series written and illustrated by Sui Ishida. It was serialized in Shueisha's seinen manga magazine Weekly Young Jump from September 2011 to September 2014, with its chapters collected in 14 tankōbon volumes. The story is set in an alternate version of Tokyo where humans coexist with ghouls, beings who loo
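A query can return more than one document, so it can help to print a one-line preview of each result. The sketch below assumes only that each returned document has a `page_content` string and a `metadata` dictionary that may contain a `title` key; `preview` is a hypothetical helper, and the sample objects are stand-ins so the example runs offline.

```python
from types import SimpleNamespace


def preview(docs, width=80):
    """Return one line per document: its title (if any) and the start of its text."""
    lines = []
    for i, doc in enumerate(docs):
        title = doc.metadata.get("title", "<untitled>")
        # Collapse newlines and repeated spaces so each preview fits on one line.
        snippet = " ".join(doc.page_content.split())[:width]
        lines.append(f"[{i}] {title}: {snippet}")
    return lines


# Stand-in objects with the same two attributes as a Document.
# With the real retriever: preview(retriever.invoke("TOKYO GHOUL"))
sample = [
    SimpleNamespace(
        page_content="Tokyo Ghoul is a Japanese dark fantasy manga series\n"
        "written and illustrated by Sui Ishida.",
        metadata={"title": "Tokyo Ghoul"},
    ),
    SimpleNamespace(page_content="A page without a title.", metadata={}),
]

for line in preview(sample, width=40):
    print(line)
```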

API reference

For detailed documentation of all WikipediaRetriever features and configurations, head to the API reference.