WHAT IS THE PAROLE CORPUS?
AVAILABILITY
USAGE RESTRICTIONS
AN IMPRESSION OF THE APPLICATION POSSIBILITIES:
General structure of the user interface
Selecting a subcorpus
Searching via the Simple Search Screen
Searching via the Advanced Search Screen
Computation of collocations
WHAT IS THE PAROLE CORPUS?
The PAROLE corpus is a collection of modern Dutch texts amounting to c. 20 million tokens, for the greater part originating from newspaper or magazine articles. The texts are annotated for typographical and text-structural features. Each form has been automatically assigned a detailed part-of-speech code and a lemma. All encoding is TEI-conformant (see http://www.tei-c.org/). Annotated texts offer more advanced retrieval facilities than non-annotated texts (see the application possibilities).
For detailed information see Corpus Documentation.
AVAILABILITY
Like the other three corpora that the INL has made available via the Internet (see http://www.inl.nl/corp/corp.htm), the PAROLE corpus is accessible free of charge through a retrieval system (see, however, the usage restrictions). The corpus is primarily meant for researchers of morphological, lexicological and, to a lesser extent, syntactic aspects of contemporary Dutch usage, and for all teachers in the field of corpus linguistics. However, experience with the other INL corpora has shown that a much wider range of researchers use the INL corpora for their research. Interested non-academic 'language enthusiasts' have turned out to be an important target group as well.
USAGE RESTRICTIONS
The PAROLE corpus has a number of usage restrictions.
Due to agreements made with the copyright owners of the texts, the PAROLE corpus may be consulted only for one's own non-commercial research or teaching purposes; see Copyright.
Restricted size of the output
As a result of the legal preconditions, the users of this corpus do not have access to large downloadable text fragments or complete texts (often needed by, for example, language technologists). The end result of a query is one or several words in a context of not more than 50 words to the left and 50 words to the right (a "concordance"); in an update of this retrieval system it will also be possible to search for collocations. A maximum of 1000 concordances can be sent to the user's e-mail address for further processing.
Technical requirements
Finally, there are a number of technical requirements. You can only get access to the corpus if you have a PC with the Windows operating system and Microsoft Internet Explorer 5.5 or 6.0 with JavaScript enabled. Although there are no specific hardware requirements, a PC with at least a Pentium II processor and 128 MB of internal memory is recommended.
Anyone impeded by any of these restrictions will not be able to make proper use of this retrieval system.
AN IMPRESSION OF THE APPLICATION POSSIBILITIES
For detailed information about the following, please refer to the User Manual. If you are interested in the differences from the older INL corpora which are accessible over the Internet, please refer to the Corpus Documentation.
General structure of the user interface

The user interface has a layered structure with several screens:
See the User Manual for detailed information.
Selecting a subcorpus

You can apply your query to a selection of texts from the complete corpus. Parameters for such subcorpus selection are: author, title, medium, topic and period; you can combine parameters with the aid of the operators 'and', 'or' and 'not'. Finally, you can select at the level of separate texts. The selection criteria (search terms) of a subcorpus can be saved and used again, so that the selection process can be skipped. A selected subcorpus can also be saved as a default setting.
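How a selection of this kind combines metadata parameters can be sketched in Python. This is only an illustration and not the retrieval system's implementation: the `Text` record, the helper names and the three sample texts are all invented for the example, and the period is simplified to a range of years.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    # Hypothetical metadata record; the fields mirror the selection parameters.
    author: str
    title: str
    medium: str
    topic: str
    year: int


def field_is(name, value):
    """Predicate: the metadata field `name` equals `value`."""
    return lambda text: getattr(text, name) == value


def period(first, last):
    """Predicate: the text dates from the years first..last (inclusive)."""
    return lambda text: first <= text.year <= last


def all_of(*preds):  # operator 'and'
    return lambda text: all(p(text) for p in preds)


def any_of(*preds):  # operator 'or'
    return lambda text: any(p(text) for p in preds)


def negate(pred):  # operator 'not'
    return lambda text: not pred(text)


def select_subcorpus(corpus, criterion):
    """Keep only the texts that satisfy the combined criterion."""
    return [t for t in corpus if criterion(t)]


# Invented sample metadata, for illustration only.
corpus = [
    Text("A. Jansen", "Verkiezingen", "newspaper", "politics", 1994),
    Text("B. de Vries", "Voetbal", "newspaper", "sport", 1995),
    Text("C. Bakker", "Tuinieren", "magazine", "leisure", 1996),
]

# medium = newspaper AND NOT topic = sport AND period 1990-1995
criterion = all_of(
    field_is("medium", "newspaper"),
    negate(field_is("topic", "sport")),
    period(1990, 1995),
)
subcorpus = select_subcorpus(corpus, criterion)
print([t.title for t in subcorpus])
```

Because the criterion is an ordinary value, it can be kept and applied again later, which corresponds to saving the selection criteria of a subcorpus.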
See the User Manual for detailed information.
Searching via the Simple Search Screen

In this search screen you can search for one or more terms, provided they are of the same category: either 'form', or 'lemma', or 'part of speech' (PoS), or 'TEI tag', or 'pattern' (predefined, simple or advanced query). To compose a query you can use 'wildcards' and the operators 'and', 'or' and 'not'. For more complex queries, which involve the combination of items belonging to different categories, you need to use the Advanced Search Screen.
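The effect of wildcards combined with 'or' and 'not' can be illustrated with a small Python sketch. The wildcard syntax of the retrieval system itself is described in the User Manual; the sketch below simply assumes shell-style wildcards (`*` for any sequence of characters, `?` for a single character) as provided by Python's standard `fnmatch` module, and the function name and word list are invented for the example.

```python
from fnmatch import fnmatchcase


def matches(form, include, exclude=()):
    """True if `form` matches any pattern in `include` ('or')
    and none of the patterns in `exclude` ('not')."""
    form = form.lower()
    return (any(fnmatchcase(form, p) for p in include)
            and not any(fnmatchcase(form, p) for p in exclude))


# Invented word list, for illustration only.
forms = ["huis", "huizen", "huisje", "tuin", "tuinhuis", "verhuizen"]

# (huis* OR huiz*) NOT *je
hits = [f for f in forms if matches(f, include=["huis*", "huiz*"], exclude=["*je"])]
print(hits)  # ['huis', 'huizen']
```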
Apart from intermediate results, such as wordlists with frequencies, the result of a query will eventually provide a series of "concordances", i.e. the items searched for in a context of not more than 50 words on the left and 50 on the right. There are several facilities for sorting the search results and/or narrowing them down (for example, by filtering out undesired output). A concordance can be shown with or without its encoding. For each concordance a larger quotation can be retrieved, with or without encoding.
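The notion of a concordance with a bounded context can be made concrete with a minimal Python sketch. It assumes the text is already available as a list of tokens; the function name, the `is_hit` parameter and the sample sentence are invented for the example and say nothing about how the retrieval system is built.

```python
def concordances(tokens, is_hit, window=50):
    """Return (left context, keyword, right context) for every token
    that satisfies `is_hit`, with at most `window` words on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if is_hit(tok):
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append((left, tok, right))
    return lines


# Invented sample sentence; a window of 3 keeps the output short.
text = "de kat zit op de mat en de hond ligt naast de kat".split()
lines = concordances(text, lambda t: t == "kat", window=3)
for left, keyword, right in lines:
    print(" ".join(left).rjust(16), f"[{keyword}]", " ".join(right))
```

Near the start or end of a text the context is simply shorter, which is why the description speaks of "not more than" 50 words on either side.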
See the User Manual for detailed information.
Searching via the Advanced Search Screen

Unlike the Simple Search Screen, this search screen provides the possibility to combine the different categories ('form', 'lemma', 'part of speech' (PoS), 'TEI tag', and 'pattern') in your query. Other additional facilities:
The result is essentially the same as the end result of the queries submitted with the Simple Search Screen.

See the User Manual for detailed information.
Computation of collocations
The Collocation screen gives you the opportunity to search for statistical collocations: word pairs that occur significantly often in each other's proximity. The user can search for collocations of a search term (either as form or as lemma). The search scope can be narrowed down in several ways, for example to a certain part of speech or by indicating a specific direction or distance. There are also several options for the statistical criterion to be used for the computation of the significance of the collocation.
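As an impression of what such a computation involves, the following Python sketch scores the words found near a search term. It is a simplified illustration under stated assumptions: the criterion used here is pointwise mutual information (PMI), chosen only because it is a common association measure and not because the retrieval system is known to offer it; the function name, parameters and sample tokens are invented for the example.

```python
import math
from collections import Counter


def collocates(tokens, node, distance=2, direction="both"):
    """Score the words occurring within `distance` positions of `node`.

    direction: 'left', 'right' or 'both', relative to the node word.
    The score is PMI = log2(P(node, word) / (P(node) * P(word))),
    with probabilities roughly estimated as counts / number of tokens.
    """
    n = len(tokens)
    freq = Counter(tokens)
    pair = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = i - distance if direction in ("left", "both") else i + 1
        hi = i + distance if direction in ("right", "both") else i - 1
        for j in range(max(0, lo), min(n - 1, hi) + 1):
            if j != i:
                pair[tokens[j]] += 1
    scores = {
        word: math.log2(joint * n / (freq[node] * freq[word]))
        for word, joint in pair.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented sample tokens, for illustration only.
tokens = "sterke koffie en sterke thee maar zwarte koffie en slappe thee".split()
result = collocates(tokens, "koffie", distance=1, direction="left")
for word, score in result:
    print(f"{word}\t{score:.2f}")
```

PMI tends to favour rare words, which is one reason why a choice between several statistical criteria is useful in practice.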