Using online linguistic resources

 

1) Building a corpus

 

http://www.bmj.com/

 

Research

 

http://www.bmj.com/research

 

influenza vaccination

 

Relevance / Most recent

 

Save in text format

 

2) Downloading the MonoConc concordancer

 

perso.univ-lyon2.fr/~maniezf/c.zip

 

Word list

Alt QCF

 

Stop list
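The word list and stop list steps can be approximated outside the concordancer. The following is a minimal sketch, not the concordancer's own algorithm: the tokenization rule, the function name word_list, the sample sentence and the stop words are all assumptions made for illustration.

```python
import re
from collections import Counter

def word_list(text, stop_words=frozenset()):
    """Return (word, count) pairs sorted by descending frequency."""
    # Lowercase, then keep runs of letters (accented letters included),
    # optionally joined by a hyphen or an apostrophe.
    tokens = re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    # Most frequent first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Made-up sample text and stop list, for illustration only.
sample = ("The vaccine was given to the elderly. "
          "The vaccine reduced influenza-related deaths.")
stop = {"the", "to", "was"}

for word, n in word_list(sample, stop)[:3]:
    print(word, n)
```

Without a stop list, grammatical words such as "the" head the list; with one, content words such as "vaccine" come first.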

 

Simple search

Ctrl S

 

Collocate Search

stud?%%

Ctrl F

 

*demic*

*-*ed

Alt Q A

 

 

List of expressions

Alt Q A

Search Term

 

?????* ?????*

?????* ?????* ?????*

 

Adjectival suffixes:

 

????*ic ????*

 

antigenic shift

 

Linguee

http://www.linguee.fr/francais-anglais/search?source=anglais&query=antigenic+shift

mutation antigénique

cassure antigénique

 

Termium

http://www.termiumplus.gc.ca/

cassure antigénique

 

 

????*al ????*

????*ive ????*

????*ar ????*
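The wildcard patterns above can be read as regular expressions. The sketch below assumes that * stands for zero or more characters, ? for exactly one character and % for zero or one character, and that a space separates two words; these conventions should be checked against the concordancer's own help. The function name wildcard_to_regex and the sample sentence are made up.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a concordancer-style wildcard pattern into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")      # assumed: zero or more characters
        elif ch == "?":
            parts.append(r"\w")       # assumed: exactly one character
        elif ch == "%":
            parts.append(r"\w?")      # assumed: zero or one character
        elif ch == " ":
            parts.append(r"\s+")      # boundary between two words
        else:
            parts.append(re.escape(ch))
    # The lookarounds force the match to start and end on word boundaries.
    return re.compile(r"(?<!\w)" + "".join(parts) + r"(?!\w)", re.IGNORECASE)

# Made-up sample sentence.
text = ("An antigenic shift can trigger a pandemic influenza outbreak; "
        "basic rules still apply.")

print(wildcard_to_regex("????*ic ????*").findall(text))
print(bool(wildcard_to_regex("stud?%%").search("studies")))
```

Under these assumptions ????*ic only matches words of at least six letters ending in -ic, which is why "basic rules" is not returned while "antigenic shift" is.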

 

3) Using a tagged and lemmatized corpus

 

Google: corpus CRTT

 

http://perso.univ-lyon2.fr/~maniezf/Corpus/Corpus_medical_FR_CRTT.htm

 

http://perso.univ-lyon2.fr/~maniezf/MEDFR.zip contains the entire corpus

 

AnnCardAng_txt.rar         tagged and lemmatized version

 

File

Tag settings

Part of Speech tags

 

Collect Tag Information

 

Searching for words that belong to a given part of speech:

 

*_*_V*

 

*_présenter_V*

 

Ctrl F

 

*_diagnostic_N*

 

Ctrl F

 

*_évoquer_V*
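The queries above suggest that each token of the tagged corpus has the shape form_lemma_TAG. The sketch below reproduces such a search under that assumption; the tag labels (DET, NOM, VER:impf, ADJ), the sample line and the function name search_tagged are made up and may differ from the tag set actually used in the corpus.

```python
from fnmatch import fnmatchcase

def search_tagged(text, query):
    """Return the tokens (form_lemma_TAG) that match a wildcard query."""
    # fnmatchcase gives * the same "any sequence of characters" meaning
    # as in the queries above, and compares case-sensitively.
    return [tok for tok in text.split() if fnmatchcase(tok, query)]

# Made-up tagged line; the real corpus may use other tag labels.
sample = ("Le_le_DET patient_patient_NOM présentait_présenter_VER:impf "
          "une_un_DET douleur_douleur_NOM thoracique_thoracique_ADJ")

hits = search_tagged(sample, "*_présenter_V*")
print(hits)
# Keep only the surface forms of the matching tokens.
print([tok.split("_")[0] for tok in hits])
```

Searching on the lemma field rather than on the word form is what lets a single query retrieve every inflected form of a verb.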

 

 

4) Using the Corpus of Contemporary American English

 

 

http://corpus2.byu.edu/coca/

 

A) Searching for a word or phrase

 

dissertation

LIST

CHART

KWIC

 

B) Comparing words or phrases

 

COMPARE

dissertation  thesis

 

SORT BY

FREQUENCY

RELEVANCE

 

doctoral dissertation is 5 times more prevalent than doctoral thesis

 

COLLOCATES          *        1        0

 

The comparison shows that doctoral and unpublished are the only adjectives that qualify dissertation.
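A collocate search of this kind can be sketched as follows, assuming that the two numbers after COLLOCATES are the number of words examined to the left and to the right of the search word. The function name collocates and the sample sentence are made up, and the counts have nothing to do with the COCA figures.

```python
from collections import Counter

def collocates(tokens, node, left=1, right=0):
    """Count the words found within `left` words before and `right` words after `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Slice the left and right context; max() avoids negative indices.
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return counts

# Made-up sample sentence.
tokens = ("her doctoral dissertation cites an unpublished dissertation "
          "and a doctoral dissertation").split()

print(collocates(tokens, "dissertation", left=1, right=0).most_common())
```

With a span of 1 on the left and 0 on the right, only the word immediately before the search word is counted, which is the position where attributive adjectives are expected.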

 

 

committee   commission

 

 

COLLOCATES          *al      1        0

 

COLLOCATES          [jj]     1        0

 

 

strange                  odd

 

COLLOCATES          [nn]    0        1

 

 

efficient vs. effective

 

COLLOCATES          [nn]    0        1

 

Sections: ACAD:Medicine

Minimum Frequency: 1     1

 

efficacious, potent

 

COLLOCATES          [nn]    1        0

 

reaction vs. response

 

COLLOCATES          [nn]    0        1

 

cell vs. cellular

 

biologic vs. biological

 

pathologic vs. pathological

 

 

 

Searching for a whole noun phrase:

 

RESET

LIST

Sections: ACAD:Medicine

 

 

*ic.[jj*] [nn*]

 

*al.[jj*] [nn*]

 

 

 

 

 

 

 

 

LIST

CHART

 

C) Diachronic and diastratic variation

 

facebook

twitter

hypothesize

 

 

D) Comparing syntactic structures

 

Omission of the conjunction THAT introducing a complement clause.

 

Query                          ACADEMIC   SPOKEN

CLAIM THAT pronoun verb
[claim].[v*] that [p*] [v*]         574      624
[claim].[v*] [p*] [v*]              385     1427

KNOW THAT THE noun verb
[know] that the [n*] [v*]           304      758
[know] the [n*] [v*]                257     1182

CONTEND THAT THE noun
[contend] that the [nn]             271       17
[contend] the [nn]                   18       28
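The counts above can be turned into a rough rate of THAT omission for each verb and register. This is only a sketch: it assumes that every hit of the query without THAT is a genuine complement clause, which the queries themselves do not guarantee. The name omission_rate is made up.

```python
# Counts copied from the query results above: (with THAT, without THAT)
COUNTS = {
    "claim":   {"ACADEMIC": (574, 385), "SPOKEN": (624, 1427)},
    "know":    {"ACADEMIC": (304, 257), "SPOKEN": (758, 1182)},
    "contend": {"ACADEMIC": (271, 18),  "SPOKEN": (17, 28)},
}

def omission_rate(with_that, without_that):
    """Share of hits in which THAT is absent."""
    return without_that / (with_that + without_that)

for verb, by_register in COUNTS.items():
    for register, (with_that, without_that) in by_register.items():
        rate = omission_rate(with_that, without_that)
        print(f"{verb:8} {register:9} {rate:.1%}")
```

On these figures, omission is more frequent in SPOKEN than in ACADEMIC for all three verbs, for example about 40% against about 70% for claim.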

 

 

E) Using parts of speech

 

List of part of speech tags

You can also use part of speech tags by selecting them from the drop-down list (click on [POS LIST] to show it).

 

 

 

Syntax       Meaning                        Examples                     Sample matches
[pos]        Part of speech (exact)         [vvg]                        going, using
[pos*]       Part of speech (wildcard)      [v*]                         find, does, keeping, started
[lemma]      Lemmas (all forms of a word)   [sing]                       sing, singing, sang
                                            [tall]                       tall, taller, tallest
[=word]      Synonyms                       [=weak]                      low, tired, soft, vulnerable, etc.
word|word    Any of these words             stunning|gorgeous|charming   stunning, charming, gorgeous
*xx          Wildcard: * = any # letters    un*ly                        unlikely, unusually
x?xx         Wildcard: ? = one letter       s?ng                         sing, sang, song
x?xx*                                       s?ng*                        song, singer, songbirds
-word        NOT                            -[nn*]                       the, in, is

Note on -word: NOT can be followed by PoS, lemma, word, etc. It is most useful for "multiple slot" queries; see below.

Combinations of preceding (samples)

You can limit to a particular part of speech by adding a period (full stop) and then the part of speech tag in brackets.

Syntax           Meaning                                Example          Sample matches
word.[pos]       Exact word and part of speech          strike.[v*]      strike (only as a verb)
word*.[pos]      Substring and part of speech           dis*.[vvd]       discovered, disappeared, discussed
[lemma].[pos]    Lemma and part of speech               [strike].[v*]    strike, struck, striking
[=word].[pos]    Synonym and part of speech             [=beat].[v*]     hit, strike, defeat (but not nouns, like rhythm or drumming)

You can add "lemma" to any other type of search, such as synonym or customized list, to see all forms of the matching words. Just use an extra set of brackets.

Syntax           Meaning                                Example          Sample matches
[[=word]]        Synonym and lemma                      [[=publish]]     announced, circulating, publishes, issue (no part of speech specified, so some noun uses)

You can also choose lemma and part of speech by combining the preceding symbols.

Syntax           Meaning                                Example          Sample matches
[[=word]].[pos]  Synonym and lemma and part of speech   [[=clean]].[v*]  mop, scrubs, polishing

Multiple "slots": Create sequences of words, using any of the preceding query types. Note that in each case, there is a space between the word "slots" in the query. These are just a few examples, from an unlimited number of combinations.

Example                         Sample matches
nooks and crannies              nooks and crannies
fast|quick|rapid [nn*]          fast food / rapid transit
pretty -[nn*]                   pretty smart / pretty as (but not pretty girl, pretty picture, etc.)
[get] her to [v*]               get her to stay / got her to sleep
.|,|; nevertheless [p*] [v*]    . Nevertheless it is / ; nevertheless he said
[break] the [nn*]               break the law / broke the story
[[beat]].[v*] * [nn*]           beat the Yankees / beaten to death
[=gorgeous] [nn*]               beautiful woman / attractive wife

Notice that punctuation can be used like any "word"; just make sure that it is separated from words by a space.
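The way multiple-slot queries line up with a tagged text can be illustrated with a toy matcher. It is a sketch only and not how the COCA site works internally: it handles literal words, * and ? wildcards, [pos*], [lemma], word.[pos], word|word and -word, but not synonyms ([=word]). The tag inventory, the hand-tagged sentence and the names slot_matches and find are assumptions.

```python
from fnmatch import fnmatchcase

# Hypothetical tag inventory (CLAWS-style labels typed by hand for this sketch).
TAGS = {"at", "jj", "nn1", "nn2", "ppho1", "pphs1", "to", "vvd", "vvi"}

def slot_matches(slot, token):
    """Check one query slot against one (word, lemma, pos) token."""
    word, lemma, pos = token
    if slot.startswith("-"):                       # -x : anything but x
        return not slot_matches(slot[1:], token)
    if "|" in slot:                                # a|b : any of these
        return any(slot_matches(alt, token) for alt in slot.split("|"))
    head, sep, tail = slot.partition(".[")
    if sep and head:                               # x.[pos] : both must hold
        return slot_matches(head, token) and slot_matches("[" + tail, token)
    if slot.startswith("[") and slot.endswith("]"):
        inner = slot[1:-1]
        # Brackets hold a tag pattern if some known tag fits, else a lemma.
        if any(fnmatchcase(tag, inner) for tag in TAGS):
            return fnmatchcase(pos, inner)
        return lemma == inner
    return fnmatchcase(word.lower(), slot.lower())  # literal or wildcard word

def find(query, tokens):
    """Return every word sequence whose tokens match the query slot by slot."""
    slots = query.split()
    hits = []
    for i in range(len(tokens) - len(slots) + 1):
        window = tokens[i:i + len(slots)]
        if all(slot_matches(s, t) for s, t in zip(slots, window)):
            hits.append(" ".join(w for w, _, _ in window))
    return hits

# Made-up, hand-tagged sentence: (word, lemma, pos).
tokens = [("She", "she", "pphs1"), ("got", "get", "vvd"), ("her", "she", "ppho1"),
          ("to", "to", "to"), ("stay", "stay", "vvi"), ("the", "the", "at"),
          ("antigenic", "antigenic", "jj"), ("shift", "shift", "nn1")]

print(find("[get] her to [v*]", tokens))
print(find("*ic.[jj*] [nn*]", tokens))
```

Each space-separated slot consumes exactly one token, which is why a query with n slots can only return sequences of n words.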

 

 

 

 

 

5) Using an aligned corpus in Access

 

http://perso.univ-lyon2.fr/~maniezf/Bird_Flu.txt

 

Importing the data

 

Occurrences of:

spread

 

Like "*monitor*"          (Comme in the French version of Access)

Not Like "*contrôl*"      (Pas Comme in the French version of Access)

Not Like "*surveill*"
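The three criteria above keep the segments whose English side contains monitor while the French side contains neither contrôl nor surveill, which brings out the less expected translations. The same filter can be sketched outside Access. The sentence pairs, the helper like and the function name unexpected_translations are made up, and the layout of Bird_Flu.txt is not assumed here: the pairs are simply given as (English, French) tuples.

```python
def like(text, fragment):
    """Rough equivalent of Like "*fragment*": case-insensitive substring test."""
    return fragment.lower() in text.lower()

def unexpected_translations(pairs):
    """Keep pairs with 'monitor' in English and no 'contrôl'/'surveill' in French."""
    return [(en, fr) for en, fr in pairs
            if like(en, "monitor")
            and not like(fr, "contrôl")
            and not like(fr, "surveill")]

# Made-up aligned segments, for illustration only.
pairs = [
    ("Health authorities monitor the spread of the virus.",
     "Les autorités sanitaires surveillent la propagation du virus."),
    ("Poultry farms are closely monitored.",
     "Les élevages de volailles font l'objet d'un suivi étroit."),
    ("The virus spread rapidly.",
     "Le virus s'est propagé rapidement."),
]

for en, fr in unexpected_translations(pairs):
    print(en, "|", fr)
```

Excluding the most common equivalents leaves only the segments where the translator chose another solution, here a noun phrase built on suivi.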

 

 

 

 

6) Using an aligned multilingual corpus

 

Google: Opus Tiedemann

http://opus.lingfil.uu.se/

 

EMEA - European Medicines Agency documents (EMEA0.3.tar.gz - 5.0 GB)

 

Statistics and TMX/Moses Downloads

Intersection of the en and fr columns
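Once an en-fr TMX file has been downloaded, the aligned segments can be extracted with a few lines of code. The sketch below assumes the usual TMX layout (tu elements containing one tuv per language, each with a seg); the sample content and the function name tmx_pairs are made up, and the files actually distributed should be checked before relying on it.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tmx_pairs(tmx_text, src="en", tgt="fr"):
    """Return (source, target) segment pairs from a TMX document given as a string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang] = "".join(seg.itertext()).strip()
        # Keep only translation units that have both languages.
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs

# Made-up miniature TMX document.
sample = """<tmx version="1.4">
  <header/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Keep out of the reach of children.</seg></tuv>
      <tuv xml:lang="fr"><seg>Tenir hors de la portée des enfants.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Shake well before use.</seg></tuv>
      <tuv xml:lang="de"><seg>Vor Gebrauch gut schütteln.</seg></tuv>
    </tu>
  </body>
</tmx>"""

for en, fr in tmx_pairs(sample):
    print(en, "|", fr)
```

For a file of several gigabytes, loading the whole document into memory as done here is unlikely to be practical; a streaming parser such as xml.etree.ElementTree.iterparse would be the safer choice.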

 

Google: alinea kraif

Downloading Alinea

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
