ConceptualMetaphor Posted March 26, 2009

Ok, I've been diligently avoiding posting in this thread - I have a thesis to write, after all! - but I'm afraid you guys have managed to suck me in.

I know that I posted earlier saying that I considered myself a functional linguist, but I'm honestly of the opinion that at this point in the field, labels like these are unnecessarily divisive and too hard to define to be all that useful. Considering that a lot of research is moving in an interdisciplinary direction, and given that linguistics is already so fractious and open-ended, maybe we should stop trying to divide everyone into one of two massive groups?

Haha!! See, I would have thought you'd put cog on the formal side of the fence! Them lines are blurry.

And meanwhile, I'm going into cognitive linguistics, so proverbial gun to my head I'd put it on the functional side as well. But that has a lot to do with the fact that I'm approaching it from a cog sci perspective, which trends toward the functional camp, and because I'm grounded in discourse analysis (I'm actually a linguistic anthropology major for my undergrad).

And here's what dragged me into this - sorry, Dinali -

This clears it up for me. I think the problem is the use of the word "possible." When I said possible sentence, I meant "actually possible, as in, someone might say it tomorrow but no one's around with a notebook or DAT recorder to add it to the corpus"; not theoretically possible, but realistically possible. You have to admit that there's no such thing as a complete corpus, which means that there are things people have said that are not attested in the data. Hypothetically, here's a corpus:

1. I see a ship.
2. I saw a ship.
3. I see a dog.
4. I saw a dog.
5. I see a cat.

Is "I saw a cat" an allowed sentence? It's not attested in the corpus. A formalist is going to take the existing data, construct a paradigm, and hypothesize that "I saw a cat" is possible.
If there are native speakers alive and available, one can run an experiment. The experiment is as simple as asking "Hey, can you say 'I saw a cat'?" The native speaker renders his or her, yes, intuition on the grammaticality of the phrase. And really, what is a corpus but a large collection of sentences that native speakers have intuitively judged grammatical (by saying them aloud)?

I suspect you're being somewhat facetious here, because I'm not sure anyone (well, just about anyone...) honestly thinks that "exact phrase not in corpus" = "can't exist in the language, period!" A better characterization, using the same toy example you suggested, might be:

1. I[NP] [see[V] [a[DET] ship[N]][NP]][VP].
2. I saw a ship.
3. I see a dog.
4. I saw a dog.
5. I see a cat.

(With similar POS tags for all the other sentences in the corpus. And sorry if my brackets are off, I hate doing this by hand...)

So is "I saw a cat" an allowed sentence? Well, we can see from the corpus that I is an NP, saw is a past-tense verb, a is a determiner, and cat is a noun. And we see the attested forms "I saw a dog" and "I saw a ship", which both have the parsing:

S -> NP VP
VP -> V NP
NP -> DET N

Therefore we can ascertain that "I saw a cat" would be a felicitous sentence in this language as well, because it is accepted by that attested grammar.

I think the great thing about corpus linguistics is that we can do this for far, far more utterances than can be intuited, or individually analyzed, or elicited from subjects. Automatic POS taggers are a wonderful thing! I don't see why corpora should be an issue for formalists, given that they can be such a useful tool: you can think of a structure you want to investigate, intuit that you find it felicitous, ask some subjects "hey, can you say this?" and then look in a corpus to see if it occurs in everyday speech and what tends to elicit it. And then it's easy to look through the corpus for similar occurrences.
And it works well in reverse, too - you might think "oh, nobody ever says that" and your subjects might, in a formal research environment, say "oh, I wouldn't say that!" but corpora sometimes show that constructs are far more common than one might first suspect. Anyways, that's my two cents. In general I'm going to stay out of this, if only because I don't consider myself nearly informed enough to really even justify having an opinion on the matter. And I have other things to occupy myself with...like that thesis...
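For anyone who wants to play with the toy corpus above, here is a minimal sketch of the "attested vs. licensed" distinction. It is not a real parser: the lexicon is hand-tagged (an assumption, since the thread only gives tags for one sentence), and the "grammar" is simply the set of POS tag sequences seen in the corpus, standing in for the phrase-structure rules. All names here (`LEXICON`, `is_attested`, `is_licensed`) are hypothetical.

```python
# Hypothetical hand-assigned POS tags for the words of the toy corpus.
LEXICON = {
    "i": "PRON",
    "see": "V", "saw": "V",
    "a": "DET",
    "ship": "N", "dog": "N", "cat": "N",
}

# The five-sentence toy corpus from the thread.
CORPUS = [
    "I see a ship.",
    "I saw a ship.",
    "I see a dog.",
    "I saw a dog.",
    "I see a cat.",
]


def tokenize(sentence):
    """Lowercase, drop the final period, and split on whitespace."""
    return sentence.lower().rstrip(".").split()


def tag_sequence(sentence):
    """Map each word to its POS tag; raises KeyError for unknown words."""
    return tuple(LEXICON[word] for word in tokenize(sentence))


def is_attested(sentence, corpus):
    """True only if this exact word string occurs in the corpus."""
    return tuple(tokenize(sentence)) in {tuple(tokenize(s)) for s in corpus}


def is_licensed(sentence, corpus):
    """True if the sentence's tag sequence matches one seen in the corpus."""
    patterns = {tag_sequence(s) for s in corpus}
    try:
        return tag_sequence(sentence) in patterns
    except KeyError:
        # A word outside the toy lexicon: this sketch cannot say anything.
        return False


if __name__ == "__main__":
    for candidate in ["I saw a cat.", "Cat a saw I."]:
        print(candidate, is_attested(candidate, CORPUS), is_licensed(candidate, CORPUS))
```

With this setup "I saw a cat." is not attested but is licensed, while a scrambled string like "Cat a saw I." is neither. A real study would use an automatic tagger and an actual grammar instead of flat tag sequences.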
Dinali Posted March 26, 2009

Facetious? Me? I resemble that comment! In any case, I'm going back to tone curves and that'll be that. All the best to all of you.
fuzzylogician Posted March 26, 2009

I don't see why corpora should be an issue for formalists, given that they can be such a useful tool: you can think of a structure you want to investigate, intuit that you find it felicitous, ask some subjects "hey, can you say this?" and then look in a corpus to see if it occurs in everyday speech and what tends to elicit it. And then it's easy to look through the corpus for similar occurrences. And it works well in reverse, too - you might think "oh, nobody ever says that" and your subjects might, in a formal research environment, say "oh, I wouldn't say that!" but corpora sometimes show that constructs are far more common than one might first suspect.

I don't think anyone here claimed that we shouldn't use corpora as a tool in our research. But: for the languages I work on, few, if any, corpora exist. The few that do (for some of the languages) don't contain enough data in the registers I am looking at (they're usually made up of newspapers and suchlike, but I'm looking at informal speech). Often they're too small for me to infer anything about how frequently an utterance is used, or about the probability that it's unacceptable (as opposed to just not occurring in my small sample). Given that, I sometimes have to fall back on making up examples and asking informants if they're acceptable. I try to base them on Google searches of chats and fora which use informal speech, or on things I overheard people say in the street or on buses. Unfortunately, more often than not, corpora just don't give me conclusive information about what I'm looking for.
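The small-sample point can be made concrete with a back-of-the-envelope sketch. It rests on strong assumptions that real corpora do not satisfy: each sentence is treated as an independent draw, and the construction is assumed to occur at a constant per-sentence rate. The function names and sample sizes are hypothetical.

```python
def prob_zero_hits(rate, n_sentences):
    """Chance of never seeing a construction in n independent sentences,
    if it occurs at the given per-sentence rate (assumed constant)."""
    return (1.0 - rate) ** n_sentences


def max_rate_consistent_with_zero(n_sentences, alpha=0.05):
    """Largest per-sentence rate that would still produce zero hits
    with probability at least alpha in a sample of n sentences."""
    return 1.0 - alpha ** (1.0 / n_sentences)


if __name__ == "__main__":
    # Hypothetical corpus sizes, in sentences.
    for n in (500, 10_000, 1_000_000):
        bound = max_rate_consistent_with_zero(n)
        print(f"n={n}: zero hits is still compatible with a rate up to {bound:.2e}")
```

For large n the bound is close to 3/n (the familiar "rule of three"), so a corpus of a few hundred sentences with zero hits says very little: the construction could be fairly common and still be missing by chance.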
Nel (Author) Posted March 26, 2009

Fuzzylogician: If informal speech is the register you're aiming for, you can try the British National Corpus (BNC). I'm unsure of the size of its informal register, but the corpus as a whole is about 100 million words, roughly a tenth of it spoken, so I'm pretty sure its informal speech component is also quite substantial.

Also, when you say informal speech, do you mean monologues? Or interactive dialogues? By monologues, do you mean political speeches, academic lectures, public announcements, etc.? But given that you mentioned "informal", do you mean dialogues as in everyday conversation? For everyday conversation you can try the Santa Barbara Corpus of Spoken English. Or do you work on other languages? Different kinds of corpora are usually much more readily available than most people think.
fuzzylogician Posted March 27, 2009

I mean languages other than English. I've been at this for a while now, the field is very small where I'm at, and I personally know the people working on corpus linguistics here. I think I'd know if there were a corpus that suited my needs. I really wish there were: I spent a lot of time last summer doing Google searches for data someone needed for their research, data which I couldn't find evidence of in our corpora (but which abounded in chats). I probably could have done a month's work in a week if I had a tagged corpus *sighs*.