Archive for the ‘HLT’ Category

First of all, we have to know that Machine Learning is part of Artificial Intelligence; as Tom Mitchell defined it in his book “Machine Learning”, “Machine Learning is the study of algorithms that allow computer programs to automatically improve through experience”. Machine Learning also often focuses on the computational complexity of the problems it studies.

Machine Learning is applied in several areas, such as machine translation, automatic summarization or question-answering systems, and it is a good alternative to manually built resources, since it can be improved at a lower cost and offers better guarantees. But linguistics may be in danger, because more and more subtle mathematical devices, reserved to specialists, are being used.

In data analysis some systems don’t need human intuition, while others are conceived so that the machine interacts with an expert. Nevertheless, human intuition will always be needed, for the designer of the system is the one who decides and specifies how information is represented and manipulated.

Artificial Intelligence was created as a reflection of Natural Intelligence. Intelligent behavior means that the reaction to a situation will not always be the same; what’s more, one of the qualities of intelligence is that behavior has not been programmed, whereas a computer only carries out what has previously been programmed.

The algorithms that allow computers to learn are classified based on the desired outcome of each algorithm, and Computational Learning Theory (a branch of theoretical computer science) is responsible for their analysis.

The aim of Machine Learning is to make our life easier by writing programs that learn by themselves as they gain experience, and are able to do common activities in a fast and effective way.
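To make the idea of “improving through experience” concrete, here is a minimal sketch: a perceptron that learns the logical AND function from labelled examples. All the names and numbers here are illustrative, not taken from any particular library.

```python
# A perceptron learning AND by error correction: each time its
# prediction is wrong, the weights are nudged toward the truth.

def train_perceptron(examples, epochs=20, lr=0.1):
    """Learn weights from (inputs, label) pairs by error correction."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in examples:
            predicted = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = label - predicted      # experience: compare with the truth
            w[0] += lr * error * x1        # adjust weights to reduce the error
            w[1] += lr * error * x2
            b += lr * error
    return w, b

def predict(model, x1, x2):
    w, b = model
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

model = train_perceptron([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)])
print([predict(model, a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# → [0, 0, 0, 1]
```

After a few passes over the data the program answers correctly, even though nobody programmed the AND rule explicitly: it was learned from experience.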

Read Full Post »

On the third questionnaire, one of the topics we have covered is machine translation. We were asked to write a short Curriculum Vitae in Spanish and then translate it with three different online translators (Google Translator, Lucy Translator and Reverso Translator). The results were more or less satisfying, but there were some big mistakes in the Spanish-English translation. Here is an example:

Entre mis aficiones, además de los idiomas, se encuentra la música. Estudié solfeo durante 8 años en el Conservatorio Municipal de Música Bartolomé Ercilla de Durango […]
Among my interests, besides the languages, he|she finds the music. I studied sol-fa|solfeggio for 8 years in the Municipal Conservatoire|Conservatory of Music Bartolomé Ercilla de Durango […]

What’s more, one online translator warns us that an automatic translation will never have the same quality as a translation done by a person (and the translation will be worse if the language is colloquial). Nevertheless, it is useful and saves a lot of time.

Machine Translation (MT) is a sub-field of computational linguistics: the application of computers to translate text from one natural language to another. What basic MT does is substitute words in one natural language for words in another, but more complex systems use corpus techniques, pay attention to linguistic typology and translate idioms, among other things.
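The “basic MT” described above can be sketched in a few lines: naive word-for-word substitution with a hand-made Spanish-English dictionary. The dictionary entries below are invented for this sketch; real systems use corpora, morphology and word reordering.

```python
# Toy direct-transfer translation: look each word up in a bilingual
# lexicon and keep it unchanged when there is no entry.

LEXICON = {
    "entre": "among",
    "mis": "my",
    "aficiones": "hobbies",
    "la": "the",
    "música": "music",
}

def translate_word_by_word(sentence):
    words = sentence.lower().split()
    # Unknown words pass through untranslated, as in early direct systems.
    return " ".join(LEXICON.get(w, w) for w in words)

print(translate_word_by_word("entre mis aficiones la música"))
# → among my hobbies the music
```

The output shows exactly the weakness the post describes: word order, agreement and idioms are not handled at all, which is why “he|she finds the music” came out of the real translators.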

Users can interact with some translators and make the translations less ambiguous, for some of those systems give the user the opportunity to say which words are proper names. Other translators offer a list of suggestions: the user chooses the one which best fits what he was looking for, and if none of the possibilities is right, he makes some changes until he gets what he wants. The results of the TransType project showed that with this way of translating users didn’t spend so much time and effort.

To sum up, we should add something that Ana Fernández Guerra and Francisco Fernández wrote in the book “Machine Translation: Capabilities and Limitations”. We can make some statements about the activity of translating:

  1. The possibility of translation: if we tried to reproduce with total exactness every single piece of text or linguistic structure in another language, we would find it difficult.
  2. We should realize that we don’t translate from one language as a system to another language as a system, but from one text into another text.
  3. We should be cautious about some dogmatic statements.
  4. In the content (or message) of the text we must consider: meaning, designation and sense.

Read Full Post »

As Jim Cowie and Yorick Wilks said in one article, “Information Extraction (IE) is the name given to any process which selectively structures and combines data which is found, explicitly stated or implied, in one or more texts”. We have to add that Information Extraction is a technology based on analyzing natural language, and when a fact about a topic is extracted from a document, it is automatically entered into a database. Computational linguistic techniques play an important role in IE, because IE is, in a way, interested in the structure of the text, unlike Information Retrieval (IR), which treats texts as “bags of words”.

When the user enters a word or sentence, he only gets the specific information he is interested in (after a process of text analysis). So, instead of documents, which is what Information Retrieval offers, we get just the information we need. That information has probably been taken from a collection of documents, but it has been summarized.

IE is getting more and more important, for the amount of information available on the internet grows every day. People can get to that information more easily thanks to marking up the data with XML tags, among other things. And not only individual “people” turn to IE: research groups also use it to summarize medical documents or to build medical and biomedical ontologies.

These are the most common subtasks on IE:

  • Named Entity Recognition: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.
  • Coreference: identification of chains of noun phrases that refer to the same object. For example, anaphora is a type of coreference.
  • Terminology extraction: finding the relevant terms for a given corpus.
  • Relationship Extraction: identification of relations between entities (for example, which person works for which organization).
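The first subtask, Named Entity Recognition, can be sketched with simple patterns: a tagger for capitalized name sequences and for some temporal and monetary expressions. Real NER systems are statistical; the regular expressions and the sample sentence below are illustrative assumptions only.

```python
import re

# Pattern-based entity spotting: each regex proposes spans of one type.
PATTERNS = [
    ("DATE",  re.compile(r"\b\d{1,2} (January|February|March|April|May|June|"
                         r"July|August|September|October|November|December) \d{4}\b")),
    ("MONEY", re.compile(r"\$\d+(?:\.\d+)?(?: million| billion)?")),
    # Two or more capitalized words in a row look like a proper name.
    ("NAME",  re.compile(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+\b")),
]

def extract_entities(text):
    entities = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            entities.append((label, match.group()))
    return entities

text = "Acme Corp bought Beta Labs for $2 million on 5 March 2009."
print(extract_entities(text))
# → [('DATE', '5 March 2009'), ('MONEY', '$2 million'),
#    ('NAME', 'Acme Corp'), ('NAME', 'Beta Labs')]
```

Once the entities are found, the relationship-extraction subtask would link them (“Acme Corp” acquired “Beta Labs”), which is exactly the kind of structured fact a database row can hold.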

IE hasn’t fully reached the market yet, but it could become a great helper to industries of all kinds (this is an example from Yorick Wilks and Jim Cowie: “finance companies want to know facts of the following sort and on a large scale: what company take-overs happened in a given time span; they want widely scattered text information reduced to a simple data base”).

Read Full Post »

Speech Recognition is a branch of Artificial Intelligence that enables spoken communication between humans and computers, but there are some difficulties in getting a more or less acceptable interpretation of the message, because combining information from different sources (such as the acoustic, phonetic, semantic or pragmatic ones) is ambiguous, and some mistakes are unavoidable in the process.

Nearly all speech synthesizers use libraries of speech sounds. The creation of these dictionaries is important, because the system has to recognize the words the user utters. To make recognition easier, there is recognition of vowels and of consonants, and also noise masking (some mobile phones, for example, work when we “talk” to them, and if we are on the street, something must make the sound clear). But even with these advantages, mistakes cannot always be avoided. Most speech recognition algorithms rely only on the sound of the individual words, and not on their context, so they don’t understand speech, they just recognize words. Here is an example of what could happen:

The child wore a spider ring on Halloween.

He was an American spy during the war.

The sound of “spider ring” and “spy during” is exactly the same. We hear the correct words depending on the context, and it is something that we do unconsciously.
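One common way to give a recognizer that kind of context is to score each candidate transcription with word-pair (bigram) counts and keep the more plausible one. The counts below are invented for this sketch; real recognizers use large statistical language models.

```python
# Tiny made-up bigram counts: how often each word pair was "seen".
BIGRAM_COUNTS = {
    ("a", "spider"): 2, ("spider", "ring"): 3, ("ring", "on"): 2,
    ("a", "spy"): 4, ("spy", "during"): 5, ("during", "the"): 6,
    ("an", "american"): 3, ("american", "spy"): 2,
}

def score(words):
    """Sum of bigram counts; a higher score means a more plausible sequence."""
    return sum(BIGRAM_COUNTS.get(pair, 0) for pair in zip(words, words[1:]))

def pick(candidates):
    # choose the transcription whose word pairs fit the "corpus" best
    return max(candidates, key=lambda s: score(s.split()))

print(pick(["he was an american spy during the war",
            "he was an american spider ring the war"]))
# → he was an american spy during the war
```

With the same sounds, the context (“american … the war”) makes “spy during” win, which mirrors what we do unconsciously when listening.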

There are many ways of applying this technology, but I think the most interesting one is the fact that people with disabilities benefit from it. Some of them are unable to use their hands, others are deaf and use deaf telephony (voicemail to text, relay services or captioned telephone), and others have learning disabilities. There’s no doubt that our life will be easier in some years’ time, when these systems get better.

Read Full Post »

When we search for information on the net we can obtain it from different places and in different ways, for there is loads of available data on the internet.

Question answering, also known as QA, is a way of getting that information; a QA system should be able to answer our questions (asked in natural language) by searching in a pre-structured database or in documents written in natural language.

As Dell Zhang and Wee Sun Lee wrote in one article, “it is important for an online question answering system to be practical, because it is time-consuming to download and analyze the original web documents”. A question answering system is another kind of information retrieval system, but what QA systems do is supply just the information we need, not a list of possibilities as search engines usually do. To obtain the answers, QA systems combine several NLP techniques, because the answer depends on the type of question.

And as I have said, depending on the question, the methods used to find the answers are different. There are two kinds: shallow and deep. The first one finds fragments of documents, filters the information based on the presence of the required answer, and then orders the answers based on different criteria, such as word order. If this is not enough (for example, when some questions are classified with an incorrect type), the second method is used: “more sophisticated syntactic, semantic and contextual processing must be performed to extract or construct the answer”.
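The shallow method can be sketched very simply: rank candidate sentences by how many keywords they share with the question. The document collection below is made up for the example; real systems add answer-type checks and the ordering criteria mentioned above.

```python
# Keyword-overlap answer selection: drop question words, compare word sets.
STOPWORDS = {"who", "what", "when", "the", "is", "was", "of", "in", "did"}

def keywords(text):
    return {w.strip("?.,").lower() for w in text.split()} - STOPWORDS

def answer(question, sentences):
    q = keywords(question)
    # pick the sentence sharing the most keywords with the question
    return max(sentences, key=lambda s: len(q & keywords(s)))

docs = [
    "The Eiffel Tower was completed in 1889.",
    "Gustave Eiffel designed the Eiffel Tower.",
    "Paris is the capital of France.",
]
print(answer("Who designed the Eiffel Tower?", docs))
# → Gustave Eiffel designed the Eiffel Tower.
```

Instead of returning the three documents, as a search engine would, the system returns the one sentence that looks like an answer, which is the difference between QA and plain retrieval described above.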

So, there have been many advances in this kind of information retrieval system, but dealing with natural language on computers is quite difficult, and it can still be hard to get the data we are looking for in that kind of language with systems that have a lot of room to improve.

Read Full Post »

Here is the list of 10 research topics in major sites on Human Language Technologies I have chosen:

  1. Machine Translation
  2. Question answering systems
  3. Machine Learning in NLP
  4. Development of linguistic resources and tools
  5. Speech Recognition and Synthesis
  6. Intelligent systems for natural language interaction
  7. Information retrieval, question answering, and information extraction
  8. Monolingual and multilingual text generation
  9. Lexical semantics and word sense disambiguation
  10. Human factors in MT and user interfaces

I’ll write one article for each topic that I have put in bold.

Topics taken from:

Read Full Post »

There are some important researchers in the field of Human Language Technologies (HLT). One of those researchers is Martin Kay. As he says, his main interests are translation (by people and machines) and computational linguistic algorithms, especially in the fields of morphology and syntax. He is well known for his work in computational linguistics; what’s more, he started out at one of the earliest centres of computational linguistics research: the Cambridge Language Research Unit. He is currently Professor of Linguistics at Stanford University, and the developments he has made in the field of Human Language Technologies, in subjects such as chart parsing and functional unification grammar, have to be mentioned, as well as the fact that he is regarded as a leading authority on machine translation.

Another important researcher is Yorick Wilks, a British computer scientist who is a Professor of Computer Science at the University of Sheffield, where he directs the Institute for Language, Speech and Hearing. He wrote an algorithmic method “for assigning the ‘most coherent’ interpretation to a sentence in terms of having the maximum number of internal preferences of its parts (normally verbs or adjectives) satisfied”. In the 1990s he got interested in modeling human-computer dialogue, and at present he is the Director of the EU-funded Companions Project on creating long-term computer companions for people.

Hans Uszkoreit is also a researcher that has to be mentioned. He is Professor of Computational Linguistics at Saarland University and head of the DFKI Language Technology Lab, and he serves as Scientific Director at that German Research Center for Artificial Intelligence. During his career he has been affiliated with several centers, and he is a member of many associations, such as the European Academy of Sciences or the International Committee on Computational Linguistics.

Read Full Post »
