16 June 2009 by María Losada
To begin with, Machine Learning is a part of Artificial Intelligence; as Tom Mitchell defined it in his book “Machine Learning”, “Machine Learning is the study of algorithms that allow computer programs to automatically improve through experience”. Much Machine Learning research also studies the computational complexity of learning problems.
Machine Learning is applied in several areas, such as machine translation, automatic summarization and question-answering systems, and it is a good alternative to manually built resources, since it can be improved at a lower cost and offers better guarantees. But linguistics may be in danger, for more and more subtle mathematical devices reserved for specialists are now being used.
In data analysis there are some systems that don’t need human intuition, but other systems are conceived so that the machine interacts with the expert. Nevertheless, human intuition is something that will always be needed, for the designer of the system is the one who decides and specifies the way information is represented and manipulated.
Artificial Intelligence was created as a reflection of natural intelligence. Intelligent behavior means that the reaction to a situation will not always be the same; what’s more, one of the qualities of intelligence is that behavior has not been programmed, whereas a computer only carries out what has previously been programmed.
The algorithms that allow computers to learn are classified according to the desired outcome of each algorithm, and Computational Learning Theory (a branch of theoretical computer science) is responsible for their analysis.
The aim of Machine Learning is to make our lives easier by building programs that learn by themselves as they gain experience with humans, and that are able to carry out common activities quickly and effectively.
Posted in HLT, Littera
23 May 2009 by María Losada
In the third questionnaire, one of the subjects we covered was machine translation. We were asked to write a short curriculum vitae in Spanish and then translate it with three different online translators (Google Translator, Lucy Translator and Reverso Translator). The results were more or less satisfactory, but there were some big mistakes in the Spanish-English translation. Here is an example:
Entre mis aficiones, además de los idiomas, se encuentra la música. Estudié solfeo durante 8 años en el Conservatorio Municipal de Música Bartolomé Ercilla de Durango [...]
Among my interests, besides the languages, he|she finds the music. I studied sol-fa|solfeggio for 8 years in the Municipal Conservatoire|Conservatory of Music Bartolomé Ercilla de Durango [...]
What’s more, one online translator warns us that an automatic translation will never have the same quality as a translation done by a person (and the translation will be worse if the language is colloquial). Nevertheless, it is useful and it saves time.
Machine Translation is a sub-field of computational linguistics: it is the application of computers to translating a text from one natural language into another. What basic MT does is substitute the words of one natural language with words of another, but more complex systems use corpus techniques, take linguistic typology into account and translate idioms, among other things.
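To make the idea of basic word substitution concrete, here is a minimal sketch in Python. The glossary is a tiny hand-made example invented for illustration (it is not taken from any of the translators mentioned), and real systems do far more than this.

```python
# Toy word-for-word substitution.
# GLOSSARY is a hypothetical, hand-made glossary used only for illustration.
GLOSSARY = {
    "estudié": "I studied",
    "solfeo": "solfeggio",
    "durante": "for",
    "años": "years",
}


def translate_word_for_word(sentence, glossary=GLOSSARY):
    """Replace each word with its glossary entry; keep unknown words unchanged."""
    translated = []
    for word in sentence.split():
        translated.append(glossary.get(word.lower(), word))
    return " ".join(translated)


print(translate_word_for_word("Estudié solfeo durante 8 años"))
```

Because each word is looked up on its own, a sketch like this cannot handle idioms, word order or agreement, which is where the more complex techniques come in.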
Users can interact with some translators to make the translations less ambiguous, for some of those systems give the user the opportunity to say which words are names. Other translators offer a list of suggestions: the user chooses the one that best fits what they were looking for and, if none of the possibilities is suitable, makes changes until they get what they want. The results of the TransType project showed that with this way of translating users did not spend so much time and effort.
To sum up, we should add something that Ana Fernández Guerra and Francisco Fernández wrote in the book “Machine Translation, Capabilities and Limitations”. Some statements can be made about the activity of translating:
- The possibility of translation: if we were supposed to reproduce with total exactness every single piece of text or linguistic structure in another language, we would find it difficult.
- We should realize that we don’t translate from one language as a system to another language as a system, but from one text into another text.
- We should be cautious about some dogmatic statements.
- In the content (or message) of the text we must consider: meaning, designation and sense.
- Machine translation. (2009, May 18). In Wikipedia, The Free Encyclopedia. Retrieved 18:54, May 22, 2009, from http://en.wikipedia.org/w/index.php?title=Machine_translation&oldid=290714195
- Machine Translation. (2009). In foreignword.com. Retrieved 19:23, May 22, 2009, from http://www.foreignword.com/Tools/transnow.htm
- Machine Translation. (2008, December 18). In AIT topics. Retrieved 19:36, May 22, from http://www.aaai.org/AITopics/pmwiki/pmwiki.php/AITopics/MachineTranslation
- Machine Translation, Capabilities and Limitations. (Ana Fernández Guerra, Francisco Fernández). (2000). In Universitat de València. Retrieved 11:46, May 23, 2009, from http://books.google.es/books?id=7TE3avRZiSoC&printsec=frontcover&hl=en
Posted in HLT, Littera
8 May 2009 by María Losada
As Jim Cowie and Yorick Wilks said in one article, “Information Extraction (IE) is the name given to any process which selectively structures and combines data which is found, explicitly stated or implied, in one or more texts”. We have to add that Information Extraction is a technology based on analyzing natural language: when a fact about a topic is taken from a document, it is automatically entered into a database. Computational linguistics techniques play an important role in IE because IE is, in a way, interested in the structure of the text, unlike Information Retrieval (IR), which treats texts as “bags of words”.
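A small sketch may help to show what treating a text as a “bag of words” means. The two sentences below are invented for illustration; the point is only that word counts throw away the structure that IE cares about.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word tokens, discarding word order and structure."""
    return Counter(re.findall(r"\w+", text.lower()))


# Two made-up sentences with opposite meanings but the same words.
first = bag_of_words("The company bought the bank")
second = bag_of_words("The bank bought the company")

# The bags are identical, so a pure bag-of-words view cannot tell who bought whom.
print(first == second)
```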
When users enter a word or sentence, they only get the specific information they are interested in (after a process of text analysis). So, instead of documents, which is what Information Retrieval offers, we get just the information we need. That information has probably been taken from a collection of documents, but it has been summarized.
IE is becoming more and more important, for the amount of information available on the internet grows every day. People can get to that information more easily thanks to, among other things, marking up the data with XML tags. And it is not only individuals who turn to IE: groups also use it to summarize medical documents or to build medical and biomedical ontologies.
These are the most common subtasks of IE:
- Named Entity Recognition: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.
- Coreference: identification of chains of noun phrases that refer to the same object. For example, anaphora is a type of coreference.
- Terminology extraction: finding the relevant terms for a given corpus.
- Relationship Extraction: identification of relations between entities, such as:
IE has not reached the market yet, but it could become a great help to industries of all kinds. Yorick Wilks and Jim Cowie give this example: “finance companies want to know facts of the following sort and on a large scale: what company take-overs happened in a given time span; they want widely scattered text information reduced to a simple data base”.
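As a rough illustration of the Named Entity Recognition subtask listed above, here is a sketch that finds only temporal and numerical expressions with regular expressions. The patterns, the labels and the sample sentence are my own assumptions; real IE systems use much richer linguistic analysis.

```python
import re

# Hypothetical patterns: one date format and one money format only.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
DATE = re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b")
MONEY = re.compile(r"\$\d+(?:\.\d+)? (?:million|billion)")


def extract_entities(text):
    """Return (label, matched text) pairs, dates first and then amounts."""
    entities = []
    for label, pattern in (("DATE", DATE), ("MONEY", MONEY)):
        for match in pattern.finditer(text):
            entities.append((label, match.group()))
    return entities


# A made-up take-over sentence in the spirit of the finance example.
print(extract_entities("Acme bought Bolt on 8 May 2009 for $12 million."))
```

The resulting pairs are the kind of simple records that could then be stored in a database.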
- Information Extraction. (Jim Cowie and Yorick Wilks). (2009). In The University of Sheffield, Department of Computer Science. Retrieved 17:21, May 7, 2009, from http://www.dcs.shef.ac.uk/~yorick/papers/infoext.pdf
- Information extraction. (2009). In GATE, General Architecture for Text Engineering (Natural Language Processing Group, The University of Sheffield). Retrieved 20:56, May 7, 2009, from http://gate.ac.uk/ie/
- Information extraction. (2009, April 29). In Wikipedia, The Free Encyclopedia. Retrieved 16:49, May 8, 2009, from http://en.wikipedia.org/w/index.php?title=Information_extraction&oldid=286789152
- Information Extraction. (2007, February 6). In Open Clinical, knowledge management for medical care. Retrieved 18:45, May 8, from http://www.openclinical.org/informationextraction.html
Posted in HLT, Littera
25 April 2009 by María Losada
Speech Recognition is a branch of Artificial Intelligence that enables spoken communication between humans and computers. It is difficult, however, to obtain a more or less acceptable interpretation of the message, because the cooperation between information from different sources (acoustic, phonetic, semantic or pragmatic) is ambiguous, and some mistakes in the process are unavoidable.
Nearly all speech synthesizers use libraries of speech sounds. Creating this dictionary is important, because the system has to recognize the words the user says. To make recognition easier, there is separate recognition of vowels and of consonants, as well as noise masking (some mobile phones, for example, respond when we “talk” to them, and if we are in the street something has to make the sound clear). But even with these advantages, mistakes cannot always be avoided. Most speech recognition algorithms rely only on the sound of the individual words, and not on their context, so they don’t understand speech; they only recognize words. Here is an example of what could happen:
The child wore a spider ring on Halloween.
He was an American spy during the war.
The sound of “spider ring” and “spy during” is exactly the same. We hear the correct words depending on the context, and it is something that we do unconsciously.
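The role of context in the two sentences above can be sketched with a toy example. The cue words below are invented for illustration; a real recognizer would estimate such associations from a large corpus rather than from a hand-written table.

```python
# Hypothetical cue words for each candidate transcription.
CONTEXT_CUES = {
    "spider ring": {"child", "wore", "halloween"},
    "spy during": {"american", "was", "war"},
}


def pick_transcription(candidates, context_words):
    """Return the candidate that shares the most cue words with the context."""
    context = {word.lower().strip(".,") for word in context_words}
    return max(candidates, key=lambda c: len(CONTEXT_CUES.get(c, set()) & context))


candidates = ["spider ring", "spy during"]
# "..." marks the stretch of sound that is acoustically ambiguous.
print(pick_transcription(candidates, "The child wore a ... on Halloween.".split()))
print(pick_transcription(candidates, "He was an American ... the war.".split()))
```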
There are many applications of this technology, but I think the most interesting is the benefit to people with disabilities. Some of them are unable to use their hands, others are deaf and use deaf telephony (voicemail to text, relay services or captioned telephone), and others have learning disabilities. There’s no doubt that our lives will be easier in a few years’ time, when these systems get better.
- Sistemas de reconocimiento y síntesis de voz. (1999). In Diccionario español/inglés para el aprendizaje de vocabulario utilizando una interfaz de voz. Retrieved 19:56 April 24, 2009, from http://catarina.udlap.mx/u_dl_a/tales/documentos/lis/ahuactzin_l_a/capitulo1.pdf
- Speech Recognition and Synthesis. (1999). In Stanford University. Retrieved 20:48, April 24, 2009, from http://ccrma.stanford.edu/CCRMA/Courses/152/speech_recognition.html
- Speech recognition. (2009, April 20). In Wikipedia, The Free Encyclopedia. Retrieved 09:34, April 25, 2009, from http://en.wikipedia.org/w/index.php?title=Speech_recognition&oldid=285079884
- Speech Synthesis and Recognition. (1997-2007). In The Scientist and Engineer’s Guide to Digital Signal Processing. Retrieved 10:14 April 25, 2009, from http://www.dspguide.com/ch22/6.htm
Posted in HLT, Littera
4 April 2009 by María Losada
When we search for information on the net we can obtain it from different places and in different ways, for there is a great deal of data available on the internet.
Question answering, also known as QA, is one way of getting that information: a QA system should be able to answer our questions (asked in natural language) by searching a pre-structured database or documents written in natural language.
As Dell Zhang and Wee Sun Lee wrote in one article, “it is important for an online question answering system to be practical, because it is time-consuming to download and analyze the original web documents”. A question answering system is another kind of information retrieval system, but what QA systems do is supply just the information we need, not a list of possibilities as search engines usually do. To obtain the answers, QA systems combine several NLP techniques, because the answer depends on the type of question.
As I have said, the methods used to find the answers differ depending on the question. There are two methods: shallow and deep. The first one finds fragments of documents, filters them according to whether they contain the kind of answer required, and then orders the answers by different criteria, such as word order. If the way the question is formulated is not enough (or, for example, some questions have been classified with an incorrect type), the second method is used: “More sophisticated syntactic, semantic and contextual processing must be performed to extract or construct the answer”.
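The shallow method described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the passages are invented, the expected answer type is reduced to a regular expression, and the ranking criterion is keyword overlap rather than word order.

```python
import re


def shallow_answer(question_keywords, answer_pattern, passages):
    """Keep passages containing an answer of the expected type, ranked by keyword overlap."""
    scored = []
    for passage in passages:
        match = re.search(answer_pattern, passage)
        if match is None:
            continue  # no candidate answer of the required type in this passage
        words = set(re.findall(r"\w+", passage.lower()))
        overlap = len(words & question_keywords)
        scored.append((overlap, match.group()))
    # Highest overlap first; the sort is stable, so ties keep passage order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [answer for _, answer in scored]


# Made-up passages for the question "When was the report published?"
passages = [
    "The report was published in 2003.",
    "The report covers question answering.",
    "A later survey appeared in 2007.",
]
print(shallow_answer({"report", "published"}, r"\b\d{4}\b", passages))
```

The second passage is dropped because it contains no year, which is the filtering step; the deep method would be needed when such surface cues are not enough.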
So, there have been many advances in this kind of information retrieval system, but processing natural language with computers is quite difficult, and it can be hard to get the data we are looking for using natural language while the systems still have a lot of room for improvement.
- The problems in a Question Answering System in the academic domain. (2007). In RUA Repositorio Institucional de la Universidad de Alicante. Retrieved 20:08, April 1, 2009, from http://rua.ua.es/dspace/bitstream/10045/4297/1/ranlp07.pdf
- Question Answering. (2009, March 20). In Wikipedia, The Free Encyclopedia. Retrieved 20:24, April 01, 2009, from http://en.wikipedia.org/wiki/Question_answering
- Question Answering system. (2009). In START Natural Language Question Answering System. Retrieved 17:53, April 3, 2009, from http://start.csail.mit.edu/
- A Web-based Question Answering System (Dell Zhang and Wee Sun Lee). (2003, January). In Dspace. Retrieved 18:36, April 3, 2009, from http://dspace.mit.edu/bitstream/handle/1721.1/3693/CS029.pdf?sequence=2
Posted in HLT, Littera
1 April 2009 by María Losada
Here is the list of 10 research topics that I have chosen from major sites on Human Language Technologies:
- Machine Translation
- Question answering systems
- Machine Learning in NLP
- Development of linguistic resources and tools
- Speech Recognition and Synthesis
- Intelligent systems for natural language interaction
- Information retrieval, question answering, and information extraction
- Monolingual and multilingual text generation
- Lexical semantics and word sense disambiguation
- Human factors in MT and user interfaces
I’ll write one article for each topic that I have put in bold.
Topics taken from:
- The 25th edition of the Annual Conference of the Spanish Society for Natural Language Processing (SEPLN). (2009). In SEPLN2009. Retrieved 17:53, April 1, 2009 from http://ixa2.si.ehu.es/sepln2009/
- XXIV edición del Congreso Anual de la Sociedad Española para el Procesamiento del Lenguaje Natural 2008 (SEPLN’ 2008). (2008). In XXIV Congreso SEPLN ‘08. Retrieved 17:58, April 1, 2009, from http://basesdatos.uc3m.es/sepln2008/web/
- NAACL HLT 2007 Human Language Technology Conference. (2007). In NAACL-HLT 2007. Retrieved 18:07, April 1, 2009, from http://www.cs.rochester.edu/meetings/hlt-naacl07/call_papers.shtml
- 13th Annual Conference of the European Association for Machine Translation. (2009). In EAMT 2009. Retrieved 18:14, April 1, 2009, from http://www.talp.cat/eamt09/index.php/call-for-papers
- EACL 2009 12th Conference of the European Chapter of the Association for Computational Linguistics. (2009). In EACL 2009. Retrieved 18:36, April 1, 2009, from http://www.eacl2009.gr/conference/topics
Posted in HLT, Littera
28 March 2009 by María Losada
There are some important researchers in the field of Human Language Technologies (HLT). One of them is Martin Kay. As he says, his main interests are translation (by people and machines) and computational linguistic algorithms, especially in the fields of morphology and syntax. He is well known for his work in computational linguistics; what’s more, he started out in one of the earliest centres of computational linguistics research: the Cambridge Language Research Unit. He is now Professor of Linguistics at Stanford University. His contributions to Human Language Technologies in subjects such as chart parsing and functional unification grammar have to be mentioned, as well as the fact that he is regarded as a leading authority on machine translation.
Another important researcher is Yorick Wilks, a British computer scientist who is Professor of Computer Science at the University of Sheffield, where he directs the Institute for Language, Speech and Hearing. He wrote an algorithmic method “for assigning the “most coherent” interpretation to a sentence in terms of having the maximum number of internal preferences of its parts (normally verbs or adjectives) satisfied”. In the 1990s he became interested in modeling human-computer dialogue, and he is currently the Director of the EU-funded Companions Project on creating long-term computer companions for people.
Hans Uszkoreit is also a researcher who has to be mentioned. He is Professor of Computational Linguistics at Saarland University and head of the DFKI Language Technology Lab, and he serves as Scientific Director at the German Research Center for Artificial Intelligence. During his career he has been affiliated with several centers, and he is a member of many associations, such as the European Academy of Science and the International Committee of Computational Linguistics.
- Martin Kay. (2008, June 7). In Wikipedia, The Free Encyclopedia. Retrieved 12:53, March 28, 2009, from http://en.wikipedia.org/w/index.php?title=Martin_Kay&oldid=217746063
- Martin Kay. (2009). In Stanford.edu. Retrieved 13:29, March 28, 2009, from http://www.stanford.edu/~mjkay/
- Yorick Wilks. (2009). In Oxford Internet Institute, University of Oxford. Retrieved 13:47, March 28, 2009, from http://people.oii.ox.ac.uk/yorick/about/
- Yorick Wilks. (2009, March 18). In Wikipedia, The Free Encyclopedia. Retrieved 14:43, March 28, 2009, from http://en.wikipedia.org/w/index.php?title=Yorick_Wilks&oldid=278171099
- Hans Uszkoreit. (2009). In University of Saarland. Retrieved 15:36, March 28, 2009, from http://www.coli.uni-saarland.de/~hansu/bio.html
- Hans Uszkoreit. (2009). In VIDEOLECTURES.net. Retrieved 21:48, March 28, 2009, from http://videolectures.net/hans_uszkoreit/
Posted in HLT, Littera
21 March 2009 by María Losada
Human Language Technologies (HLT), also known as Language Technologies or Natural Language Processing (NLP), are closely connected to computer science and linguistics. HLT enables people to interact with machines more easily. Here is an example of how HLT can help people: “This can benefit a wide range of people – from illiterate farmers in remote villages who want to obtain relevant medical information over a cellphone, to scientists in state-of-the-art laboratories who want to focus on problem-solving with computers.”
As Hans Uszkoreit wrote in one of his publications, there is a communication problem in the interaction between humans and machines. Machine language and human language are not the same, since a machine’s domain of language is very restricted. But with NLP, the data used by computers becomes readable for humans; NLP designs communication mechanisms that work with programs simulating human communication.
But, although there have been many advances in this field, we can still find some difficulties when we communicate with a computer. When we enter a sentence, it is likely that some words have more than one meaning; and if we don’t pay attention to the structure of the sentence, it can become ambiguous for the computer, which may not understand what we intended to say. But, as the researcher mentioned above said, “the whole world of multimedia information can only be structured, indexed and navigated through language”, so it is only a matter of years and development before HLT works without any problem.
Posted in HLT, Littera | Tagged HLT, Human Language Technologies, Natural Language Processing, NLP
8 February 2009 by María Losada
RSS stands for Rich Site Summary or Really Simple Syndication. It is a “format for syndicating news and the content of news”. If you look for information in the RSS format, it is likely that the information you get is more or less what you wanted, and you get it quickly and up to date as well.
Its structure is made up of items, and each item has a title, a summary of the text and a link to the original source on the web where the whole text is located. RSS files hold a summary of what has been published on the original website, but they are not limited to news: changes to a website can be shown too, or “the revision history of a book”.
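The item structure described above can be read with Python’s standard library. This is a minimal sketch: the feed below is a made-up sample (the title, summary and example.com link are placeholders), and it assumes the RSS 2.0 element names `item`, `title`, `description` and `link`.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed with a single item, used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <description>A short summary of the first post.</description>
      <link>http://example.com/first-post</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return the title, summary and link of every item in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "summary": item.findtext("description"),
            "link": item.findtext("link"),
        })
    return items


for entry in read_items(SAMPLE_FEED):
    print(entry["title"], "->", entry["link"])
```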
You can both obtain and offer information with RSS, since RSS files contain metadata about the information sources; but to share information, some software and an aggregator are needed. The RSS sources are called feeds, and the programs that read them are aggregators. An aggregator can be installed on the user’s computer, although some browsers already include one, and another option is to register on the aggregator’s website.
So if you like a site and you know that you’ll visit it quite often, you should subscribe to its feed, so that you are informed when the site is updated, when new information has been added, and so on. And if you are the one who wants to offer information, you have to create your own feed and update it quite often to keep it interesting for other users.
References:
- Nacho (editor). ¿Qué es RSS-y XML, RDF, Atom,…?. (2004, May 18). In Microsiervos. Retrieved 11:21, February 8, 2009, from http://www.microsiervos.com/archivo/internet/que-es-rss-y-xml-rdf-atom.html
- Mark Pilgrim (editor). What is RSS?. (2002, December 18). In O’Reilly XML.com. Retrieved 11:37, February 8, 2009, from http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html?page=1
- RSS ¿qué es?. (2000). In Euroresidentes. Retrieved 11:48, February 8, 2009, from http://www.euroresidentes.com/Diversion/Internet/rss.htm
- Mónica Pérez Esteban (editor). ¿Qué es el RSS?. (2006, March 26). In Guía fácil del RSS. Retrieved 12:03, February 8, 2009, from http://es.geocities.com/rss_guia_facil/que_es_rss.html
Posted in Littera, ist
7 February 2009 by María Losada
When we talk about hypertext, we are referring to a text that leads you to another text, usually through hyperlinks. The term was coined by Ted Nelson, who defined it as “a body of written or pictorial material interconnected in such a complex way that it could not conveniently be presented or represented on paper”.
In the days when the term was new, it referred to a form of electronic text; it was a new way of getting information and publishing it. The texts are divided into blocks, and these blocks are joined by links. Hypertext is like a puzzle: it has pieces, and if they are all joined, you get the whole picture.
One of the main objectives of hypertext is to organize large amounts of information. If a hypertext is good, you will find several links in the text; if it has fewer than three links, we could consider it simply sequential text. Another important thing, although it might sound obvious, is to place the links correctly and to make sure that they point to documents under your own control, since a document may no longer be found if you have linked to another person’s text and he or she has deleted it.
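The picture of blocks joined by links can be sketched as a small data structure. The block names and links below are invented for illustration, and the “fewer than three links” check simply applies the rule of thumb mentioned above.

```python
# Hypothetical hypertext: each block name maps to the blocks it links to.
blocks = {
    "intro": ["history", "structure", "references"],
    "history": ["intro"],
    "structure": [],
    "references": [],
}


def reachable(blocks, start):
    """Collect every block that can be reached by following links from start."""
    seen, pending = set(), [start]
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        pending.extend(blocks.get(current, []))
    return seen


def is_sequential(blocks):
    """Rule of thumb from the text: fewer than three links reads as sequential text."""
    return sum(len(links) for links in blocks.values()) < 3


print(sorted(reachable(blocks, "intro")))
print(is_sequential(blocks))
```

Following the links from one block until nothing new turns up is the puzzle idea in code: joining all the pieces gives the whole text.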
References:
- Noah Wardrip-Fruin (author). What Hypertext is. (2004). In acm Portal, the guide to computing literature. Retrieved 21:57, February 7, 2009, from http://portal.acm.org/citation.cfm?id=1012844
- Hypertext. (2009, February 4). In Wikipedia, The Free Encyclopedia. Retrieved 22:06, February 7, 2009, from http://en.wikipedia.org/w/index.php?title=Hypertext&oldid=268470536
- Hypertext. In Connected: an Internet Encyclopedia. Retrieved 22:13, February 7, 2009, from http://www.lincoln.edu/math/rmyrick/ComputerNetworks/InetReference/12.htm
- George P. Landow (author). The definition of Hypertext and Its History as a Concept. (1992). In Hypertext & Hypermedia. Retrieved 22:24, February 7, 2009, from http://www.cyberartsweb.org/cpace/ht/jhup/history.html#1
Posted in Littera, ist