Life Analytics Blog
Practical Applications of Data Mining, Text Mining and Information Extraction
Themos Kalafatis (http://www.blogger.com/profile/14323291739097798038)
Text Analytics for Telecommunications - Part 1
As discussed in the previous post, performing text analytics on a language for which no tools exist is not an easy task. The case study I will present at the 9th European Text Analytics Summit involves analyzing and understanding thousands of non-English Facebook posts and tweets about telco brands and their topics, which leads to what is known as competitive intelligence.
The telcos in the case study are Telenor, MT:S and VIP Mobile, all based in Serbia. The analysis aims to identify how customers perceive each of the three companies and to understand the positive and negative aspects of each telco as captured by the Voice of the Customer, that is, of the subscribers.
Analyzing several thousand tweets and Facebook posts and comments gives us a first glimpse of competitive intelligence. For example, when we look for the words that most frequently co-occur with mentions of postpaid packages, this is what we find:
Red boxes mark telco brands; note that "mts" and "mtsa" refer to the same telco, MT:S. Blue boxes mark similar word forms that should be merged. A first look at the results shows that:
a) MT:S appears more frequently when users mention postpaid packages.
b) Telenor and VIP Mobile appear less often than MT:S in postpaid-package conversations.
c) Several problems stem from insufficient pre-processing: Kredit and Kredita (= credit) should be merged into one word, and the same applies to telefona/telefon, internet/interneta and mts/mtsa.
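The co-occurrence counts behind a chart like this can be approximated in a few lines. The sketch below is a minimal Python illustration, not the tooling used in the case study: for every post containing one of a set of keywords (here a hypothetical set holding only "postpaid"), it counts the other words in that post, each word at most once per post. The sample posts are invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def cooccurring_words(posts, keywords, top_n=10):
    """Return the top_n words that co-occur with any keyword in a post.

    Each word is counted at most once per post, and the keywords
    themselves are left out of the result.
    """
    counts = Counter()
    for post in posts:
        tokens = set(tokenize(post))
        if tokens & keywords:
            counts.update(tokens - keywords)
    return counts.most_common(top_n)


# Invented sample posts; {"postpaid"} is a hypothetical keyword set.
posts = [
    "MTS postpaid kredit",
    "postpaid mts internet",
    "postpaid mtsa kredita",
    "Telenor prepaid internet",
]
print(cooccurring_words(posts, {"postpaid"}))
```

Because "mts" and "mtsa" (and "kredit" and "kredita") are counted as separate words, the output shows exactly the fragmentation described in point c).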
The same high-level analysis can be performed for other telco topics, such as network, billing, customer care, promotions and subscriber questions. The next task is to identify why MT:S has more mentions in postpaid-package conversations. At this point we do not know the reason: MT:S's postpaid package prices could be high, or very low, or something else could be going on that still needs to be identified.
Serbian requires extra work because it is a highly inflected language: even the endings of brand names change with usage. Consider the following:
U mts-u (at mts)
Sa mts-om (with mts)
Bez mts-a (without mts)
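For brand names specifically, a regular expression can fold these case forms back into one token. The sketch below is a hypothetical and deliberately crude Python rule, not a Serbian morphological analyzer: it assumes a short list of brand stems (mts, telenor, vip, plus the "mt:s" spelling) and a small, assumed set of endings (-om, -u, -a, -e), written either with a hyphen ("mts-u") or attached directly ("mtsa").

```python
import re

# Hypothetical brand stems followed by an optional case ending. The ending
# list (om, u, a, e) is an assumed subset of Serbian noun endings.
BRAND_FORMS = re.compile(r"\b(mt:?s|telenor|vip)-?(?:om|u|a|e)?\b", re.IGNORECASE)


def normalize_brands(text):
    """Replace every matched inflected brand form with its lowercase base name."""
    return BRAND_FORMS.sub(lambda m: m.group(1).lower().replace(":", ""), text)


for phrase in ["U mts-u", "Sa mts-om", "Bez mts-a", "mtsa", "MT:S", "Telenora"]:
    print(phrase, "->", normalize_brands(phrase))
```

A rule like this misses any form it does not list and can produce false matches, so its output should be spot-checked on real posts.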
A highly inflected language clearly blows up the feature space, and this is where R can come to the rescue, with some success. We can use R to map several synonyms or word forms to one word, remove Serbian stop words, remove URLs and perform the other pre-processing steps that are necessary before an in-depth analysis. More on this in the next post.
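The case study used R for these steps. Purely as an illustration of the same kind of pipeline, here is a minimal Python sketch: strip URLs, lowercase and tokenize, map variant forms to one canonical word, and drop stop words. The variant map reuses the pairs listed in point c) above; the stop-word set is a tiny, assumed sample of Serbian function words, not a complete list.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

# Tiny assumed sample of Serbian stop words; a real list would be far longer.
STOP_WORDS = {"i", "u", "je", "sa", "na", "za", "da", "se", "od", "bez"}

# Variant forms from point c) above, mapped to one canonical word.
VARIANTS = {
    "kredita": "kredit",
    "telefona": "telefon",
    "interneta": "internet",
    "mtsa": "mts",
}


def preprocess(text):
    """Return cleaned tokens: URLs removed, variants merged, stop words dropped."""
    text = URL_PATTERN.sub(" ", text.lower())
    tokens = re.findall(r"\w+", text)
    tokens = [VARIANTS.get(token, token) for token in tokens]
    return [token for token in tokens if token not in STOP_WORDS]


print(preprocess("Bez kredita na http://example.com i telefona"))  # ['kredit', 'telefon']
```

A lookup table only covers the forms someone has listed, which is why the sheer number of word forms in a highly inflected language makes this step laborious.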
Categories: Blogroll
Case Study: Competitive Intelligence for Telecommunications
Telcos operate in a fast-moving business environment, which makes them good candidates for competitive intelligence analysis based on social media sources. The case study involves three major telcos in an Eastern European country and presents results from analyzing thousands of tweets and Facebook wall posts in order to understand the following:
- How do subscribers perceive each telco brand?
- Which information do subscribers tend to retweet and "Like" on Facebook wall posts?
- Which words and topics commonly occur with intense feelings or opinions?
- Which topics are discussed most when subscribers compare two or more telco operators?
- What do subscribers say about network quality and speed, billing, promotions, marketing events, customer care, TV commercials, etc.?
- How do they prioritize these topics, and which of them are interesting and why?
- What do subscribers talk about in general (i.e., without mentioning any telco brand) regarding Internet speed and charges, and what would they like to see more of?
I will present this case study at the forthcoming 9th Annual European Text Analytics Summit in London, UK, in April. It is an example of applying text analytics to a language for which no tools currently exist, so the difficulties encountered and possible solutions will also be discussed. I will also give examples of analyzing information at different conceptual levels and show how this technique provides even more insight into consumer behavior.
The following tools were used for the analysis:
- GATE, to annotate all topics that occur in telco conversations (such as "sms", "internet", "dropped call", "network", "promotion") and to set up conceptual levels.
- R, for text pre-processing, text classification, topic detection and cluster analysis.
- WEKA, for feature selection and text classification.
- Java, to manage the information generated by GATE, for example to understand how subscribers prioritize various telco concepts and topics, and to identify important phrases and words that frequently occur when these topics are discussed.
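GATE's own annotation pipeline is not reproduced here. As a rough stand-in, the Python sketch below shows the idea behind a gazetteer-style lookup followed by the kind of post-processing described for Java: tag each post with the topics whose terms it contains, rank topics by how many posts mention them, and list the words that most often accompany a given topic. The term-to-topic table and the sample posts are hypothetical; only the topic names come from the list above.

```python
import re
from collections import Counter

# Hypothetical gazetteer mapping surface terms to topic labels.
GAZETTEER = {
    "sms": "SMS",
    "internet": "INTERNET",
    "dropped call": "DROPPED_CALL",
    "network": "NETWORK",
    "promotion": "PROMOTION",
}


def annotate(text):
    """Return the set of topic labels whose terms occur in the text."""
    lowered = text.lower()
    return {
        topic
        for term, topic in GAZETTEER.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def rank_topics(posts):
    """Rank topics by the number of posts that mention them."""
    counts = Counter()
    for post in posts:
        counts.update(annotate(post))
    return counts.most_common()


def words_with_topic(posts, topic, top_n=5):
    """Most frequent words in posts tagged with `topic` (topic terms excluded)."""
    topic_words = {
        word
        for term, label in GAZETTEER.items()
        if label == topic
        for word in term.split()
    }
    counts = Counter()
    for post in posts:
        if topic in annotate(post):
            counts.update(
                w for w in re.findall(r"\w+", post.lower()) if w not in topic_words
            )
    return counts.most_common(top_n)


# Invented sample posts.
posts = [
    "Another dropped call, network coverage is terrible",
    "Network coverage lost, no internet",
    "Great promotion on SMS bundles",
    "Internet speed is slow",
]
print(rank_topics(posts))
print(words_with_topic(posts, "NETWORK"))
```

A plain dictionary lookup ignores inflection, so for Serbian text it would need to run after the kind of normalization described in the previous post.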
New Insights from Text Analytics
Text analytics has gained the attention it deserves over the past few years. Sentiment analysis is perhaps the most frequently discussed type of analysis, but there will always be new ways to analyze text data and gain insights from it.
In my opinion, two new types of analysis have vast potential: Sequence Detection and Concept Mining. I am not aware of any text mining practitioner currently implementing these types of analysis; if you know of one, feel free to add your comments below.
So what are Sequence Detection and Concept Mining? Some examples:
Suppose that you receive several e-mails from customers similar to the one below:
"I have been trying repeatedly to solve my billing problem through customer care. I first talked with someone called Mrs Jane Doe. She said she should transfer my call to another representative from the sales department. Yet another rep from the sales department informed me that I should be talking with the Billing department instead. Unfortunately my bad experience of being transferred through various representatives was not over because the Billing department informed me that I should speak to the......"
Current text analytics software will identify the key elements of the text above, but a very important piece of information goes unnoticed, namely the sequence of events that takes place:
(Jane Doe => Sales Dept => Billing Dept => ...)
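One simple way to recover such a sequence, sketched here in Python under strong assumptions, is to locate the mentions of known people and departments, sort them by position in the text, and collapse repeated consecutive mentions into a single step. The entity patterns below are hypothetical and hard-coded for this one e-mail; in practice they would come from an information extraction step. Order of mention is only a proxy for the order of events.

```python
import re

# Hypothetical entity patterns -> canonical labels, hard-coded for the
# example e-mail. A real system would get these from an IE step.
ENTITIES = {
    r"\bjane doe\b": "Jane Doe",
    r"\bsales department\b": "Sales Dept",
    r"\bbilling department\b": "Billing Dept",
}


def event_sequence(text):
    """Return entity labels in order of mention, merging consecutive repeats."""
    lowered = text.lower()
    hits = sorted(
        (match.start(), label)
        for pattern, label in ENTITIES.items()
        for match in re.finditer(pattern, lowered)
    )
    sequence = []
    for _, label in hits:
        if not sequence or sequence[-1] != label:
            sequence.append(label)
    return sequence


email = (
    "I first talked with someone called Mrs Jane Doe. She said she should "
    "transfer my call to another representative from the sales department. "
    "Yet another rep from the sales department informed me that I should be "
    "talking with the Billing department instead."
)
print(" => ".join(event_sequence(email)))  # Jane Doe => Sales Dept => Billing Dept
```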
Detecting the sequence of events is important for understanding customer interactions. In the example above, imagine detecting similar sequences across thousands of e-mails or call center transcripts and running sentiment analysis on them; this would make it possible to correlate sentiment with specific event sequences.
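Once each document has been reduced to a sequence, correlating sequences with sentiment can start as a simple group-by. The Python sketch below assumes that each document already has an event sequence and a sentiment score from some sentiment model (the [-1, 1] scale and the sample records are hypothetical). For each distinct sequence it reports how many documents follow it and their mean sentiment.

```python
from collections import defaultdict
from statistics import mean


def sentiment_by_sequence(records):
    """Group (sequence, sentiment) records by sequence.

    Returns {sequence_tuple: (document_count, mean_sentiment)}.
    Sentiment scores are assumed to come from a separate model.
    """
    groups = defaultdict(list)
    for sequence, score in records:
        groups[tuple(sequence)].append(score)
    return {seq: (len(scores), mean(scores)) for seq, scores in groups.items()}


# Hypothetical records: (event sequence, sentiment score in [-1, 1]).
records = [
    (["Jane Doe", "Sales Dept", "Billing Dept"], -0.5),
    (["Jane Doe", "Sales Dept", "Billing Dept"], -1.0),
    (["Billing Dept"], 0.5),
]
summary = sentiment_by_sequence(records)
# List sequences from most negative to most positive mean sentiment.
for seq, (count, avg) in sorted(summary.items(), key=lambda item: item[1][1]):
    print(" => ".join(seq), count, avg)
```

With thousands of documents, it is worth requiring a minimum document count per sequence before trusting its average.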
Next is Concept Mining (a phrase I coined for this post): the ability to analyze information at different conceptual levels. It is a very powerful technique; let's see why.
Those who attended the 7th annual Text Analytics Summit in Boston had the opportunity to hear several presentations on semantics. The discussions between the experts on the semantics panel and the attendees revealed that, for several reasons, people did not find semantics practical. Yet semantics is what gives us the power to find patterns at different conceptual levels.
As a very basic example, if we use information extraction to annotate, say, tweets that mention American telcos, we can tag each telco mention with a more general category called TELCOS. We can also tag individual prepaid packages with a more general category called PREPAID_PACKAGES. We can then search for patterns at a more general conceptual level, rather than only at the level of a single telco brand or a specific telco's prepaid package. For example, we can run sentiment analysis on all prepaid package mentions, identify patterns of negative or positive sentiment, and see which telco wins on positive sentiment at the conceptual level.
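The roll-up from specific mentions to a concept can be sketched with a small taxonomy table. In the hypothetical Python example below, each annotated package name maps to a general concept (PREPAID_PACKAGES, or an assumed POSTPAID_PACKAGES) and to the telco that sells it; the package names, telco names and sentiment scores are invented placeholders. Aggregating at the concept level asks which telco wins on sentiment across all of its prepaid packages, rather than package by package.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical taxonomy: package name -> (general concept, telco).
TAXONOMY = {
    "alpha prepaid 10": ("PREPAID_PACKAGES", "Telco A"),
    "alpha prepaid 25": ("PREPAID_PACKAGES", "Telco A"),
    "beta pay as you go": ("PREPAID_PACKAGES", "Telco B"),
    "beta unlimited": ("POSTPAID_PACKAGES", "Telco B"),
}


def sentiment_by_telco(mentions, concept):
    """Mean sentiment per telco over all packages that roll up to `concept`.

    `mentions` is an iterable of (package_name, sentiment) pairs;
    unknown package names are ignored.
    """
    scores = defaultdict(list)
    for package, sentiment in mentions:
        entry = TAXONOMY.get(package)
        if entry is not None and entry[0] == concept:
            scores[entry[1]].append(sentiment)
    return {telco: mean(values) for telco, values in scores.items()}


# Invented mentions with sentiment scores in [-1, 1].
mentions = [
    ("alpha prepaid 10", 0.5),
    ("alpha prepaid 25", 1.0),
    ("beta pay as you go", -0.5),
    ("beta unlimited", 1.0),
]
result = sentiment_by_telco(mentions, "PREPAID_PACKAGES")
print(result)
print("Winner:", max(result, key=result.get))
```

The postpaid mention is excluded because it rolls up to a different concept, which is the point of choosing the conceptual level before aggregating.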
The possibilities are endless.


