Data Mining

  • Most Topular Stories

  • Prepare for the Unexpected

    Data mining News
    21 Oct 2014 | 6:37 am
    … cumulative sum charts (CUSUM)) or data mining approaches (such as proportional reporting …
  • The Longform Manifesto

    Data Mining: Text Mining, Visualization and Social Media
    Matthew Hurst
    25 Sep 2014 | 10:37 pm
    Sometimes a title for a blog post suggests itself to me which seems so self-contained that it takes real effort to actually write the post ('Machine Intelligence, not Machine Learning is the Next Big Thing' is another in this line). The idea behind the (or a) Longform Manifesto is as follows. I have become aware of late of the sense of deterioration that is associated with the mobile 'revolution' and the info snacking, casual gaming and interrupt-driven lifestyle that it has entailed. The behaviours are perfectly illustrated in this scene from Portlandia: With a daughter…
  • Using Air Traffic Data to Predict Ebola’s Spread

    The Numbers
    Jo Craven McGinty
    17 Oct 2014 | 10:32 am
    While a number of researchers are modeling the spread of Ebola in West African countries, a group at Boston’s Northeastern University has used air traffic connections to explore how the disease might spread to the rest of the world.
  • Hyperparameter search, Bayesian optimization and related topics

    natural language processing blog
    10 Oct 2014 | 10:55 am
    In terms of (importance divided-by glamour), hyperparameter (HP) search is probably pretty close to the top. We all hate finding hyperparameters. Default settings are usually good, but you're always left wondering: could I have done better? I like averaged perceptron for this reason (I believe Yoav Goldberg has also expressed this sentiment): no pesky hyperparameters. But I want to take a much broader perspective on hyperparameters. We typically think of HPs as { regularization constant, learning rate, architecture } (where "architecture" can mean something like neural network structure,…
  • Why Can't I Reactivate Or Acquire Customers Anymore?

    Kevin Hillstrom: MineThatData
    Kevin Hillstrom
    20 Oct 2014 | 8:15 pm
    The theme of the fall is this: It has become really difficult to acquire new customers. It has been difficult to reactivate lapsed buyers for a couple of years now. Consequently, the customer file is being starved. If the customer file is being starved, it is going to be really hard to grow in the future. In catalog marketing, it is now clear why it has become so hard to acquire new customers. The cataloger focused on a 50 - 75 year old customer ... and has for the past decade. The co-ops spun 50 - 75 year old customers to catalogers "at scale", creating an unprecedented level of laziness and…
 

    Data Mining: Text Mining, Visualization and Social Media

  • The Longform Manifesto

    Matthew Hurst
    25 Sep 2014 | 10:37 pm
    Sometimes a title for a blog post suggests itself to me which seems so self-contained that it takes real effort to actually write the post ('Machine Intelligence, not Machine Learning is the Next Big Thing' is another in this line). The idea behind the (or a) Longform Manifesto is as follows. I have become aware of late of the sense of deterioration that is associated with the mobile 'revolution' and the info snacking, casual gaming and interrupt-driven lifestyle that it has entailed. The behaviours are perfectly illustrated in this scene from Portlandia: With a daughter…
  • Scottish Independence : Bing Predicts 'No'

    Matthew Hurst
    18 Sep 2014 | 9:58 am
    Bing's prediction team has a feature live on the site right now that predicts Scotland will not become an independent nation as a result of today's referendum.
  • Bing hearts World Cup 2014, Google - not so much

    Matthew Hurst
    12 Jul 2014 | 12:19 pm
    While Google has been doing a great job of their front page animations (today's is very nice, illustrating how Brazil and The Netherlands are on their way to Russia for 2018), Bing appears to be far more attentive to actually answering questions about the competition. For example: Compared to Bing's, Google's answer brings up some interesting news articles, but Bing brings up stats on the teams and even a prediction of who will win (Cortana - which is driving these predictions - has been doing a perfect job of predicting game outcomes).
  • GrubHub's Phasmid Websites

    Matthew Hurst
    3 May 2014 | 9:49 pm
    The rationale behind mining business data directly from the business's own website is that the business has a clear economic motivation to ensure that the data is up to date. If you own a restaurant that changes location, and your website still publishes the former address, those potential customers who visit your site will not be enjoying your delicious offerings. For the web mining proposition to work, it is important to firstly know that you have in your hand a genuine business website and secondly, to have excellent extraction and inference technology to pull the required…
  • Hopper - new in the travel space

    Matthew Hurst
    19 Jan 2014 | 11:24 am
    Briefly - Hopper is something new in the travel  / local space. In their own words: What if you could plan an amazing trip based on a vague idea — like “spring surfing in California” or “Mediterranean cruise”? What if logistical information popped up right when you needed it, so you wouldn't have to spend hours on research? This is our vision: to make planning a trip an effortless extension of discovering and exploring new places. We spent several years experimenting with different tools, technology and algorithms to collect, organize and manage massive amounts of…

    The Numbers

  • Using Air Traffic Data to Predict Ebola’s Spread

    Jo Craven McGinty
    17 Oct 2014 | 10:32 am
    While a number of researchers are modeling the spread of Ebola in West African countries, a group at Boston’s Northeastern University has used air traffic connections to explore how the disease might spread to the rest of the world.
  • Leaving Puerto Rico, Counting Calories and a New No. 1 (Statshot)

    David Goldenberg
    17 Oct 2014 | 9:46 am
    Far more Puerto Ricans now live off the island than on it, many fast food chains have started serving slightly lighter fare, and Mississippi State took over first place in the AP football poll this week for the first time in its history.
  • Quiz: How Do Politics Relate to Shopping Habits?

    Rani Molla
    17 Oct 2014 | 4:46 am
    People's political beliefs extend into a number of areas of their lives. According to data from a market research company, these belief systems also relate to how and what people buy.
  • Remember Pens and Pencils? They’re Doing Just Fine

    Rani Molla
    16 Oct 2014 | 8:51 am
    Remember pens and pencils? They're not only still around, but they're selling well. That has to do not just with surviving technology, but learning to work with it.
  • Americans Hate Congress, but Like Their Own Representatives

    Rani Molla
    15 Oct 2014 | 9:01 am
    For Americans, their own Congress member is the devil they know. Voters have more favorable views of their own Congress members than they do of Congress in general, according to a Gallup poll released today.
 

    natural language processing blog

  • Hyperparameter search, Bayesian optimization and related topics

    10 Oct 2014 | 10:55 am
    In terms of (importance divided-by glamour), hyperparameter (HP) search is probably pretty close to the top. We all hate finding hyperparameters. Default settings are usually good, but you're always left wondering: could I have done better? I like averaged perceptron for this reason (I believe Yoav Goldberg has also expressed this sentiment): no pesky hyperparameters. But I want to take a much broader perspective on hyperparameters. We typically think of HPs as { regularization constant, learning rate, architecture } (where "architecture" can mean something like neural network structure,…
  • Machine learning is the new algorithms

    3 Oct 2014 | 10:19 am
    When I was an undergrad, probably my favorite CS class I took was algorithms. I liked it (a) because my background was math so it was the closest match to what I knew and (b) because even though it was "theory," a lot of the stuff we learned was really relevant. Over time, it seemed like the area had distilled worthwhile algorithms from interesting-in-theory-but-you'll-never-actually-use algorithms. In fact, I think this is a large part of why most undergraduate CS degrees today require a course in algorithms. You have these very nice, clearly defined statements, and very elegant solutions to…
  • AMR: Not semantics, but close (? maybe ???)

    27 Sep 2014 | 9:00 am
    Okay, necessary warning. I'm not a semanticist. I'm not even a linguist. Last time I took semantics was twelve years ago (sigh). Like a lot of people, I've been excited about AMR (the "Abstract Meaning Representation") recently. It's hard not to get excited. Semantics is all the rage. And there are those crazy people out there who think you can cram meaning of a sentence into a !#$* vector [1], so the part of me that likes Language likes anything that has interesting structure and calls itself "Meaning." I effluviated about AMR in the context of the (awesome) SemEval panel. There is an LREC…
  • Reading group notes: point/counter-point on "predict models"

    31 Jul 2014 | 6:26 am
    In our local summer reading group, I led the discussion of two papers that appeared in Baltimore last month: Marco Baroni, Georgiana Dinu & German Kruszewski, Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. ACL 2014. Omer Levy & Yoav Goldberg, Linguistic Regularities in Sparse and Explicit Word Representations. CoNLL 2014 (best paper award recipient). I love handouts, so I made a handout for this one too. I paste below the handout. All good ideas are those of the respective authors; all errors and bad ideas are probably due to…
  • Hello, World!

    27 Jul 2014 | 7:18 am
    Okay, usually Hello World is the first program you learn to write in a new programming language. For fun, I've been collecting how to say hello world in different human languages, something remarkably difficult to search for (because of the overloading of the word "language"). I have 28. I'd like to make it to 280 :). If you have one (or more) to contribute, email me, post a comment, or tweet to me @haldaume3. And of course if you think any of these is wrong, please let me know that too. 1 bar Servus Woid! 2 ca Hola Món! 3 de Hallo Welt! 4 en Hello World! 5 eo Saluton, Mondo! 6 es ¡Hola…

    Kevin Hillstrom: MineThatData

  • Why Can't I Reactivate Or Acquire Customers Anymore?

    Kevin Hillstrom
    20 Oct 2014 | 8:15 pm
    The theme of the fall is this: It has become really difficult to acquire new customers. It has been difficult to reactivate lapsed buyers for a couple of years now. Consequently, the customer file is being starved. If the customer file is being starved, it is going to be really hard to grow in the future. In catalog marketing, it is now clear why it has become so hard to acquire new customers. The cataloger focused on a 50 - 75 year old customer ... and has for the past decade. The co-ops spun 50 - 75 year old customers to catalogers "at scale", creating an unprecedented level of laziness and…
  • The Biggest Story Of The Fall

    Kevin Hillstrom
    19 Oct 2014 | 8:15 pm
    The biggest story of the fall, to date, is the inability of so many e-commerce, retail, and catalog businesses to reactivate customers, or to acquire new customers. It is an epidemic, folks. You keep asking me if your situation is unique. Your situation, my friends, is not unique. Catalogers, known to grumble with the best of them, are rumbling these days about the "collapse of the co-ops". I hear the questions all the time ... "The co-op business model literally forced me to use them, and now, performance is awful and nobody will help me. What happened, and how I can fix the problem?" Hint…
  • Returns

    Kevin Hillstrom
    16 Oct 2014 | 8:15 pm
    Some of you sell widgets, and nobody ever returns widgets. That's a good thing. You picked the right business model. The rest of us deal with returns. On the surface, you want to do everything possible to prevent a return. You lose the sales associated with the return, and you are dinged five bucks or ten bucks for shipping and/or warehousing fees. Your CFO sure doesn't want you encouraging customers to return merchandise, now does she? When you do the math, however, you learn two very interesting things. Returns act as a "mini-order" if the customer exchanges the item for something else. In…
  • Diagnostics: What Do I Do About The Results?

    Kevin Hillstrom
    15 Oct 2014 | 8:15 pm
    It's easy to create a system for diagnosing business challenges. It's not easy to communicate the results in a digestible way, without offending people. Therefore, try to stick to the facts. It's perfectly reasonable to say that the business is not meeting expectations, and to show why. Try to stay away from opinions like these: "Your merchandising team really butchered the past three years, didn't they?" "You need to shift to an omnichannel business model or you'll be out of business in a few years." "Your business model is so old-school that you need to digitize or die." Stick to the facts. Take…
  • Buying an iPhone 6 - Omnichannel!!

    Kevin Hillstrom
    14 Oct 2014 | 8:15 pm
    Your business is probably in lock-down ... your IT team won't let you do anything, in an effort to "protect the business" through Christmas. Other businesses are at the whim of technology. Imagine having to forecast the sale of tens of millions of iPhones? Good luck getting that right. No amount of genius can allow anybody to accurately forecast that kind of demand. So when I visited my local mobile phone provider on Monday, I, too, wanted to upgrade one of the phones in the household to an iPhone 6. Step 1: Upon entrance, I was greeted by a tablet-toting employee. I told the employee I wanted…
 

    TIBCO Spotfire's Trends and Outliers

  • Three Ways to Avoid a Big Data Bottleneck

    Spotfire Blogging Team
    21 Oct 2014 | 6:03 am
    As companies grapple with the tsunami of data coming from connected devices, mobile, and the Web, there is the potential for a big data bottleneck to block business innovation. That’s the assertion of Brian McCarthy, managing director of information and analytics strategy at Accenture Analytics, in a new Harvard Business Review blog post. He suggests that organizations take three steps to avoid the analysis paralysis that can result from embracing data-driven decision-making. First, despite the warp-speed that data may appear to be flowing through the corporate network, organizations should…
  • In Recognition of Excellence in Advanced Analytics

    Spotfire Blogging Team
    20 Oct 2014 | 5:55 am
    At TIBCO Spotfire, our mission is providing companies, non-profit organizations, government agencies, and other entities with the ability to capture the right information at the right time and act on it proactively to gain competitive advantage. Occasionally, the success that’s achieved by our clients is recognized by the industry. This past week, TIBCO Spotfire’s advanced analytics solution purpose-built for a client was honored with a 2014 Data Impact Award. The award ceremony, hosted by Cloudera, was held on October 15 in tandem with the Strata + Hadoop World Conference in New York.
  • Using Analytics to Find Profitable Customers

    Spotfire Blogging Team
    17 Oct 2014 | 6:40 am
    “It is no longer good enough to simply satisfy your customers or to have a product that works. No longer can you merely deliver a service within the timescale you have set. All these are important and we have to do them. But what will really make the difference is when the customer asks: when I went through that experience, did the provider really engage with me, did they understand my needs, did they think logically about what was best for me?” said Jo Causon, author of “Customer service: What should you measure to generate ROI?” Customer analytics can give you these types of…
  • Analytics to Drive the Next Best Action

    Spotfire Blog Editor
    16 Oct 2014 | 5:55 am
    Next best action marketing has gained a great deal of momentum in recent years. That’s because companies are increasingly focused on gathering, analyzing, and acting on insights for appropriate actions to take with individual customers. For instance, let’s say a bank customer applies for a new credit card. Bank officials can use transactional, lifecycle status, and other information about that customer along with analytics to help them determine that the next best action is to offer that customer a home equity loan at a lower rate than the credit card he was hoping to get. The use of…
  • Big Data Driving Big Results, Survey Finds

    Spotfire Blogging Team
    15 Oct 2014 | 6:24 am
    The vast majority of executives from companies leveraging big data to drive business are satisfied with the results, according to new research from Accenture. Accenture surveyed C-level executives and other senior technology leaders from 19 countries; 94 percent of executives who have tapped into big data said they are satisfied with the results, the study found. Moreover, 89 percent of those respondents rate big data as “very important” or “extremely important” to transforming operations into a digital business. Eighty-two percent of executives agreed big data provides a significant…

    PolicyMap

  • New Unbanked Data on PolicyMap!

    Kristin Crandall
    20 Oct 2014 | 2:13 pm
    Have you been to your local bank branch lately? Perhaps withdrawn money from your checking or savings account using an ATM? Many of us who have a relationship with a traditional financial institution may take it for granted, but a lot of people are without access to these institutions. Growing attention is being paid to households who are considered “unbanked,” meaning the household lacks any kind of deposit account at an insured depository institution, or “underbanked,” meaning the household has a checking and/or savings account but has also used alternative financial services (AFS)…
  • PolicyMap Wins Gold Stevie Award for Web Programming/Design

    Katie Nelson
    15 Oct 2014 | 2:30 pm
    Philadelphia, PA – 10/15/14 – PolicyMap was named the winner of a Gold Stevie® Award in the Best Web Software Programming/Design category in The 11th Annual International Business Awards today. More than 3,500 nominations from organizations of all sizes and in virtually every industry were submitted this year for consideration in a wide range of categories, including Company of the Year, Website of the Year, Best New Product or Service of the Year, Corporate Social Responsibility Program of the Year, and Executive of the Year, among others. PolicyMap won in the Best Web Software…
  • See Round I Promise Zones on PolicyMap

    Morgan Robinson
    10 Oct 2014 | 7:15 am
    We recently added areas designated as federal Promise Zones to PolicyMap. What is a Promise Zone? These areas are the first five of 20 total communities to be designated through 2015 by the Obama administration: Choctaw Nation of Oklahoma; Kentucky Highlands; Los Angeles (Hollywood, East Hollywood, Koreatown, Pico Union and Westlake neighborhoods); San Antonio (EastPoint neighborhood); Philadelphia (Mantua neighborhood). Designation as a Promise Zone does not entail any additional federal grants or funding; instead, HUD, USDA, HHS, DOJ, SBA, and other federal agencies will help local government…
  • PolicyMap Geocoder: Now Even More Gooder!

    Bernie Langer
    8 Oct 2014 | 11:34 am
    400 North Street, Harrisburg, PA. It’s a simple address. It’s a state office building. People work there. You can mail a letter there. But for a while, you might have had some trouble finding it on PolicyMap. A couple years ago, we upgraded our geocoder (the process that finds an address on a map) so it was much more flexible in finding addresses typed into the location bar. The new geocoder featured rooftop geocoding: It knew the precise locations of most addresses in the country. It also featured constant updates, spellchecking capabilities, and alternate street names. The old geocoder…
  • Map Vocabulary

    Bernie Langer
    6 Oct 2014 | 12:54 pm
    PolicyMap has two basic types of maps. One has a very simple name, the other is more complicated. This is a point map: Pretty simple: the map shows points representing the locations of, in this case, schools. So what’s this? The data here represents a geographic area, not just a single point. It’s usually an aggregation of a mass within the area (number of people, percent of families, median dollar amount, etc.), where different colors represent a range of values among all areas. There is a technical term for this: choropleth map. (You’ll notice there’s only one “L” in choropleth;…
 

    Revolutions

  • Explore R package connections at MRAN

    David Smith
    20 Oct 2014 | 1:25 pm
    Many R scripts depend on CRAN packages, and most CRAN packages in turn depend on other CRAN packages. If you install an R package, you'll also be installing its dependencies to make it work, and possibly other packages as well to enable its full functionality. My colleague Andrie posted some R code to map package dependencies a couple of months ago, but now you can easily explore the dependencies of any CRAN package at MRAN. Simply search for a package and click the Dependencies Graph tab. Here's a very simple one: the foreach package. The foreach package depends on two others:…
  • Because it's Friday: Dogs, Cats and their Diaries

    David Smith
    17 Oct 2014 | 2:52 pm
    It's been a super-busy time at Strata this week, so I'm taking the easy route for Because it's Friday this week: funny dog and cat videos. If you're not one of the 10 million people who have seen Sad Dog Diary, well, now's your chance: And if you're more of a cat person, there's also Sad Cat Diary: That's all for this week! Have a great weekend, and we'll be back on Monday.
  • Statistics doesn't have to be so hard: simulate!

    David Smith
    17 Oct 2014 | 7:17 am
    My second-favourite keynote from yesterday's Strata Hadoop World conference was this one, from Pinterest's John Rauser. To many people (especially in the Big Data world), Statistics is a series of complex equations, but just a little intuition goes a long way to really understanding data. John illustrates this wonderfully using an example of data collected to determine whether consuming beer causes mosquitoes to bite you more: The big lesson here, IMO, is that so many statistical problems can seem complex, but you can actually get a lot of insight by recognizing that your data…
  • R User Groups and "after hours" Creativity

    Blog Administrator
    16 Oct 2014 | 5:30 am
    By Joseph Rickert. There is something about R user group meetings that both encourages and nourishes a certain kind of "after hours" creativity. Maybe it is the pressure of having to make a presentation about stuff you do at work interesting to a general audience, or maybe it is just the desire to reach a high level of play. But R user group presentations often manage to make some obscure area of computational statistics seem to be not only accessible, but also relevant and fun. Here are a couple of examples of what I mean. Recently Xiaocun Sun conducted an Image processing…
  • Introducing Revolution R Open and Revolution R Plus

    David Smith
    15 Oct 2014 | 5:10 am
    For the past 7 years, Revolution Analytics has been the leading provider of R-based software and services to companies around the globe. Today, we're excited to announce a new, enhanced R distribution for everyone: Revolution R Open. Revolution R Open is a downstream distribution of R from the R Foundation for Statistical Computing. It's built on the R 3.1.1 language engine, so it's 100% compatible with any scripts, packages or applications that work with R 3.1.1. It also comes with enhancements to improve your R experience, focused on performance and reproducibility: …

    iTrend Blog

  • iTrend analytics may help understand Ebola

    Annie M. Dance
    18 Oct 2014 | 3:28 pm
    The Ebola virus outbreak in West Africa has now claimed more than 4,000 lives. A recent BBC article, “Ebola: Can big data analytics help contain its spread?”, says a growing number of data scientists agree that big data analytics may help to contain the virus. Big data analytics is about bringing together many different data sources and mining them to find patterns. In the digital age, tracking the movement of potentially infected people is a lot easier. iTrend’s innovative software shows real time data. A keyword search of Ebola for the past…
  • 5 new Bitcoin facts that may surprise you

    Michael Alatortsev
    10 Jul 2014 | 10:50 am
    1. Russia had previously declared Bitcoin illegal. It has just recently softened its stance, and, judging from the prevalence of Russian language tweets in our Bitcoin data sets, the Russians are now all over the cryptocurrency. Based on volume alone, they are now dominating #bitcoin social media conversations. 2. New cryptocurrencies are continuing to emerge; latest example is Latium – claiming to be the first and only cryptocurrency network (no mining required). 3. Dogecoin is dead. Wow, really. 4. Snoop Dogg’s comment about Bitcoin remains the highest retweeted…
  • sneak preview of iTrend 2.0 #analytics – new UI, new insights

    iTrend LLC
    8 Jul 2014 | 9:26 am
    We are testing the latest version of our social analytics platform. It offers tons of new functionality: multi-language support, with ability to split social data by language; global maps, with several different views; improved filtering; brand-new NLP capabilities (the system can understand what people are talking about); additional ways to combine social with other data sources. Plus, it is: super fast; more affordable than Salesforce Marketing Cloud, Sysomos, etc; more flexible than any leading tool; customizable (talk to us about your specific requirements today). If you are interested in…
  • Comprehensive analysis of 273,000 #AmazonCart tweets

    iTrend LLC
    23 May 2014 | 8:43 am
    May 28 2014 update: 273,000 tweets were analyzed. Updated Top Selling items are shown below. Please note: we can only track products being added to cart, we don’t have access to actual checkout transactions (unless people choose to share their purchase on Twitter upon checkout – which some do).  Not all ‘sales’ mentioned below have been taken through checkout process.   Top #AmazonCart sellers, by number of items sold: Top #AmazonCart sellers, by total sales value:   We posted some preliminary data when the new feature went live on May 5 2014.  Two weeks…
  • iTrend Build 1984 Release Notes

    iTrend LLC
    15 May 2014 | 8:14 am
    You may have noticed the new build number at the bottom of iTrend’s login page. We’ve been implementing a number of enhancements based on feedback from our TechCrunch Disrupt NY 2014 presentations. What’s new in version 1984: improved algorithm for fetching and displaying ‘Software’ clients, you will see more product icons; FIXED width display bug in two types of reports: ‘Retweets’ and ‘Tweets’; improved PDF exporting/print capabilities in qualifying subscription plans; improved display/refresh UX in ‘Conversations’…