Data Mining

  • Most Topular Stories

  • Teradata Stands Alone, and Not Alone

    Data mining News
    19 Apr 2014 | 7:03 pm
    … three keys: Data integration and data mining Leveraging SAS analytics and BI … modules: Interaction Advisor, which uses data mining to score the next best …
  • Microsoft launches toolset for capturing 'ambient intelligence'

    Computerworld BI and Analytics News
    16 Apr 2014 | 4:29 am
    Microsoft is targeting the growing volume of data being generated by both machines and humans: CEO Satya Nadella on Tuesday showed off tools that could help organizations better understand -- and profit from -- this trove of information.
  • Probability: A Halloween Puzzle

    Data Mining in MATLAB
    12 Apr 2014 | 7:22 pm
    Introduction: Though Halloween is months away, I found the following interesting and thought readers might enjoy examining my solution. Recently, I was given the following probability question to answer: Halloween Probability Puzzler: The number of trick-or-treaters knocking on my door in any five-minute interval between 6 and 8pm on Halloween night is distributed as a Poisson with a mean of 5 (ignoring time effects). The number of pieces of candy taken by each child, in addition to the expected one piece per child, is distributed as a Poisson with a mean of 1. What is the minimum number of pieces…
  • Waaaah! EMNLP six months late :)

    natural language processing blog
    14 Apr 2014 | 9:56 am
    Okay, so I've had this file called emnlp.txt sitting in my home directory since Oct 24 (last modification), and since I want to delete it, I figured I'd post it here first. I know this is super belated, but oh well, if anyone actually reads this blog any more, you're the first to know how I felt 6 months ago. I wonder if I would make the same calls today... :) A Log-Linear Model for Unsupervised Text Normalization (Yi Yang and Jacob Eisenstein); Parsing entire discourses as very long strings: Capturing topic continuity in grounded language learning [TACL] (Minh-Thang Luong, Michael C. Frank,…
  • Attracting A Younger Customer

    Kevin Hillstrom: MineThatData
    Kevin Hillstrom
    17 Apr 2014 | 8:15 pm
    Inside a book store, we see a nice display, don't we? Top Teen Picks. New Teen Fiction. What is missing in this picture? Teens. It is good to offer merchandise designed to attract a younger audience. Then, we have to find a younger audience. It's really hard to do that within a framework designed to attract the loyal, core customer.
 

    Data Mining in MATLAB

  • Probability: A Halloween Puzzle

    12 Apr 2014 | 7:22 pm
    Introduction: Though Halloween is months away, I found the following interesting and thought readers might enjoy examining my solution. Recently, I was given the following probability question to answer: Halloween Probability Puzzler: The number of trick-or-treaters knocking on my door in any five-minute interval between 6 and 8pm on Halloween night is distributed as a Poisson with a mean of 5 (ignoring time effects). The number of pieces of candy taken by each child, in addition to the expected one piece per child, is distributed as a Poisson with a mean of 1. What is the minimum number of pieces…
  • Reading lately (Nov-2013)

    23 Nov 2013 | 6:23 pm
    I read a great deal of technical literature and highly recommend the same for anyone interested in developing their skill in this field. While many recent publications have proven worthwhile (2011's update of Data Mining by Witten and Frank; and 2010's Fundamentals of Predictive Text Mining, by Weiss, Indurkhya and Zhang are good examples), I confess being less than overwhelmed by many current offerings in the literature. I will not name names, but the first ten entries returned from my search of books at Amazon for "big data" left me unimpressed. While this field has enjoyed popular…
  • Ranking as a Pre-Processing Tool

    13 Jan 2013 | 10:13 am
    To squeeze the most from data, analysts will often modify raw variables using mathematical transformations.  For example, in Data Analysis and Regression, Tukey described what he terms a "ladder of re-expression" which included a series of low-order roots and powers (and the logarithm) intended to adjust distributions to permit better results using linear regression.  Univariate adjustments using those particular transformations are fairly popular today, and are even incorporated directly into some machine learning software as part of the solution search.  The modified…
  • The Good and the Bad of the Median

    31 Jul 2012 | 9:08 am
    Perhaps the most fundamental statistical summary beyond simple counting or totaling is the mean. The mean reduces a collection of numbers to a single value, and is one of a number of measures of location. The mean is by far the most commonly used and widely understood way of averaging data, but it is not the only one, nor is it always the "best" one. In terms of popularity, the median is a distant second, but it offers a mixture of behaviors which make it an appealing alternative in many circumstances. One important property of the median is that it is not affected - at all -…
  • Linear Discriminant Analysis (LDA)

    11 Dec 2010 | 2:03 am
    Overview: Linear discriminant analysis (LDA) is one of the oldest mechanical classification systems, dating back to statistical pioneer Ronald Fisher, whose original 1936 paper on the subject, The Use of Multiple Measurements in Taxonomic Problems, can be found online (for example, here). The basic idea of LDA is simple: for each class to be identified, calculate a (different) linear function of the attributes. The class function yielding the highest score represents the predicted class. There are many linear classification models, and they differ largely in how the coefficients are established.
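
    The scoring rule described in the LDA overview above can be sketched in a few lines of Python. This is only a minimal illustration, not the post's own MATLAB code: the class labels, the coefficient values, and the helper names class_scores and predict_class are hypothetical, and the step that actually distinguishes LDA from other linear classifiers (estimating the coefficients from data) is deliberately left out.

```python
# Hypothetical per-class coefficients; in LDA these would be estimated from data.
WEIGHTS = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, -1.0]}
BIASES = {"a": 0.0, "b": -0.5, "c": 1.0}


def class_scores(x, weights=WEIGHTS, biases=BIASES):
    """Evaluate one linear function of the attributes per class."""
    return {
        label: sum(w * xi for w, xi in zip(coeffs, x)) + biases[label]
        for label, coeffs in weights.items()
    }


def predict_class(x, weights=WEIGHTS, biases=BIASES):
    """Return the label whose linear function yields the highest score."""
    scores = class_scores(x, weights, biases)
    return max(scores, key=scores.get)


print(predict_class([2.0, 1.0]))  # a
```

    Keeping the scores in a dictionary makes it easy to inspect how close the runner-up class was, which a bare argmax would hide.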
 

    natural language processing blog

  • Waaaah! EMNLP six months late :)

    14 Apr 2014 | 9:56 am
    Okay, so I've had this file called emnlp.txt sitting in my home directory since Oct 24 (last modification), and since I want to delete it, I figured I'd post it here first. I know this is super belated, but oh well, if anyone actually reads this blog any more, you're the first to know how I felt 6 months ago. I wonder if I would make the same calls today... :) A Log-Linear Model for Unsupervised Text Normalization (Yi Yang and Jacob Eisenstein); Parsing entire discourses as very long strings: Capturing topic continuity in grounded language learning [TACL] (Minh-Thang Luong, Michael C. Frank,…
  • Active learning for positive examples

    16 Sep 2013 | 11:59 am
    I have a colleague who wants to look through large amounts of (text) data for examples of a pretty rare phenomenon (maybe 1% positive class, at most). We have about 20 labeled positive examples and 20 labeled negative examples. The natural thing to do at this point is some sort of active learning. But here's the thing. We have no need for a classifier. And we don't even care about being good at finding negative examples. All we care about is finding as many positive examples from a fixed corpus as possible. That is to say: this is really a find-a-needle-in-a-haystack problem. The best…
  • The *SEM 2013 Panel on Language Understanding (aka semantics)

    8 Jul 2013 | 2:41 pm
    One of the highlights for me at NAACL was the *SEM panel on "Toward Deep NLU", which had the following speakers: Kevin Knight (USC/ISI), Chris Manning (Stanford), Martha Palmer (UC Boulder), Owen Rambow (Columbia) and Dan Roth (UIUC). I want to give a bit of an overview of the panel, interspersed with some opinion. I gratefully acknowledge my wonderful colleague Bonnie Dorr for taking great notes (basically a transcript) and sharing them with me to help my failing memory. For what it's worth, this basically seemed like the "here's what I'm doing for DEFT panel". Here's the basic gist that I got…
  • My NAACL 2013 list...

    17 Jun 2013 | 12:11 pm
    I feel a bit odd doing my "what I liked at NAACL 2013" as one of the program chairs, but not odd enough to skip what seems to be the most popular type of post :). First, though, since Katrin Kirchhoff (my co-chair) and I never got a chance to formally thank Lucy Vanderwende (the general chair) and give her flowers (or wine or...) let me take this opportunity to say that Lucy was an amazing general chair and that working with her made even the least pleasant parts of PCing fun. So: thanks Lucy -- I can't imagine having someone better to have worked with! And all of the rest of you: if you see…
  • What is a sparse difference in probability distributions?

    30 Apr 2013 | 7:36 am
    Sparsity has been all the rage for a couple of years now. The standard notion of "sparse" vector u is that the number of non-zeros in u is small. This is simply the l_0 norm of u, ||u||_0. This norm is well studied, known to be non-convex, and often relaxed to the l_1 norm of u, ||u||_1: the sum of absolute values. (Which has the nice property of being the "tightest" convex approximation to l_0.) In some circumstances, it might not be that most of u is zero, but simply that most of u is some fixed scalar constant a. The "non-constant" norm of u would be something like "the number of components…
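
    The two norms named in the sparsity excerpt above are easy to state in code. The following is a minimal sketch in Python, assuming u is a plain list of floats; the function names l0_norm and l1_norm and the sample vector are illustrative and not taken from the post.

```python
def l0_norm(u):
    """||u||_0: the number of non-zero components of u."""
    return sum(1 for value in u if value != 0)


def l1_norm(u):
    """||u||_1: the sum of the absolute values of the components of u."""
    return sum(abs(value) for value in u)


u = [0.0, 3.0, 0.0, -1.5, 0.0]  # illustrative sparse vector
print(l0_norm(u))  # 2
print(l1_norm(u))  # 4.5
```

    Scaling u changes the l_1 value but leaves the l_0 count untouched, which is one way to see that the two measure different things even though one is used as a relaxation of the other.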

    Kevin Hillstrom: MineThatData

  • Attracting A Younger Customer

    Kevin Hillstrom
    17 Apr 2014 | 8:15 pm
    Inside a book store, we see a nice display, don't we? Top Teen Picks. New Teen Fiction. What is missing in this picture? Teens. It is good to offer merchandise designed to attract a younger audience. Then, we have to find a younger audience. It's really hard to do that within a framework designed to attract the loyal, core customer.
  • Bed, Bath, and Beyond

    Kevin Hillstrom
    16 Apr 2014 | 8:15 pm
    I needed garbage bags ... not your ordinary garbage bags, but Simple Human garbage bags, size J. Look at the omnichannel magic pasted all over this image. Store integration, sign up for offers, like it on Facebook. I could share the product on Twitter or Pinterest. That's a lot of social / mobile / e-commerce / in-store omnichannel goodness. Look at the price: $15.99. And if I only buy this item, I have to pay $5.99 standard shipping, $18.99 express shipping, or I have to spend one to two hours to drive to a store and buy the item ... my choice. Omnichannel! Let's go do a search on Amazon…
  • Successful E-Commerce Brands

    Kevin Hillstrom
    15 Apr 2014 | 8:15 pm
    Last month, a catalog advocate left a comment ... this smart person pointed out that there are terribly few e-commerce brands that have achieved "scale" (meaning that the business became very large and dominant). Her point was that e-commerce needs an "assist". Amazon aside, unless the e-commerce brand has retail stores to market the website, or unless the e-commerce brand has catalogs to market the website, e-commerce is unable to achieve critical mass. This opinion, of course, was offered as a defense of maintaining a catalog marketing strategy. Last month, one of our long-time…
  • Hillstrom's Retail Nightmares: Storm Clouds Are Brewing

    Kevin Hillstrom
    14 Apr 2014 | 8:15 pm
    I know, I know, the experts want you to digitize your retail store. By making the store more like a website, customers will happily visit your store instead of using your website? No? Ok, so digitizing the store will cause the store to be integrated with the website, causing customers to use your website more often. The customer will stay at home, enjoying a fully integrated experience on a tablet, leaving the parking lot looking pretty darn empty. Who wants to visit a store when there are no cars in the parking lot? Let's take a look inside the Office Depot store pictured above: Yup -…
  • Amazon: Hologram Marketing

    Kevin Hillstrom
    14 Apr 2014 | 1:00 pm
    I've been talking about Hologram Marketing for eight years. It's coming (click here). Catalogs = Judy. E-Commerce = Jennifer. Mobile = Jasmine. Hologram = Jadyn.
 

    Neoformix

  • Markham Winter of 2014

    1 Apr 2014 | 4:30 am
    Winter has finally ended in Markham where I live and it has seemed a very long and cold season this year. I decided to take a look at the weather data from Environment Canada and see whether my impression is supported by the data. The result is the graphic below. Click on it to see a larger version. Yes, 2014 was the coldest winter in Markham since 1994. We had an average temperature during the winter of -8.2 C this year and in 1994 it was -9.2 C. Both last year and especially 2012 were warmer than usual so it likely felt that much worse in comparison. We also had the 4th most snow in the…
  • Toronto Visible Minorities

    27 Sep 2013 | 4:30 am
    Toronto is the most multicultural city in the world. According to the 2011 National Household Survey, 46% of the population were foreign-born immigrants and 47% are members of a visible minority. (ref) These immigrants come from a wide variety of places across the globe and their diversity makes the city a truly remarkable place. I have created a Dot Map that shows a single point for every person in the Toronto area, coloured by visible minority status. There are 5,700,628 in all and they are positioned at their place of residence and coloured based on the information from the 2011 census and…
  • Toronto 311 Visualization

    6 Sep 2013 | 4:20 am
    The calls people make into the 311 service line in Toronto give an interesting glimpse into the pulse of the city. The City of Toronto makes this data available through their Open Data initiative. I did some analysis and design work with it to produce a visualization for illuminating time-based patterns during 2012. The visualization is a set of small multiple calendar heatmaps, one for each data series. The one shown above is for reports about 'long grass and weeds'. I was inspired to use this visual form by this example: Vehicles involved in fatal crashes by Nathan Yau. I experimented with…
  • Visual Book Selector

    8 May 2013 | 5:00 am
    One common pattern I see in many interactive applications is to support a person who is selecting a few items from some larger set. Often these items have various characteristics that the person wants to use in some way to guide their selection process. The characteristics can be numeric quantities, dates, categories, or names of things. Showing all the items in a list and allowing the person to sort by one of the attributes is often a decent default solution. In other cases it's more useful to consider multiple attributes at a time during the selection process. Maybe you want items that are…
  • Star Wars Movie Fingerprints

    27 Mar 2013 | 4:35 am
    Recently YouTube had a video that showed all six Star Wars movies at once. They were placed in a 2 by 3 matrix and had an audio track of all the movies superimposed. It was an interesting experiment that has since been removed based on copyright grounds. Before it was removed I was able to do some simple analysis on the video and extract some details of the individual episodes of the Star Wars series. Basically, I produced something very similar to a classic work called Cinema Redux™ by Brendan Dawes, done in 2004. Each individual movie in the series was reduced to a collection of small…

    Trends and Outliers

  • Analytics Maturity, Stage 2: Diagnose: The Root Cause of Business, Operational Conditions

    Spotfire Blog Editor
    17 Apr 2014 | 5:55 am
    In a previous post, we explained how the first stage of the Analytics Maturity Model, “Measure,” enables executives and front-line managers to obtain a quick, current status of the operational and business performance of their company. The second stage of the Analytics Maturity Model, “Diagnose,” is where business leaders are able to visually interact and drill into their data to discover additional answers to questions that arose in the Measure stage, e.g., an increase or decrease in monthly revenue for a particular region. Executives and other decision makers may also diagnose why…
  • Conquering the Key Challenges of Big Data

    Spotfire Blog Editor
    16 Apr 2014 | 5:55 am
    Big data offers companies a number of useful benefits, including opportunities for decision makers to gain deep insights about customers and market opportunities. When used effectively with analytics tools, big data can also help business leaders identify and stem emerging issues (e.g., a developing bottleneck in a company’s supply chain) – even before they’ve reached the surface. Companies Still Need to Address Big Data Challenges: Still, despite the tremendous opportunities that it offers, big data also presents some heady challenges to organizations. These include struggles among…
  • Transform Energy 2014 – The Value of Shared Experiences

    Spotfire Blog Editor
    15 Apr 2014 | 5:55 am
    By Steve Farr, Oil & Gas Industry Expert. Everyone in our industry attends functions, conventions, and shows that hint at the future of data analytics. Sure, we go to shows, but let’s be honest: who really remembers every detail of every presentation, 12 months earlier on a particular day? Very few people. But, sometimes you do. And when that happens, it’s because the presentation is insightful and has a deep impact. And that’s just what happened to me at last year’s Transform 2013. Transform Energy: Then & Now. My Wow moment happened during the presentation of the MaraDrill…
  • Life Is Data: Improving Public Welfare with Big Data

    Spotfire Blogging Team
    14 Apr 2014 | 5:55 am
    The term “big data” is a bit of a misnomer. That’s because while the size of the data streaming into companies from social networks, online shopping, mobile devices and sensors attached to machinery is massive, the most important aspect of big data is the potential for how it can be applied to solve some of the world’s most vexing problems. That’s the assertion of a new article from Harvard magazine, which notes that the most compelling aspects of big data are its ability to create new knowledge by linking datasets as well as its creative approaches to visualizing data. The article…
  • Wealth Management: Strengthening the Advisor-Client Relationship

    Spotfire Blogging Team
    11 Apr 2014 | 5:55 am
    Despite nearly a decade of negative real returns on equity and a string of bear markets, global wealth has more than doubled since 2000, reaching an all-time high of $241 trillion last year. That’s according to the 2013 Credit Suisse Wealth Report, which forecasts that global wealth is expected to grow 40% over the next five years, reaching $334 trillion by 2018. The vibrant expansion in global affluence is being fueled by a combination of factors: the continuing expansion of emerging markets (which is estimated to account for 29% of the projected growth) along with the rise of…
 

    PolicyMap

  • The Changing Face of the United States

    Kristin Crandall
    17 Apr 2014 | 9:31 am
    When it comes to data, some demographic trends are more easily captured than others. The country’s shifting racial and ethnic makeup is perhaps towards the top of this list. The fact that the US is an increasingly multiracial country has been discussed in many forums, such as the Smithsonian, the New York Times, and other news outlets. Last October, National Geographic published an interesting article called “The Changing Face of America” in which the article’s author, Lise Funderburg, and photographer, Martin Schoeller, attempt to put a human face on the country’s increasingly…
  • Use the Data Loader and upload your data today!

    Phil Vu
    17 Apr 2014 | 7:55 am
    PolicyMap’s data loader lets subscribers easily load their own address-level files to view on top of any of the over 15,000 indicators available in PolicyMap. Choose to keep your data private, share it confidentially within your organization or post it for the public to access. Watch our video to see just how easy it is to use! But, don’t take our word for it. Try it out for yourself with a FREE 7-day trial! Visit our support page! Want to learn more about the many features available to you on PolicyMap? Go to the Support Page to find: the calendar of available training sessions free for…
  • Take a Virtual Historic Tax Credit Road Trip

    Morgan Robinson
    11 Apr 2014 | 1:05 pm
    The Historic Tax Credit program brings history to life, providing a 20% tax credit for the restoration of a certified historic structure that complies with rehabilitation guidelines. Good news for those of us who are both historic preservation nerds and PolicyMap users: historic tax credit sites have recently been updated to include projects approved during the 2013 fiscal year, making it easy to plan your next road trip from the comfort of your home or office. If you travel along Route 66, for example, you’ll pass through a great deal of history. Although many iconic Route 66 buildings…
  • PolicyMap attends Unveiling of Open Data 500

    Katie Nelson
    8 Apr 2014 | 12:19 pm
    PolicyMap attended a panel discussion on The Economic Impact of Open Data today, hosted by the Center for Data Innovation in Washington D.C.  Speakers focused on the opportunities and challenges associated with making government data more accessible and useful, and the potential gain to the private sector in leveraging data resources from federal, state and local government. GovLab, the Governance Lab at NYU, unveiled http://www.opendata500.com/ a website which allows you to see what kinds of companies leverage data from each agency of the federal government. As an organization dedicated to…
  • PolicyMap Infographic and April 2014 Data Updates

    Maggie McCullough
    4 Apr 2014 | 7:24 am
    Infographic from PolicyMap Shows Impact of Census Tract Boundary Changes: Drive into any US town and you’ll see the famous population sign. Whether it’s a metropolitan city with a population in the millions or a small farming community populated with more animals than people, the numbers we see on those signs are merely a suggestion. Census boundaries change all the time (people move in and out of areas, businesses close or open, redevelopment takes place), and these changes make it nearly impossible to accurately compare a region over time. To make this easier, we…

    Revolutions

  • Because it's Friday: This is why dogs hate wizards

    David Smith
    18 Apr 2014 | 1:48 pm
    What happens when you offer a dog a treat, but then make it vanish via sleight of hand? This: Like Sullivan, I'm surprised these dogs are fooled at all, and can't tell where the treat is by scent. That's all for this week. See you on Monday!
  • R and the weather in the local news

    David Smith
    18 Apr 2014 | 10:13 am
    The Mountain View Voice is a weekly newspaper serving the Silicon Valley area, and is a familiar sight to anyone wandering the streets of Palo Alto or Menlo Park. Angela Hey writes for 'Hey Tech!', an online blog of the Voice, and has just published a feature on R and the local Bay Area User Group (BARUG). It includes a nice history of R, and an in-depth recap of Ram Narasimhan's lightning talk on the weatherData package and his weatherCompare app at the last BARUG meeting. (You can read about other talks at that BARUG meetup in Joe Rickert's recap.) Read Angela…
  • DM Radio on Data Science

    David Smith
    18 Apr 2014 | 9:31 am
    A couple of weeks ago, I participated in a panel discussion for DM Radio: "Still Sexy? How's that Data Scientist Gig Working Out?". The title was provocative, but the discussion mostly revolved around the rise of data science and how advanced analytics (often implemented with R) is changing the way many companies do business today. Also on the panel hosted by Eric Kavanagh were Geoffrey Malafsky of Phasic Systems, John Whittaker of Dell, and Chandran Saravana of SAP. The podcast is now available online, which you can listen to at the link below. (And the answer is: Yes, still…
  • Diving into H2O

    Joseph Rickert
    17 Apr 2014 | 8:30 am
    by Joseph Rickert. One of the remarkable features of the R language is its adaptability. Motivated by R’s popularity and helped by R’s expressive power and transparency, developers working on other platforms display what looks like inexhaustible creativity in providing seamless interfaces to software that complements R’s strengths. The H2O R package that connects to 0xdata’s H2O software (Apache 2.0 License) is an example of this kind of creativity. According to the 0xdata website, H2O is “The Open Source In-Memory, Prediction Engine for Big Data Science”. Indeed, H2O offers an…
  • Why writing vectorized code in R is a good idea

    David Smith
    16 Apr 2014 | 2:20 pm
    As a language for statistical computing, R has always had a bias towards linear algebra, and is optimized for operations dealing in complete vectors and matrixes. This can be surprising to programmers coming to R from lower-level languages, where iterative programming (looping over the elements of a vector or matrix) is more natural and often more efficient. That's not the case with R, though: Noam Ross explains why vectorized programming in R is a good idea:    If you can express what you want to do in R in a line or two, with just a few function calls that are actually calling…