This article is reproduced from DMFighter's blog:
IDMer: This post collects common questions from data mining beginners. DMFighter has compiled some of the questions I have answered in the past, and I thank him for his hard work. Because many readers of my blog raise these questions repeatedly, I have reposted DMFighter's compilation here. I hope beginners will read it first, as it may answer some of your doubts. Of course, these replies are only my personal views and are inevitably biased; you are welcome to discuss them and to offer comments and suggestions.
Guidance and advice on studying and researching data mining
All of the content is compiled from suggestions in blogs on the Internet. I thought this compilation was quite good, so I am sharing it.
I also recommend two good DM blogs:
1. Data Miner:
2. Data Mining youth:
1. How to innovate in a paper
1. What data miners actually do
2. Whether Ph.D. study is needed
3. Choosing a direction for self-study
I ask you: which development platforms and programming languages are mainstream for data mining now? Which have the advantage, and which are used most?
Question 1: Is it wise to choose data mining as a lifelong career now?
Question 2: If I choose data mining, what should I do?
Question 3: How can I move closer to industrial applications while still in the learning phase?
Compare the performance of the different algorithms, and summarize the characteristics and applicability of each. Of course, if you can make some useful improvements to an algorithm, so much the better; that makes the paper academically stronger. Finally, describe what benefits the association analysis results bring in application (such as adjusting shelf layout, cross-selling to customers, and so on).
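As a rough illustration of the kind of association analysis result mentioned above, the following minimal Python sketch computes support, confidence, and lift for a single rule. The transactions, the item names, and the helper names `support` and `rule_metrics` are all hypothetical and chosen only for illustration; a real study would use an established algorithm such as Apriori on real data.

```python
# Hypothetical toy transactions; the item names are invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_both = support(set(antecedent) | set(consequent), transactions)
    s_a = support(antecedent, transactions)
    s_c = support(consequent, transactions)
    # Confidence: how often the consequent appears when the antecedent does.
    confidence = s_both / s_a if s_a else 0.0
    # Lift: confidence relative to the consequent's baseline frequency.
    lift = confidence / s_c if s_c else 0.0
    return s_both, confidence, lift


s, c, l = rule_metrics({"bread"}, {"milk"}, transactions)
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

A lift above 1 suggests the two items are bought together more often than chance would predict, which is the sort of evidence that could motivate a shelf adjustment or a cross-selling offer; a lift below 1 suggests the opposite.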
Advice from a successful data miner to a graduate student in data mining
In my own data mining research, I too went through a number of detours. Looking at its origins, data mining is not a new science but a combination of research results from statistical analysis, machine learning, artificial intelligence, databases, and many other fields. What distinguishes it from research on expert systems and knowledge management is that data mining focuses more on the application level.
Because data mining combines so much content, fully understanding all the details takes a long time. So I suggest that your first step be to spend about three months getting to know several commonly used data mining techniques: classification, clustering, prediction, association analysis, outlier analysis, and so on. This understanding can be fairly coarse. The goal is to know what these techniques are used for, roughly how the typical algorithms work, and which techniques and algorithms to choose under which circumstances.
After gaining a preliminary understanding, move on to topic selection: choose a specific direction that interests you, then read the classic papers in that direction (surveys, main lines of development, application results). Topic selection may take longer, perhaps a year. During this time you should gradually identify your point of breakthrough, which will be the innovation of your paper. Innovation is very important in research: on the one hand, the new approach must really do better than the original; on the other hand, the innovation must have practical value.
Then you need to realize your ideas. A master's thesis usually requires building a prototype system to run tests, and using the test results to support the thesis. The prototype system is the realization of your innovation and needs good design and development. Note that both the development and the use of the prototype system should reflect its theoretical basis. In other words, the prototype is not simply an implementation of functions; it is your theory put into practice. This theory will also go into your paper and reflect its theoretical level.
Building the prototype system and producing convincing results usually takes at least a year. So focus on the core part (the part that embodies the paper's innovation), and do not spend too much energy on the external interface and the like, or progress may get out of control.
Finally, organize the material and write the thesis. I suggest that during the earlier stages you gradually write a number of shorter papers (for publication in journals and conferences), such as a survey, the system framework, the core algorithm, applications, and so on. That way, when you write the final thesis you will have plenty of material, and the writing will go faster and better.
These are just generalities. I think the key is topic selection, and whether a topic is good or bad depends on your understanding of the state of data mining, your interests and expertise, and the practical significance of the direction. I suggest you talk more with your advisor and peers to make your own direction clearer.
As for employment prospects in data mining, they should be fairly good. If you are interested in research, Microsoft Research, Google, and university research institutes are good places. If you are interested in practical applications, many large companies, including IBM, Accenture, and AsiaInfo, need such people. Of course, client-side (Party A) organizations in securities, insurance, finance, and so on also need analysts.
How to innovate in a paper
A paper is judged on several elements: whether its theoretical basis is solid, how important the research question is, how innovative the research is, and so on.
Innovation means that the problem you solve, or your solution to it, differs from what others have done, and that the difference actually helps solve the problem. So innovation requires thorough preparation and in-depth study.
① Thorough preparation: If innovation means finding a better way to solve a problem, you first have to find a problem, and a valuable one. Once you have found it, check whether anyone has already looked for a solution, what their approach is, and where it falls short.
This stage involves a great deal of collecting and surveying, and it belongs to the preparatory phase of the research. It usually requires reading the classic literature and the latest progress in the relevant field, and summarizing it in study notes.
② In-depth study: When you have found a problem worth studying and know that it has no good solution yet, you have an opportunity for innovation. Identify the deficiencies of existing solutions, propose your own ideas, and verify them by experiment or by proving that your reasoning is valid; that is how innovation arises. This is easier said than done. Like a new invention, it often takes many experiments and careful thought, and you may work for a long time with nothing to show for it.
I do not mean to discourage you: I have seen many people who do research seriously and invest the time and energy achieve real results. Of course, for students in China, I think pursuing a significant innovation at the master's level is not very realistic (a personal opinion); in fact, doing the first step well is already quite good. If you skip the first step and rush to find an innovation without laying the foundation, the so-called innovation is often meaningless, and the resulting paper cannot avoid ending up in the trash (I have written some papers like that myself). We often say that papers produced in China are of poor quality. This is largely due to our current education system, which requires graduate students to publish a certain number of papers in SCI, EI, or core journals before graduating and values quantity over substance. As a result, core journals have become little more than a place to fulfill graduation requirements.
But I digress. Back to your concern: how to find an innovation. Reading the necessary literature is essential; only by understanding the state and background of the research can you find an innovation. If you want to go faster, there are some shortcuts at this stage. For example, you can visit the websites of research groups or scholars to see their current research. What they are studying is usually a problem that has not yet been solved, which lets you find the main direction of innovation sooner.
1. What data miners actually do
Does data mining work consist of developing platforms, that is, building tailor-made DM and DW systems for other organizations? What else can one do?
2. Whether further study is needed
I am currently studying on my own, without supervision. Is a master's graduate qualified for data mining work? How necessary is it to go on to a Ph.D.?
3. Choosing a direction for self-study
There is text mining, Web mining, and so on. A Ph.D. certainly requires in-depth study of one specific direction. When studying on my own, should I also focus on one direction, rather than only gaining a general familiarity with everything?
The following are the blog owner's replies:
1. What data miners do is much as you describe. Most of the friends I know work at IT companies, implementing DM, DW, and BI projects for Party A (the client). Others work as analysts on the client side, using data mining to solve business problems.
2. For the work described above, I feel a master's degree is sufficient. Of course, the degree is not what matters most; what matters is using the knowledge you have learned to solve problems. If you want to go further into theory, a Ph.D. is also good, but the direction will be different.
3. At the master's level, I think a comprehensive understanding is better. Of course, since data mining covers so much, you should still emphasize certain areas, such as the widely used algorithms and their applications, including decision trees, clustering, regression, neural networks, and so on. Even if you do not go on to a Ph.D., this will help you in getting a job.
I ask you: which development platforms and programming languages are mainstream for data mining now? Which have the advantage, and which are used most? Thanks.
The following is the blog owner's response:
At present, Java is probably the most widely used. As far as I know, the SAS Enterprise Miner client is developed in Java, and the open-source data mining tool Weka is also developed in Java. Of course, the server-side back end may be developed in C, mainly for performance reasons.
If you are interested in data mining platforms, I suggest you visit the open-source site SourceForge (
May I ask: 1. When doing data mining for clients (Party A) in different industries with general-purpose software such as SAS, how do you accommodate the differing characteristics of each industry and the differing requirements of each department? For example, when building a prediction model, do you extend SAS with plug-ins, or is it enough to build a specific model within SAS itself?
In other words, does "tailor-made" mean building different models with the same software, with no modification of the software itself?
2. Does a data miner not need to be a programmer? Does the daily work rarely involve programming?
The following is the blog owner's response:
1. SAS Enterprise Miner is general-purpose data mining software, so to meet a customer's specific analysis needs there are usually two approaches. The first is to run a project, in which the project team develops and implements the application the customer requires. The second is to start from a SAS industry solution (a complete solution distilled from SAS's many years of experience with industry needs) and then customize it.
In most cases there is no need to extend the functionality of SAS; you use the SAS software as it is to solve business problems. So what we usually do is project implementation, not software development.
2. If you only build mining models, a data miner does not have to be a programmer; it is enough to apply an appropriate mining algorithm to the data and tune the model. In practice, though, data miners do more than that. Much of the time goes to data preparation and data exploration, which may require programming. Such programs are usually written for data processing and for publishing model results.
In my personal experience and understanding, data preparation often accounts for 60% to 70% of the workload of a data mining project.
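To give a sense of what such data preparation programs can look like, here is a minimal Python sketch that removes duplicate records, normalizes a text field, and fills in missing values. The records, the field names, and the helpers `parse_age` and `prepare` are hypothetical and invented for illustration; a real project would work against the customer's actual data sources and business rules.

```python
from statistics import median

# Hypothetical raw customer records; field names and values are invented.
raw = [
    {"id": "1", "age": "34", "city": " Beijing "},
    {"id": "2", "age": "", "city": "shanghai"},     # missing age
    {"id": "2", "age": "", "city": "shanghai"},     # duplicate record
    {"id": "3", "age": "51", "city": "BEIJING"},
    {"id": "4", "age": "abc", "city": "Qingdao"},   # unparseable age
]


def parse_age(text):
    """Return the age as an int, or None when it is missing or not a number."""
    text = text.strip()
    return int(text) if text.isdigit() else None


def prepare(records):
    """Deduplicate by id, normalize city names, and impute missing ages."""
    seen = set()
    rows = []
    for r in records:
        if r["id"] in seen:  # keep only the first record for each id
            continue
        seen.add(r["id"])
        rows.append({
            "id": int(r["id"]),
            "age": parse_age(r["age"]),
            "city": r["city"].strip().title(),  # trim spaces, unify casing
        })
    # Fill missing ages with the median of the known ages (one simple choice).
    known = [row["age"] for row in rows if row["age"] is not None]
    fill = median(known) if known else None
    for row in rows:
        if row["age"] is None:
            row["age"] = fill
    return rows


clean = prepare(raw)
for row in clean:
    print(row)
```

Median imputation is only one of several possible choices here; whether to fill, drop, or flag missing values depends on the business question, which is part of why this stage takes up so much of a project.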
The following is from:
Senior Zhang: Hello!
I am a first-year graduate student at Ocean University of China, majoring in databases. I want to choose one direction, study it seriously, and make it my lifelong career: for example, Java programming, data mining, database administration, and so on. I like data mining, but a number of questions trouble and confuse me.
Question 1: Is it wise to choose data mining as a lifelong career now?
Online discussion of the prospects of data mining is lively, and opinions are mixed. What is certain, at least, is that data mining is being taken more and more seriously. As you have said, we see only the surface, and many companies do not make their successful cases public. Still, it is undeniable that the application of data mining in China is suspected of being more hype than substance, and most enterprises pay it little attention.
In short, as the saying goes, a man fears entering the wrong trade, and this question keeps bothering me. If I were your younger brother, would you encourage me to take the data mining path, or would you recommend another IT specialty?
IDMer:
From my personal point of view, the future of data mining is very bright. Frankly, data mining is a means of analyzing problems. The problems have always existed, so the means to solve them will always be needed. Perhaps you have heard that during the gold rush in the early American West, the prospectors did not get rich, but those who supplied them with tools and water did. Even jeans came out of it, because the miners needed sturdy clothes, and they have endured ever since.
You mentioned that data mining feels mysterious. In fact, data mining itself is not a new technology. It combines mature content from statistics, databases, machine learning, and other disciplines, and giving it a new name simply makes it look fashionable.
These three pillar disciplines of data mining have been developing for many years and are widely used, so we have reason to believe that their combination can help us solve more analytical problems. Moreover, the industry already has many success stories that reflect the unique advantages data mining brings, which traditional BI (reporting, OLAP, and so on) cannot provide.
Having said many good things about data mining, let us look at the other side of the coin. If you were my younger brother (I do not have one, but I have talked about careers with many younger schoolmates), I would advise you not to go into IT at all. Ha, that is half joke, half truth. Working in IT is still quite hard, and many projects involve repetitive tasks with little technical content, which consume a lot of energy and bring little sense of achievement.
Anyway, my advice is really the motto I gave myself: if you think of it, do it. Find something that interests you and that you feel has prospects, and go ahead and try.
Question 2: If I choose data mining, what should I do?
In a blog reply to a student at Beijing University of Posts and Telecommunications, you suggested: become familiar with a variety of algorithms; build models; and write papers that improve an algorithm with innovative ideas.
My current plan is to learn the principles of various algorithms; learn the Java language; read the Weka source code to understand the steps of several classic algorithms in depth; learn about ETL, data warehousing, OLAP, and so on; build mining models using data sets; think about papers; and, if time permits, learn to use SPSS or other popular software. The part I find difficult is collating a data set into the input for the mining process.
I have some doubts here. Mastering a database is very important, but there is no time to learn SQL Server, Oracle, DB2, and the rest one by one; yet when job hunting, any one of them might be required. I want to study SQL Server alone in depth, including building data warehouses and data mining applications in SQL Server, and ignore the others. Is it advisable to choose one and give up the rest?
IDMer:
Your plan covers a broad range of knowledge and skills, and it looks good. Speaking only from personal experience, I suggest you not require yourself to be versatile and good at everything. Get a basic understanding of many areas, then pick a few points to excel at as your own distinctive skills. A person's energy is limited; the bigger the target, the harder it is to achieve.
As for which to prioritize, build a broad understanding first, then choose according to your own interests.
As for mastering a database, I feel that being familiar with one product is enough. SQL Server, Oracle, and DB2 are all relational databases. As a student, lay a solid foundation in relational database theory and become skilled in writing SQL statements. The main differences between these databases lie not in the standards they follow but in product features and in performance-tuning techniques.
Question 3: How can I move closer to industrial applications while still in the learning phase?
A senior schoolmate suggested that I move closer to a specific industry while learning the technology; otherwise, technology without an industry background is rootless. Doing so also narrows the scope, which helps keep my study focused rather than exhaustive and undistinguished.
However, on data mining I have to ask you about the nature of the jobs. One kind (A) is the analyst at Party A, the client, who uses data mining to solve business problems. The other kind (B) works at an IT company, Party B, implementing DM, DW, and BI projects for the client (do you, as my senior, fall into this category?).
Here, I do not understand what analysts at Party A actually do day to day. Is it similar to network administration? Does it lean toward database administration? Or are they dedicated data miners? I would think that most companies would not set up such positions.
I prefer B, as it seems more professional. However, for B you said that data mining is mainly applied in finance, telecommunications, banking, sales, and so on. Does this mean I need to learn or understand the background of finance and telecommunications, CRM, economics, Excel, and so on?
Also, you mentioned that one can do research work after graduation. I think research positions are few, after all. Besides, does it pay well (I am a little embarrassed to ask)?
Which type of company should I be aiming for in the future, and what foundations should I be strengthening now?
IDMer:
Your senior is right: pure theory divorced from practice mostly fades without a trace. As for the difference between A and B, it is not really that big, especially for junior staff who have just started work.
Working at Party B for a few years first and then moving to Party A is something I see a lot. Perhaps this is because Party B offers more training and more opportunities to pick up new knowledge and experience, so you can accumulate more while young. Of course, if there is a good opportunity to work at Party A, that is also a good choice.
When hiring for Party B, especially fresh graduates straight from campus, recruiters mostly check whether your knowledge is solid, whether your character fits the team, and so on. They rarely expect much industry knowledge unless you already have years of project experience in the relevant industry.
Research institutions in China are mainly research institutes and universities. Pay is generally lower than at companies, but many people, including my own junior schoolmates and classmates, have chosen to continue doing research because they get a lot of enjoyment from it. There are also research institutes run by enterprises, especially foreign companies, where the pay is very good, but you need to be very good to get in. Another good place to do research is a research institution abroad.
Postscript: On reflection, my questions may not make much sense; some seem obvious, and others may not be worth answering. If you have time, I would still appreciate a pointer or two. I am in no hurry; if you are busy, reply whenever you can.
Thanks!
DMman