developer tip

Python - What exactly is sklearn.pipeline.Pipeline?

copycodes 2020. 9. 14. 21:19


I can't figure out how sklearn.pipeline.Pipeline works exactly.

The documentation offers some explanation. For example, it says:

Pipeline of transforms with a final estimator.

To make my question clearer: what are steps, and how do they work?

Edit

Thanks to the answers, I can make my question clearer:

When I call the pipeline and pass two transformers and one estimator as steps, e.g.:

pipln = Pipeline([("trsfm1",transformer_1),
                  ("trsfm2",transformer_2),
                  ("estmtr",estimator)])

What happens when I call:

pipln.fit()
OR
pipln.fit_transform()

I can't figure out how an estimator can be a transformer, or how a transformer gets fitted.


A Transformer in scikit-learn is some class that has fit and transform methods, or a fit_transform method.

A Predictor is some class that has fit and predict methods, or a fit_predict method.

A Pipeline is just an abstract notion, not some existing ML algorithm. Often in ML tasks you need to perform a sequence of different transformations of the raw dataset (finding a set of features, generating new features, selecting only some good features) before applying a final estimator.

Here is a good example of Pipeline usage. The Pipeline gives you a single interface for all three transformation steps plus the resulting estimator. It encapsulates the transformers and predictors inside, and now you can do something like this:

    vect = CountVectorizer()
    tfidf = TfidfTransformer()
    clf = SGDClassifier()

    vX = vect.fit_transform(Xtrain)
    tfidfX = tfidf.fit_transform(vX)
    clf.fit(tfidfX, ytrain)  # SGDClassifier needs labels and has no fit_predict

    # Now evaluate all steps on the test set: transform only, never refit
    vX = vect.transform(Xtest)
    tfidfX = tfidf.transform(vX)
    predicted = clf.predict(tfidfX)

Instead, with a Pipeline it's just:

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier()),
])
predicted = pipeline.fit(Xtrain, ytrain).predict(Xtrain)
# Now evaluate all steps on test set
predicted = pipeline.predict(Xtest)

With pipelines you can easily perform a grid search over a set of parameters for each step of this meta-estimator, as described in the link above. All steps except the last one must be transforms; the last step can be a transformer or a predictor.

Answer to the edit: when you call pipln.fit(), each transformer inside the pipeline is fitted on the outputs of the previous transformer (the first transformer is fitted on the raw dataset). The last estimator may be a transformer or a predictor. You can call fit_transform() on the pipeline only if your last estimator is a transformer (one that implements fit_transform, or transform and fit separately); you can call fit_predict() or predict() on the pipeline only if your last estimator is a predictor. So you just can't call fit_transform or transform on a pipeline whose last step is a predictor.
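To make the grid-search point concrete, here is a minimal sketch; the dataset, step names, and parameter grid are all illustrative, not from the original question. Parameters of a step are addressed as `<step name>__<parameter name>`:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(k=5)),
    ("clf", LogisticRegression()),
])

# The grid keys use the "<step name>__<parameter name>" convention,
# so one search tunes parameters of several steps at once.
grid = GridSearchCV(pipe, {"select__k": [3, 5], "clf__C": [0.1, 1.0]}, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```

Each cross-validation fold refits the whole pipeline, so the scaler and selector are learned only on that fold's training portion, which avoids leaking test data into the transforms.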


I think that M0rkHaV has the right idea. Scikit-learn's pipeline class is a useful tool for encapsulating multiple different transformers alongside an estimator into one object, so that you only have to call your important methods once (fit(), predict(), etc). Let's break down the two major components:

  1. Transformers are classes that implement both fit() and transform(). You might be familiar with some of the sklearn preprocessing tools, like TfidfVectorizer and Binarizer. If you look at the docs for these preprocessing tools, you'll see that they implement both of these methods. What I find pretty cool is that some estimators can also be used as transformation steps, e.g. LinearSVC!

  2. Estimators are classes that implement both fit() and predict(). You'll find that many of the classifiers and regression models implement both these methods, and as such you can readily test many different models. It is possible to use another transformer as the final estimator (i.e., it doesn't necessarily implement predict(), but definitely implements fit()). All this means is that you wouldn't be able to call predict().
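To illustrate that last point, a small sketch (the step names and data are made up): when the final step is itself a transformer, the pipeline exposes fit_transform() but not predict().

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# The last step is a transformer, so fit_transform works on the whole pipeline...
transform_pipe = Pipeline([("scale", StandardScaler()),
                           ("minmax", MinMaxScaler())])
Xt = transform_pipe.fit_transform(X)
print(Xt.shape)  # (3, 2)

# ...but predict() is not available, because the final step doesn't implement it.
print(hasattr(transform_pipe, "predict"))  # False
```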

As for your edit: let's go through a text-based example. Using LabelBinarizer, we want to turn a list of labels into a list of binary values.

bin = LabelBinarizer()  #first we initialize

vec = ['cat', 'dog', 'dog', 'dog'] #we have our label list we want binarized

Now, when the binarizer is fitted on some data, it will have a structure called classes_ that contains the unique classes that the transformer 'knows' about. Without calling fit(), the binarizer has no idea what the data looks like, so calling transform() wouldn't make any sense. You can see this if you try to print the list of classes before fitting:

print(bin.classes_)

I get the following error when trying this:

AttributeError: 'LabelBinarizer' object has no attribute 'classes_'

But when you fit the binarizer on the vec list:

bin.fit(vec)

and try again

print(bin.classes_)

I get the following:

['cat' 'dog']


print(bin.transform(vec))

And now, after calling transform on the vec object, we get the following:

[[0]
 [1]
 [1]
 [1]]
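The fitted binarizer also works in the other direction: inverse_transform maps binary values back to the original labels. A quick self-contained sketch of the same example (renaming `bin` to `lb` to avoid shadowing the built-in):

```python
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
lb.fit(['cat', 'dog', 'dog', 'dog'])

binary = lb.transform(['cat', 'dog', 'dog', 'dog'])
print(binary.ravel())  # the column [0 1 1 1] flattened

# inverse_transform maps the binary column back to the label strings
print(lb.inverse_transform(binary))
```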

As for estimators being used as transformers, let us use the DecisionTree classifier as an example of a feature-extractor. Decision Trees are great for a lot of reasons, but for our purposes, what's important is that they have the ability to rank features that the tree found useful for predicting. When you call transform() on a Decision Tree, it will take your input data and find what it thinks are the most important features. So you can think of it transforming your data matrix (n rows by m columns) into a smaller matrix (n rows by k columns), where the k columns are the k most important features that the Decision Tree found.
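Note that in recent scikit-learn versions the transform() method was removed from tree estimators; the same idea is now expressed by wrapping the tree in SelectFromModel, which ranks features by the fitted tree's feature_importances_. A hedged sketch with a synthetic dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, n_features=8,
                           n_informative=3, random_state=0)

# Keep only the 3 features the fitted tree ranks as most important
# (threshold=-np.inf makes max_features the only selection criterion).
selector = SelectFromModel(DecisionTreeClassifier(random_state=0),
                           max_features=3, threshold=-np.inf)
Xk = selector.fit_transform(X, y)
print(X.shape, "->", Xk.shape)  # (100, 8) -> (100, 3)
```

This is exactly the n-by-m to n-by-k reduction described above, and because SelectFromModel is itself a transformer, it can be used as an intermediate step of a Pipeline.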

Source: https://stackoverflow.com/questions/33091376/python-what-is-exactly-sklearn-pipeline-pipeline
