Compiled from my machine-learning notes and notebooks on Bitbucket.
API upgrade: in Keras 2 the model-visualization helper moved:
from keras.utils.visualize_util import plot
-> from keras.utils import plot_model
Merge is a layer: it takes layers as input and is usually used with Sequential models. merge is a function: it takes tensors as input, is a wrapper around Merge, and is used in the Functional API. Using Merge:
left = Sequential()
from keras.engine import merge
-> from keras.layers import merge
- E tensorflow/stream_executor/cuda/cuda_blas.cc:444] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
Probably out of memory. Use
nvidia-smi
to check (nvidia-smi -l 1 refreshes every second), then stop the offending process.
- Save the whole model (architecture + weights + optimizer state) with model.save('model.h5') and restore it with keras.models.load_model, or save just the weights with model.save_weights('weights.h5') and restore them with model.load_weights.
- Variable-Size Image As Input
- NumPy: suppress scientific notation when printing
np.set_printoptions(suppress=True)
- How to get the input sequence length in Keras?
sequence_length = model.input.shape[1].value
- How to get the most common value in an array?
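The original snippet was lost in formatting; here is a minimal NumPy sketch (the helper name most_common is mine):

```python
import numpy as np

def most_common(arr):
    """Return the most frequent value in a 1-D array (first value wins on ties)."""
    values, counts = np.unique(arr, return_counts=True)
    return values[np.argmax(counts)]

print(most_common(np.array([1, 2, 2, 3, 2, 1])))  # -> 2
```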
What are val_loss and val_acc? What is the difference between acc and val_acc?
val_loss and val_acc are your model's loss and accuracy measured on the validation dataset; acc is measured on the training data.
How to tell which Keras model is better: should I use “acc” (from the training data) or “val_acc” (from the validation data)?
If you want to estimate the ability of your model to generalize to new data (which is probably what you want to do), look at the validation accuracy: the validation split contains only data that the model never sees during training and therefore cannot just memorize.
If your training data accuracy (“acc”) keeps improving while your validation data accuracy (“val_acc”) gets worse, you are likely in an overfitting situation, i.e. your model starts to basically just memorize the data.
fit vs. fit_transform (scikit-learn): fit only learns the transformation parameters from the data (e.g. mean and variance for a scaler); fit_transform learns them and applies the transformation in the same call; transform applies parameters learned earlier, e.g. to the test set.
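A quick check with scikit-learn's StandardScaler (used here purely as an illustration): fit followed by transform gives the same result as fit_transform.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0]])

# One call: learn mean/std from X and standardize X.
Xa = StandardScaler().fit_transform(X)

# Two calls: learn the parameters first, then apply them.
scaler = StandardScaler()
scaler.fit(X)
Xb = scaler.transform(X)

print(np.allclose(Xa, Xb))  # True
```

At prediction time you would call only transform on new data, reusing the parameters learned from the training set.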
Random forest = decision trees + Bagging. What is the principle behind Bagging?
Draw m subsets of size n from the training set by sampling with replacement, and use them as new training sets. Train a model (classification, regression, etc.) on each of the m sets, giving m models; combine them by averaging (regression) or majority vote (classification) to get the Bagging result.
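A toy NumPy sketch of the procedure above (the threshold-stump "learner" and all names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D classification problem: the true label is 1 when x > 0.
X = rng.normal(size=200)
y = (X > 0).astype(int)

def train_stump(Xs, ys):
    # A deliberately weak learner: threshold at the sample mean (it ignores ys).
    thr = Xs.mean()
    return lambda x: (x > thr).astype(int)

# Bagging: m bootstrap samples of size n, one model per sample.
m, n = 25, len(X)
models = []
for _ in range(m):
    idx = rng.integers(0, n, size=n)      # sample n points with replacement
    models.append(train_stump(X[idx], y[idx]))

def bagged_predict(x):
    votes = np.stack([f(x) for f in models])        # m predictions per point
    return (votes.mean(axis=0) > 0.5).astype(int)   # majority vote

print((bagged_predict(X) == y).mean())  # training accuracy, close to 1.0
```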
Why prune a decision tree, and what is the difference between pre-pruning and post-pruning?
Pre-pruning works top-down and is fast; post-pruning works bottom-up and is more accurate. Pre-pruning estimates, before splitting each node, whether the split would improve the tree's generalization; if not, splitting stops and the node becomes a leaf. Post-pruning first grows a complete tree on the training set, then examines the non-leaf nodes bottom-up; if replacing a node's subtree with a leaf improves generalization, the subtree is replaced with a leaf.
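scikit-learn's DecisionTreeClassifier (used here only as an illustration; it does not implement the textbook generalization-estimate pruning directly) exposes both flavors: early-stopping parameters such as max_depth act as pre-pruning, while ccp_alpha performs cost-complexity post-pruning on a fully grown tree.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unpruned: split until the leaves are pure.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruning style: stop splitting once depth 2 is reached.
pre = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Post-pruning: grow fully, then prune subtrees back via cost-complexity pruning.
post = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

print(full.get_depth(), pre.get_depth(), post.get_depth())
```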
Pros and cons of random forests
Advantages:
- Highly parallelizable: the trees are trained independently.
- Node splits consider only a random subset of features, so training stays efficient even when the feature dimension is very high.
- Each tree trains on its own random subset of features, which acts as built-in feature selection.
- Random (bootstrap) sampling keeps the model's variance low and its generalization strong.
- Simpler than Boosting.
- Fairly robust to partially missing features (though in practice this doesn't always hold).
Disadvantages:
- Prone to overfitting on noisy samples.
- Features with many distinct values tend to have an outsized influence on the forest's decisions, which can hurt the fitted model.
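Several of the properties above map directly onto RandomForestClassifier parameters in scikit-learn (shown only as an illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

clf = RandomForestClassifier(
    n_estimators=100,      # independent trees -> training parallelizes well
    max_features="sqrt",   # random feature subset at every split
    bootstrap=True,        # sample the training set with replacement per tree
    n_jobs=-1,             # train trees on all available cores
    random_state=0,
).fit(X, y)

print(clf.score(X, y))  # training accuracy
```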