[FATE Federated Learning] Horizontal and Vertical Training with a Custom Dataset and a Custom Neural Network Model
创始人
2024-05-31 06:30:24

Preface

Most of the code comes from:

  • https://fate.readthedocs.io/en/latest/tutorial/pipeline/nn_tutorial/Hetero-NN-Customize-Dataset/#example-implement-a-simple-image-dataset
  • https://fate.readthedocs.io/en/latest/tutorial/pipeline/nn_tutorial/Homo-NN-Customize-your-Dataset/

However, the official docs are incomplete, so I am recording a complete version here.

I use the MNIST dataset: https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/fate/examples/data/mnist.zip.
The directory structure is shown below. For horizontal training, just load mnist twice; for vertical training, one party loads mnist_guest (with labels) and the other loads mnist_host (without labels). The mnist1 and mnist2 folders are unused; ignore them.
(screenshot: mnist directory structure)
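ImageFolder expects one subdirectory per class label under the root. A minimal sketch of that layout (the folder and file names below are illustrative; the real mnist/ archive ships digit folders):

```python
import os
import tempfile

# Build a toy version of the layout ImageFolder expects:
# root/<class_label>/<image>.jpg  (names below are illustrative)
root = os.path.join(tempfile.mkdtemp(), 'mnist')
for label in ('0', '1'):
    os.makedirs(os.path.join(root, label))
    open(os.path.join(root, label, 'img_%s.jpg' % label), 'w').close()

for cls in sorted(os.listdir(root)):
    print(cls, sorted(os.listdir(os.path.join(root, cls))))
# prints:
# 0 ['img_0.jpg']
# 1 ['img_1.jpg']
```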

The official demo assumes Jupyter and does not suit plain Python scripts, so this post gives a script-based example. For your Python interpreter, make sure the interpreter path from bin/init_env.sh inside the FATE installation is on your environment; otherwise the federatedml package will not be found.
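For example, sourcing the env script and checking the interpreter can look like this (paths are illustrative and depend on where FATE is installed):

```shell
# Illustrative paths -- adjust to your FATE installation.
source /data/projects/fate/bin/init_env.sh
which python   # should now point at FATE's bundled interpreter
python -c "import federatedml; print(federatedml.__file__)"
```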

Horizontal

Custom dataset

Define the custom dataset, then test it locally.

import os
from torchvision.datasets import ImageFolder
from torchvision import transforms
from federatedml.nn.dataset.base import Dataset


class MNISTDataset(Dataset):

    def __init__(self, flatten_feature=False):  # flatten feature or not
        super(MNISTDataset, self).__init__()
        self.image_folder = None
        self.ids = None
        self.flatten_feature = flatten_feature

    def load(self, path):  # read data from path, and set sample ids
        # read using ImageFolder
        self.image_folder = ImageFolder(root=path, transform=transforms.Compose([transforms.ToTensor()]))
        # filename as the image id
        ids = []
        for image_name in self.image_folder.imgs:
            ids.append(image_name[0].split('/')[-1].replace('.jpg', ''))
        self.ids = ids
        return self

    def get_sample_ids(self):  # implement the get sample id interface, simply return ids
        return self.ids

    def __len__(self,):  # return the length of the dataset
        return len(self.image_folder)

    def __getitem__(self, idx):  # get item
        ret = self.image_folder[idx]
        if self.flatten_feature:
            img = ret[0][0].flatten()  # return flattened 784-dim tensor
            return img, ret[1]  # return tensor and label
        else:
            return ret


ds = MNISTDataset(flatten_feature=True)
ds.load('mnist/')
# print(len(ds))
# print(ds[0])
# print(ds.get_sample_ids()[0])
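The sample-id logic above splits on '/', which assumes POSIX-style paths. The same idea in a standalone, OS-independent form (the helper name is my own, for illustration only):

```python
import os

def image_id(path):
    # Derive a sample id from an image path:
    # drop the directories and the file extension.
    return os.path.splitext(os.path.basename(path))[0]

print(image_id('mnist/3/img_42.jpg'))  # -> img_42
```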

Once this runs successfully, you must manually create a dataset file under FATE's federatedml/nn/dataset directory, expanding the code above into a component-class form, as follows:

import torch
from federatedml.nn.dataset.base import Dataset
from torchvision.datasets import ImageFolder
from torchvision import transforms
import numpy as np
# add any missing imports as needed


class MNISTDataset(Dataset):

    def __init__(self, flatten_feature=False):  # flatten feature or not
        super(MNISTDataset, self).__init__()
        self.image_folder = None
        self.ids = None
        self.flatten_feature = flatten_feature

    def load(self, path):  # read data from path, and set sample ids
        # read using ImageFolder
        self.image_folder = ImageFolder(root=path, transform=transforms.Compose([transforms.ToTensor()]))
        # filename as the image id
        ids = []
        for image_name in self.image_folder.imgs:
            ids.append(image_name[0].split('/')[-1].replace('.jpg', ''))
        self.ids = ids
        return self

    def get_sample_ids(self):  # implement the get sample id interface, simply return ids
        return self.ids

    def __len__(self,):  # return the length of the dataset
        return len(self.image_folder)

    def __getitem__(self, idx):  # get item
        ret = self.image_folder[idx]
        if self.flatten_feature:
            img = ret[0][0].flatten()  # return flattened 784-dim tensor
            return img, ret[1]  # return tensor and label
        else:
            return ret


if __name__ == '__main__':
    pass

This completes what the official docs call the "manual add". After adding the file, the federatedml directory should look as in the screenshot. The file name must match the dataset_name passed to DatasetParam below.
Once added, FATE "recognizes" our custom dataset.
The local test (the commented-out section in the training script below) does not require this manual-add step, but local mode is only for testing; it is of little use in production.
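Registering the dataset amounts to copying the file into FATE's dataset directory on every party. A sketch, assuming a standalone install under /data/projects/fate (paths and layout are assumptions; adjust to your setup):

```shell
# Illustrative paths. dataset_name in DatasetParam must match the
# file name, i.e. dataset_name='mnist_dataset' -> mnist_dataset.py.
FATE_HOME=/data/projects/fate
cp mnist_dataset.py ${FATE_HOME}/python/federatedml/nn/dataset/
```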

Horizontal training

import os
from torchvision.datasets import ImageFolder
from torchvision import transforms
from federatedml.nn.dataset.base import Dataset

# test local process
# from federatedml.nn.homo.trainer.fedavg_trainer import FedAVGTrainer
# trainer = FedAVGTrainer(epochs=3, batch_size=256, shuffle=True, data_loader_worker=8, pin_memory=False)  # set parameter
# trainer.local_mode()

# import torch as t
# from pipeline import fate_torch_hook
# fate_torch_hook(t)
# # our simple classification model:
# model = t.nn.Sequential(
#     t.nn.Linear(784, 32),
#     t.nn.ReLU(),
#     t.nn.Linear(32, 10),
#     t.nn.Softmax(dim=1)
# )
# trainer.set_model(model)  # set model
# optimizer = t.optim.Adam(model.parameters(), lr=0.01)  # optimizer
# loss = t.nn.CrossEntropyLoss()  # loss function
# trainer.train(train_set=ds, optimizer=optimizer, loss=loss)  # use dataset we just developed
# The dataset file must be added manually under federatedml/nn/dataset!
# See https://blog.csdn.net/Yonggie/article/details/129404212

# real training
import torch as t
from torch import nn
from pipeline import fate_torch_hook
from pipeline.component import HomoNN
from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader, Evaluation, DataTransform
from pipeline.interface import Data, Model

t = fate_torch_hook(t)

# bind data path to name & namespace
fate_project_path = os.path.abspath('./')
host = 1
guest = 2
arbiter = 3
pipeline = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, host=host, arbiter=arbiter)

data_0 = {"name": "mnist_guest", "namespace": "experiment"}
data_1 = {"name": "mnist_host", "namespace": "experiment"}

data_path_0 = fate_project_path + '/mnist'
data_path_1 = fate_project_path + '/mnist'
pipeline.bind_table(name=data_0['name'], namespace=data_0['namespace'], path=data_path_0)
pipeline.bind_table(name=data_1['name'], namespace=data_1['namespace'], path=data_path_1)

reader_0 = Reader(name="reader_0")
reader_0.get_party_instance(role='guest', party_id=guest).component_param(table=data_0)
reader_0.get_party_instance(role='host', party_id=host).component_param(table=data_1)

from pipeline.component.nn import DatasetParam

# specify the dataset and its init parameters; dataset_name must match
# the file added manually under federatedml/nn/dataset
dataset_param = DatasetParam(dataset_name='mnist_dataset', flatten_feature=True)

from pipeline.component.homo_nn import TrainerParam  # Interface

# our simple classification model:
model = t.nn.Sequential(
    t.nn.Linear(784, 32),
    t.nn.ReLU(),
    t.nn.Linear(32, 10),
    t.nn.Softmax(dim=1)
)

nn_component = HomoNN(name='nn_0',
                      model=model,  # model
                      loss=t.nn.CrossEntropyLoss(),  # loss
                      optimizer=t.optim.Adam(model.parameters(), lr=0.01),  # optimizer
                      dataset=dataset_param,  # dataset
                      trainer=TrainerParam(trainer_name='fedavg_trainer', epochs=2, batch_size=1024, validation_freqs=1),
                      torch_seed=100  # random seed
                      )

pipeline.add_component(reader_0)
pipeline.add_component(nn_component, data=Data(train_data=reader_0.output.data))
pipeline.add_component(Evaluation(name='eval_0', eval_type='multi'), data=Data(data=nn_component.output.data))
pipeline.compile()
pipeline.fit()

# print result and summary
pipeline.get_component('nn_0').get_output_data()
pipeline.get_component('nn_0').get_summary()

Vertical

This part uses mnist_guest and mnist_host; download them from:

guest data: https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/fate/examples/data/mnist_guest.zip

host data: https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/fate/examples/data/mnist_host.zip

Take a look at the data format. In FATE, vertical (hetero) setups always have one party with labels and one without, which differs from the merged-dataset scenario I had in mind.
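The textbook vertical-FL picture can be sketched with toy arrays: the parties share sample ids but hold different feature columns, and only the guest holds labels (all numbers below are made up for illustration; this is not FATE API):

```python
import numpy as np

# Toy illustration of vertical/hetero FL: both parties hold the same
# samples (aligned by id) but different columns; labels live only on
# the guest side.
ids = ['img_0', 'img_1', 'img_2']
guest_features = np.zeros((3, 4))   # guest's 4 feature columns
guest_labels = np.array([1, 0, 1])  # labels, guest only
host_features = np.zeros((3, 6))    # host's 6 feature columns, no labels

# A centralized feature matrix would be the column-wise concatenation:
full = np.hstack([guest_features, host_features])
print(full.shape)  # -> (3, 10)
```

Note that in this MNIST demo both parties actually load full 784-dim images; only the label split matches this picture.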

Vertical dataset

The procedure mirrors the horizontal case; here I only give the new class, which differs slightly from the horizontal one.

import torch
from federatedml.nn.dataset.base import Dataset
from torchvision.datasets import ImageFolder
from torchvision import transforms
import numpy as np


class MNISTDataset(Dataset):

    def __init__(self, return_label=True):
        super(MNISTDataset, self).__init__()
        self.return_label = return_label
        self.image_folder = None
        self.ids = None

    def load(self, path):
        self.image_folder = ImageFolder(root=path, transform=transforms.Compose([transforms.ToTensor()]))
        ids = []
        for image_name in self.image_folder.imgs:
            ids.append(image_name[0].split('/')[-1].replace('.jpg', ''))
        self.ids = ids
        return self

    def get_sample_ids(self, ):
        return self.ids

    def get_classes(self, ):
        return np.unique(self.image_folder.targets).tolist()

    def __len__(self,):
        return len(self.image_folder)

    def __getitem__(self, idx):  # get item
        ret = self.image_folder[idx]
        img = ret[0][0].flatten()  # flatten tensor, 784 dims
        if self.return_label:
            return img, ret[1]  # img & label
        else:
            return img  # no label, for host


if __name__ == '__main__':
    pass


Vertical training

Detailed comments are included inline.

import numpy as np
from federatedml.nn.dataset.base import Dataset
from torchvision.datasets import ImageFolder
from torchvision import transforms

# the locally defined dataset class (for local testing)
# class MNISTHetero(Dataset):
#     def __init__(self, return_label=True):
#         super(MNISTHetero, self).__init__()
#         self.return_label = return_label
#         self.image_folder = None
#         self.ids = None
#
#     def load(self, path):
#         self.image_folder = ImageFolder(root=path, transform=transforms.Compose([transforms.ToTensor()]))
#         ids = []
#         for image_name in self.image_folder.imgs:
#             ids.append(image_name[0].split('/')[-1].replace('.jpg', ''))
#         self.ids = ids
#         return self
#
#     def get_sample_ids(self, ):
#         return self.ids
#
#     def get_classes(self, ):
#         return np.unique(self.image_folder.targets).tolist()
#
#     def __len__(self,):
#         return len(self.image_folder)
#
#     def __getitem__(self, idx): # get item
#         ret = self.image_folder[idx]
#         img = ret[0][0].flatten() # flatten tensor 784 dims
#         if self.return_label:
#             return img, ret[1] # img & label
#         else:
#             return img # no label, for host

# test guest dataset
# ds = MNISTHetero().load('mnist_guest/')
# print(len(ds))
# print(ds[0][0])
# print(ds.get_classes())
# print(ds.get_sample_ids()[0: 10])

# test host dataset
# ds = MNISTHetero(return_label=False).load('mnist_host')
# print(len(ds))
# print(ds[0]) # no label

import os
import torch as t
from torch import nn
from pipeline import fate_torch_hook
from pipeline.component import HeteroNN
from pipeline.component.hetero_nn import DatasetParam
from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader, Evaluation, DataTransform
from pipeline.interface import Data, Model

fate_torch_hook(t)

# bind path to fate name & namespace
fate_project_path = os.path.abspath('./')
guest = 4
host = 3

pipeline_img = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, host=host)

guest_data = {"name": "mnist_guest", "namespace": "experiment"}
host_data = {"name": "mnist_host", "namespace": "experiment"}

guest_data_path = fate_project_path + '/mnist_guest'
host_data_path = fate_project_path + '/mnist_host'
pipeline_img.bind_table(name='mnist_guest', namespace='experiment', path=guest_data_path)
pipeline_img.bind_table(name='mnist_host', namespace='experiment', path=host_data_path)

reader_0 = Reader(name="reader_0")
reader_0.get_party_instance(role='guest', party_id=guest).component_param(table=guest_data)
reader_0.get_party_instance(role='host', party_id=host).component_param(table=host_data)

# for why it is defined this way, see the docs (with a model diagram):
# https://fate.readthedocs.io/en/latest/federatedml_component/hetero_nn/
hetero_nn_0 = HeteroNN(name="hetero_nn_0", epochs=3,
                       interactive_layer_lr=0.01, batch_size=512, task_type='classification', seed=100)
guest_nn_0 = hetero_nn_0.get_party_instance(role='guest', party_id=guest)
host_nn_0 = hetero_nn_0.get_party_instance(role='host', party_id=host)

# define models
# image features 784, guest bottom model
guest_bottom = t.nn.Sequential(
    t.nn.Linear(784, 8),
    t.nn.ReLU()
)

# image features 784, host bottom model
host_bottom = t.nn.Sequential(
    t.nn.Linear(784, 8),
    t.nn.ReLU()
)

# top model, a classifier
guest_top = t.nn.Sequential(
    nn.Linear(8, 10),
    nn.Softmax(dim=1)
)

# interactive layer definition
interactive_layer = t.nn.InteractiveLayer(out_dim=8, guest_dim=8, host_dim=8)

# add models; per the docs, the guest adds two models, the host only one
guest_nn_0.add_top_model(guest_top)
guest_nn_0.add_bottom_model(guest_bottom)
host_nn_0.add_bottom_model(host_bottom)

# optimizer and loss (after fate_torch_hook, the optimizer can be
# created without model parameters)
optimizer = t.optim.Adam(lr=0.01)
loss = t.nn.CrossEntropyLoss()

# use DatasetParam to specify the dataset and pass parameters;
# dataset_name must match the file you added manually
guest_nn_0.add_dataset(DatasetParam(dataset_name='mnist_hetero', return_label=True))
host_nn_0.add_dataset(DatasetParam(dataset_name='mnist_hetero', return_label=False))

hetero_nn_0.set_interactive_layer(interactive_layer)
hetero_nn_0.compile(optimizer=optimizer, loss=loss)

pipeline_img.add_component(reader_0)
pipeline_img.add_component(hetero_nn_0, data=Data(train_data=reader_0.output.data))
pipeline_img.add_component(Evaluation(name='eval_0', eval_type='multi'), data=Data(data=hetero_nn_0.output.data))
pipeline_img.compile()
pipeline_img.fit()
pipeline_img.get_component('hetero_nn_0').get_output_data()  # get result
