replication related work on data nodes
1. each file can choose the replication factor
2. replication granularity is at the volume level
3. if there is not enough space, we can automatically decrease some volumes' replication factor, especially for cold data
4. plan to support migrating data to cheaper storage
5. plan to support manual volume placement, access-based volume placement, and auction-based volume placement

When a new volume server is started, it reports:

1. how many volumes it can hold
2. the current list of existing volumes and each volume's replication type

Each volume server remembers:

1. current volume ids
2. replica locations, which are read from the master

The master assigns volume ids based on:

1. replication factor

On the master, store the replication configuration:

{
  "replication": [
    {"type": "00", "min_volume_count": 3, "weight": 10},
    {"type": "01", "min_volume_count": 2, "weight": 20},
    {"type": "10", "min_volume_count": 2, "weight": 20},
    {"type": "11", "min_volume_count": 3, "weight": 30},
    {"type": "20", "min_volume_count": 2, "weight": 20}
  ],
  "port": 9333
}

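Loading that configuration on the master could look like the following sketch. The struct and function names are assumptions for illustration, not the project's actual code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ReplicationSetting mirrors one entry of the "replication" list above.
type ReplicationSetting struct {
	Type           string `json:"type"`
	MinVolumeCount int    `json:"min_volume_count"`
	Weight         int    `json:"weight"`
}

// MasterConfig mirrors the whole configuration object.
type MasterConfig struct {
	Replication []ReplicationSetting `json:"replication"`
	Port        int                  `json:"port"`
}

const raw = `{
  "replication": [
    {"type": "00", "min_volume_count": 3, "weight": 10},
    {"type": "01", "min_volume_count": 2, "weight": 20},
    {"type": "10", "min_volume_count": 2, "weight": 20},
    {"type": "11", "min_volume_count": 3, "weight": 30},
    {"type": "20", "min_volume_count": 2, "weight": 20}
  ],
  "port": 9333
}`

func loadConfig(data string) (MasterConfig, error) {
	var c MasterConfig
	err := json.Unmarshal([]byte(data), &c)
	return c, err
}

func main() {
	c, err := loadConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Port, len(c.Replication)) // 9333 5
}
```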
Or manually via the command line:

1. add a volume with a specified replication factor

if less than the replication factor, the volume is in read-only mode
if more than the replication factor, the volume will purge the smallest/oldest copy
if equal, the volume functions as usual

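The three rules above amount to a simple comparison of the current copy count against the replication factor. A minimal sketch; the type and function names are illustrative, not from the actual code base:

```go
package main

import "fmt"

// volumeMode is what a volume should do given its replica count.
type volumeMode int

const (
	readOnly   volumeMode = iota // fewer copies than the replication factor
	purgeExtra                   // more copies: drop the smallest/oldest one
	normal                       // exactly enough copies
)

// checkReplicas applies the rules above.
func checkReplicas(copies, replicationFactor int) volumeMode {
	switch {
	case copies < replicationFactor:
		return readOnly
	case copies > replicationFactor:
		return purgeExtra
	default:
		return normal
	}
}

func main() {
	fmt.Println(checkReplicas(1, 2) == readOnly)   // true
	fmt.Println(checkReplicas(3, 2) == purgeExtra) // true
	fmt.Println(checkReplicas(2, 2) == normal)     // true
}
```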
maybe use gossip to send the volumeServer~volumes information

Use cases:

on volume server

Bootstrap

1. at the very beginning, the system has no volumes at all
2. if maxReplicationFactor == 1, always initialize volumes right away
3. if nServersHasFreeSpaces >= maxReplicationFactor, auto-initialize
4. if maxReplicationFactor > 1, control placement via weed shell:
   > disable_auto_initialize
   > enable_auto_initialize
   > assign_free_volume vid "server1:port","server2:port","server3:port"
   > status
5.

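The bootstrap rules above can be sketched as one decision function. This is a reading of the notes, not the actual implementation; the function and parameter names are assumptions:

```go
package main

import "fmt"

// canAutoInitialize decides whether the master may auto-initialize new
// volumes, per the bootstrap rules above: a replication factor of 1 can
// always be satisfied immediately; otherwise the operator must not have
// disabled auto-initialization, and there must be at least as many
// servers with free space as the maximum replication factor.
func canAutoInitialize(nServersWithFreeSpace, maxReplicationFactor int, autoEnabled bool) bool {
	if maxReplicationFactor == 1 {
		return true // a single copy can always be placed right away
	}
	if !autoEnabled {
		return false // operator ran disable_auto_initialize in weed shell
	}
	return nServersWithFreeSpace >= maxReplicationFactor
}

func main() {
	fmt.Println(canAutoInitialize(1, 1, false)) // true
	fmt.Println(canAutoInitialize(2, 3, true))  // false
	fmt.Println(canAutoInitialize(3, 3, true))  // true
}
```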
When a data node starts:

1. each data node sends the master its existing volumes and max volume blocks
2. the master remembers the topology: data_center/rack/data_node/volumes

for each replication level, the master stores:

volume id ~ data node
writable volume ids

If any "assign" request comes in:

1. find a writable volume with the right replicationLevel
2. if not found, grow new volumes with the right replication level
3. return a writable volume to the user

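The assign flow above can be sketched as follows. The master keeps, per replication type, the set of writable volume ids, and grows new volumes when none is available. The types and the grow-by-2 count are illustrative assumptions, not the real implementation:

```go
package main

import (
	"errors"
	"fmt"
)

// master keeps writable volume ids per replication type.
type master struct {
	writable map[string][]uint32 // replication type -> writable volume ids
	nextId   uint32
}

// grow creates n new volumes for the given replication type (stubbed
// here: a real master would ask data nodes to allocate them).
func (m *master) grow(replication string, n int) {
	for i := 0; i < n; i++ {
		m.nextId++
		m.writable[replication] = append(m.writable[replication], m.nextId)
	}
}

// assign implements the three steps: find writable, else grow, return.
func (m *master) assign(replication string) (uint32, error) {
	if len(m.writable[replication]) == 0 {
		m.grow(replication, 2)
	}
	vols := m.writable[replication]
	if len(vols) == 0 {
		return 0, errors.New("no writable volume")
	}
	return vols[0], nil
}

func main() {
	m := &master{writable: map[string][]uint32{}}
	vid, err := m.assign("01") // nothing writable yet, so grow first
	fmt.Println(vid, err)
}
```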
For the above operations, here is the todo list:

for the data node:

1. onStartUp, and periodically, send existing volumes and maxVolumeCount via store.Join(), DONE
2. accept a command to grow a volume (id + replication level), DONE
   /admin/assign_volume?volume=some_id&replicationType=01
3. accept the volume location list for a volume if replication > 1, DONE
   /admin/set_volume_locations?volumeLocations=[{Vid:xxx,Locations:[loc1,loc2,loc3]}]
4. for each write, pass the write on to the next location
   the POST method should accept an index that, like a TTL, gets decremented on every hop

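Item 4's hop-counted forwarding can be sketched as a pure function: given the replica locations, the current position, and the remaining hop budget, return the next target (or nothing, so the chain terminates). The function name and signature are illustrative assumptions:

```go
package main

import "fmt"

// nextHop returns the location to forward a write to, and the decremented
// hop budget. It returns "" when the index is used up or there is no
// further replica, so the forwarding chain always terminates.
func nextHop(locations []string, self int, hopsLeft int) (string, int) {
	if hopsLeft <= 0 || self+1 >= len(locations) {
		return "", 0 // last replica in the chain: do not forward
	}
	return locations[self+1], hopsLeft - 1
}

func main() {
	locs := []string{"server1:8080", "server2:8080", "server3:8080"}
	// the first replica received the write with 2 hops left
	next, hops := nextHop(locs, 0, 2)
	fmt.Println(next, hops) // server2:8080 1
	next, hops = nextHop(locs, 1, hops)
	fmt.Println(next, hops) // server3:8080 0
	next, _ = nextHop(locs, 2, hops)
	fmt.Println(next == "") // true
}
```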
for the master:

1. accept each data node's report of existing volumes and maxVolumeCount
2. periodically refresh the list of active data nodes, and adjust writable volumes
3. send a command to grow a volume (id + replication level)
4. NOT_IMPLEMENTING: if dead/stale data nodes are found, send stale info for the affected volumes to other data nodes, BECAUSE the master will stop sending writes to those data nodes