CS 351 - Data Organization and Management (Section: 02)

Lecture Notes, Week 2

Prepared by: Nurettin Mert Aydın / Ahmet Örnek / Armağan Durusal

I/O Time in Disks

We assume that there is just a single user for the system and ignore the CPU time (it is negligible).

Disk I/O Time Components:

- Seek time (s): moving the read/write head to the target cylinder
- Rotational latency (r): waiting for the target block to rotate under the head
- Data transfer time (btt / dtt): reading or writing the data itself

Example:
If a hard disk spins at 3600 revolutions per minute, its average rotational latency (r) can be found as follows:

3600/60 = 60 rev./sec. => 1/60 sec./rev. (it takes 1/60 seconds to complete one revolution)

1/60 sec/rev = 1000/60 msec/rev = 16.67 msec/rev

r = (full revolution time) / 2 => r = 16.67 / 2 ≈ 8.3 msec
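The calculation above can be written as a small helper (a sketch; the function name is ours):

```python
# Average rotational latency from RPM: half of one revolution time, in msec.
def rotational_latency_ms(rpm):
    rev_per_sec = rpm / 60            # 3600 rpm -> 60 rev/sec
    rev_time_ms = 1000 / rev_per_sec  # 16.67 msec per revolution
    return rev_time_ms / 2            # on average we wait half a revolution

print(round(rotational_latency_ms(3600), 2))  # 8.33 msec
```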
 
 

Units and Specifications of the IBM 3380

Size = 2500 MB
B (block size) = 2400 bytes
t (data transfer speed / data rate) = 3000 bytes/msec
t' (formatted speed) = 2857 bytes/msec
btt (block transfer time) = B / t = 0.8 msec
c (blocks per cylinder) = 600
ebt (effective block transfer time in sequential reading) = B / t' = 0.84 msec
m (minimum seek time) = 3 msec
N (number of cylinders) = 885
r (rotational latency time) = 8.3 msec
s (average seek time) = 16 msec
 

Random vs. Sequential Reading

 

Using the IBM 3380 figures, sequential and random reading times can be calculated as in the following table:
 
Read Type  | 10 Blocks to Read        | 100 Blocks to Read
-----------|--------------------------|---------------------------
Sequential | s + r + 10 * ebt         | s + r + 100 * ebt
           | = 16 + 8.3 + 10 * 0.84   | = 16 + 8.3 + 100 * 0.84
           | = 32.7 msec              | = 108.3 msec
Random     | 10 * (s + r + btt)       | 100 * (s + r + btt)
           | = 10 * (16 + 8.3 + 0.8)  | = 100 * 25.1
           | = 251 msec               | = 2510 msec

 

Let"b" denote the number of blocks to be read

(b* 25.1) / (24.3 + b * 0.84)= 25.1/ ((24.3 / b) + 0.84)    when b goes to infinity (asymptotic case) random read is 30 times SLOWER than sequential read!

 

Random Read : b * ( s + r + btt)

Sequential Read (for large values of b) : b * ebt
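As a sketch, the comparison above can be checked numerically with the IBM 3380 figures (the function names are ours):

```python
# Random vs. sequential read times on the IBM 3380 (all times in msec).
s, r, btt, ebt = 16.0, 8.3, 0.8, 0.84

def random_read_ms(b):
    return b * (s + r + btt)   # every block pays a full seek + latency

def sequential_read_ms(b):
    return s + r + b * ebt     # one seek + latency, then stream the blocks

for b in (10, 100, 10000):
    ratio = random_read_ms(b) / sequential_read_ms(b)
    print(b, round(ratio, 1))  # the ratio approaches 25.1 / 0.84 ~ 30
```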
 

Note: In our calculations we assume that the block size is an even multiple of the record size.

Bfr (blocking factor) = B (block size) / R (record size), with mod(B, R) = 0

b (number of blocks) = n (number of records) / Bfr

Example(COBOL):

In data division of COBOL programs:

RECORD CONTAINS 80 CHARACTERS

BLOCK CONTAINS 10 RECORDS

            Here, BLOCK indicates a BUCKET

Bucket size = Buffer Size
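The blocking-factor arithmetic can be sketched as follows (function names are ours; we assume mod(B, R) = 0 as in the note above):

```python
# Blocking factor (Bfr = B / R) and block count (b = n / Bfr).
def blocking_factor(B, R):
    assert B % R == 0, "block size must be an even multiple of record size"
    return B // R

def num_blocks(n, B, R):
    return n // blocking_factor(B, R)

# COBOL example: 80-byte records, 10 records per block -> B = 800 bytes
print(blocking_factor(800, 80))        # 10
print(num_blocks(30000, 2400, 100))    # 1250 blocks for the later example file
```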
 
Random Read Time of the Buckets in a File

bk: number of buckets

r: rotational latency time

s: seek time

dtt: data transfer time
 
 

time    =     bk * (s + r + dtt)

           =     bk * (s + r)  +  bk * dtt
 
 

Let "bk * (s + r)" be the first component. We are calculating the total of seek time (s) and rotational latency time (r) for each bucket (by multiplying this total time by the number of buckets). This component depends on the bucket size.
 

Let "bk * dtt" be the second component. This component is independent from the bucket size. Whatever the bucket size is, some amount of data will always be read.
 
 

When we increase the bucket size, the random read time of all buckets decreases.

When we decrease the bucket size, the random read time of all buckets increases.
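The two components can be tabulated for a few bucket sizes. This is a sketch that uses the 3,000,000-byte file (30,000 records of 100 bytes) from the example below and, as a simplifying assumption, charges the whole bucket transfer at the formatted rate t' = 2857 bytes/msec:

```python
# Splitting the bucket random-read time into its two components (msec).
s, r, t_fmt = 16.0, 8.3, 2857.0
total_bytes = 30000 * 100

def bucket_read_components(bucket_size):
    bk = total_bytes / bucket_size   # number of buckets
    dtt = bucket_size / t_fmt        # transfer time per bucket
    return bk * (s + r), bk * dtt    # (depends on size, independent of size)

for size in (2400, 4800, 14400):
    overhead, transfer = bucket_read_components(size)
    print(size, round(overhead), round(transfer))
```

Note that the transfer column stays constant while the seek-and-latency column shrinks as buckets grow.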

 

Random Read Time of the Records in a File
 
 

This approach assumes that we have (in memory) a list of records in the file as well as their corresponding addresses.
 

n : number of records
 

time    =    n * (r + s + dtt)
 

When we increase the bucket size, this I/O time increases (each record fetch must transfer a larger bucket).

When we decrease the bucket size, this I/O time decreases.
 

Example:
 

n = 30,000

record size (bytes/record) = 100 bytes

block size = 2400 bytes

btt (block transfer time):

   a) for random = 2400/3000 = 0.8 msec

   b) for sequential = 2400/2857 = 0.84 msec

Bucket Size | bk                  | Sequential Reading     | Random Reading                      | Random Reading
            |                     | (of all buckets)       | (by record)                         | (by bucket)
------------|---------------------|------------------------|-------------------------------------|------------------------------------
2400        | (30000 * 100)/2400  | 1250 * 0.84            | 30000 * (16 + 8.3 + 0.8)            | 1250 * (16 + 8.3 + 0.8)
            | = 1250              | ≈ 1 sec (s, r ignored) | = 753 sec                           | = 31 sec
4800        | (30000 * 100)/4800  | 625 * (0.84 + 0.84)    | 30000 * (16 + 8.3 + 1.64)           | 625 * (16 + 8.3 + 1.64)
            | = 625               | ≈ 1 sec                | = 778 sec                           | = 16 sec
14400       | (30000 * 100)/14400 | 208.33 * (0.84 * 6)    | 30000 * (16 + 8.3 + 0.8 + 5*0.84)   | 208.33 * (16 + 8.3 + 0.8 + 5*0.84)
            | = 208.33            | ≈ 1 sec                | = 879 sec                           | = 6 sec
28800       | (30000 * 100)/28800 | 104.16 * (0.84 * 12)   | 30000 * (16 + 8.3 + 0.8 + 11*0.84)  | 104.16 * (16 + 8.3 + 0.8 + 11*0.84)
            | = 104.16            | ≈ 1 sec                | = 1030 sec                          | = 3.6 sec
as the size | decreases           | no effect              | little effect                       | makes a significant difference
increases   |                     |                        |                                     |

[In the random-reading columns above, 1.64 is the total of a single random block transfer time (0.8) followed by one sequential block transfer time (0.84). Similarly, for the larger bucket sizes, e.g. 14400, we used "0.8 + 5 * 0.84", meaning one random block transfer followed by 5 sequential ones. Throughout, s = 16 msec and r = 8.3 msec.]
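As a sketch, the whole table can be recomputed with a short script using the notes' figures:

```python
# Recomputing the table above. Times are in msec: a random bucket read pays
# seek + latency once, then one raw-rate block (0.8) plus formatted-rate
# blocks (0.84 each) for the rest of the bucket.
s, r = 16.0, 8.3
n, R, B = 30000, 100, 2400
btt_rand, btt_seq = 0.8, 0.84

def bucket_times(bucket_size):
    blocks = bucket_size // B                  # blocks per bucket
    bk = n * R / bucket_size                   # number of buckets
    dtt = btt_rand + (blocks - 1) * btt_seq    # bucket transfer time
    seq_all = bk * blocks * btt_seq            # sequential read of all buckets
    rand_by_record = n * (s + r + dtt)         # one bucket read per record
    rand_by_bucket = bk * (s + r + dtt)        # one read per bucket
    return seq_all, rand_by_record, rand_by_bucket

for size in (2400, 4800, 14400, 28800):
    print(size, [round(t) for t in bucket_times(size)])
```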

 

Optimum Bucket Size for a Mix of Operations

 

For the file described in the example above,

Consider 1000 random reads plus one reading of the whole file, bucket by bucket.

 

What is the optimum bucket size (x)?

 

1000 * (s + r + dtt) + bk * (s + r + dtt)

where "1000 * (s + r + dtt)" is the 1000 random reads

and "bk * (s + r + dtt)" is the reading of all buckets

= (1000 + bk) * (s + r + dtt)

dtt = bucket size / t', where t' = 2857

bk = (number of records * record size) / x = (30,000 * 100) / x = 3,000,000 / x

f(x) = [1000 + (3,000,000 / x)] * [16 + 8.3 + (x / 2857)]

Take the derivative with respect to x and set it equal to zero to find the x that yields the minimum cost:

f(x) = constant + (24.3 * 3,000,000) / x + (1000 * x) / 2857

d/dx (1/x) = d/dx (x^-1) = -x^-2

f'(x) = 0  =>  (24.3 * 3,000,000) / x^2 = 1000 / 2857  =>  x ≈ 14,431 bytes
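The optimization can be checked numerically (a sketch; we use the closed form obtained from f'(x) = 0):

```python
import math

# Total cost (msec) of 1000 random bucket reads plus one full pass, bucket size x
f = lambda x: (1000 + 3_000_000 / x) * (24.3 + x / 2857)

# Closed form from f'(x) = 0: 24.3 * 3,000,000 / x^2 = 1000 / 2857
x_opt = math.sqrt(24.3 * 3_000_000 * 2857 / 1000)
print(round(x_opt))    # ~14,432 bytes (the notes round this to ~14,431)

# Sanity check: nearby bucket sizes cost more
assert f(x_opt) <= min(f(x_opt - 500), f(x_opt + 500))
```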
 
 

SEQUENTIAL FILES

Pile: unordered sequential file

Assumptions: Large file, single user, fixed size records

File Operation Times

TF = time to fetch (find and read) one record

TN = time to fetch the next record

TI = time to insert a new record

TU = time to update a record

TD = time to delete a record

TX = time for eXhaustive reading of a file

TY = time for reorganization of a file

TY= time for reorganization of a file

The OS (operating system) keeps a file header (file control block) for each file. The headers include information about the file, its records, and its blocks.

       file control block


 
 

Extents are large contiguous areas of blocks. A file is stored in terms of extents.

TF: Fetching One Record

In the pile file we search sequentially through the whole file until we find our target record.

TF = single record fetch

b = no. of blocks

The file consists of b blocks, and each block has the same chance (1/b) of being the one that holds the target record.

average no. of blocks to be read = 1 * (1/b) + 2 * (1/b) + 3 * (1/b) + ... + b * (1/b)

= (1 + 2 + 3 + ... + b) * (1/b)

= (b * (b + 1) / 2) * (1/b) = (b + 1) / 2 ≈ b / 2

TF = (b / 2) * ebt (sequential read)

Example (Hospital File):

The file holds the patient records of a hospital: 100,000 records of 400 bytes each. How long will it take to look up a record without using an index?

   n = 100,000 (no. of records)

   R = 400 bytes (record size)

   B = 2400 bytes (block size)

b (no. of blocks) = (100,000 * 400) / 2400 = 16,667 ≈ 1.6667 * 10^4

b / 2 = 8333, ebt = 2400 / 2857 = 0.84 msec

TF = 8333 * 0.84 = 7000 msec = 7 sec (for 1 record)

Fetching 10,000 records this way will take 7 * 10,000 = 70,000 sec ≈ 19 hours!
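The hospital-file arithmetic, as a sketch:

```python
# Average fetch time in a pile file, using the hospital-file numbers above.
n, R, B = 100_000, 400, 2400
ebt = 2400 / 2857          # ~0.84 msec per block (formatted, sequential rate)

b = n * R / B              # ~16,667 blocks
TF_ms = (b / 2) * ebt      # on average we scan half the blocks
print(round(TF_ms))        # ~7000 msec = 7 sec per fetch

hours = 10_000 * TF_ms / 3_600_000
print(round(hours, 1))     # ~19.4 hours for 10,000 fetches
```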

TN: Fetching the Next Record

TN = time needed to fetch the next (logical) record

It takes the same time as a random record fetch, because the records in the file are not stored in any order.

TN = TF

TI: Inserting a New Record

TI = time to insert a new record

To insert a new record, we place it at the end of the file.

We assume that the address of the last block is kept in memory. Alternatively, it can be kept in the file header and written back at the end of the session.

[Figure: the file header holds the address of the last block; the newcomer record is appended to the last block.]

For insertion the last block is read and the new record is put at the end.

TI = s + r + btt + 2r

         "s + r + btt" is the time to read the last block

         "2r" is the wait of one full revolution before writing the block back to its place

e.g. for the IBM 3380: TI = 16 + 8.3 + 0.8 + 2 * 8.3 ≈ 42 msec

TD: Deleting a Record

TD = time to delete a record

To delete a record, we fetch it (taking TF), modify it, and mark it as deleted (a tombstone).

While marking, the disk revolves back to the block's original position, and then the modified block is written (the tombstone approach).

The time to wait until the disk rotates back to the original position is 2r - btt, and the time to write the block is btt.

TD = TF + 2r (= TF + (2r - btt) + btt)

TU: Modifying a Record

TU = time to update a record

It is the same as TD: we find the record, mark it, wait for the disk to revolve back, and write the modified record in place.

TU = TF + 2r

TX: Exhaustive Reading of a File

TX = time for exhaustive reading of a file

The whole file is read without worrying about the order of the records.

TX = b * ebt

For the hospital file in the example:

TX = 16,667 * 0.84 = 14,000 msec = 14 sec
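As a closing sketch, all the pile-file operation times for the hospital file can be computed together (variable names follow the notes):

```python
# Pile-file operation times for the hospital file (all in msec).
s, r, btt = 16.0, 8.3, 0.8
ebt = 2400 / 2857               # ~0.84 msec, sequential (formatted) rate
b = (100_000 * 400) / 2400      # ~16,667 blocks

TF = (b / 2) * ebt              # fetch: scan half the file on average
TN = TF                         # next record: unordered, so a full fetch
TI = s + r + btt + 2 * r        # insert: read last block + one revolution to rewrite
TD = TF + 2 * r                 # delete: fetch, mark, rewrite in place
TU = TF + 2 * r                 # update: same pattern as delete
TX = b * ebt                    # exhaustive read of all b blocks

print(round(TI, 1))             # ~41.7 msec
print(round(TX / 1000))         # ~14 sec
```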