Inverted File Structures:

 

Emp-no          Emp-name        Dept       No.of dependents    Salary

01                       Jane                Gun                    1

02                      George            Shoe                   2

04                       Dave              Shoe                   3

05                       Mary               Gun                   2

06                       Mike               Shoe                   1

 

  Inverted indexes for secondary attribute

 

 

 

 

 


Time needed to access posting lists is a function of their length and their allocation.

 

n: number of blocks

85% of are sequentially allocated

 

file needed to access posting list= n*0.15*(s+r+btt) +(s+r) +n*0.85*ebt

 

Display Emp-name where Dept= “Shoe” and No.of dependents=2

 

How many employees in the result set?

It depends on the selectivity of the attributes.

 

R(no. of records)=10000

 

Condition                                                  No. of hits

Dept=Gun                                                 0.10*10000=1000

Dept=Shoe & no. of dependents=2           0.30*0.40*10000=1200

List Structured Files

 

Consider a student file (SSN, Name, Major, GPA,…)

Display name where major= “BUS”

 

List Structured File Organization According to attribute major:

 

SSN           Name          Major           Year           GPA         Link

10               Dave             BUS               3                 4.0            30

20               Jack               CS                 1                 3.5            40

30               Jim                BUS               2                 2.7            50

40               Tom              Music             3                 2.2             0

50               Don               BUS              1                  3.2            60

60               Sam               BUS              2                  2.0            20

 

 

Multi List Structured File Organization:

 

 

SSN         Name       Major   …        Link

40             Tom          Music                0 

10             Dave          BUS                 3

60              Sam           BUS                5

20              Jack            CS                  0

30              Jim            BUS                6

50              Don           BUS                0                                                                   

               

Bus List Head     :2

CS List Head      :4

Music List Head :1

 

Assume that list size =k (e.g. k=4 for the BUS major)

How many blocks to retrieve(access)?

Consider Yao’s Formula to find the expected no. of blocks.

 

Record placement problem: Assume that we know all the queries and the records that will satisfy each query.

Problem: Distribute records among blocks such that the cost processing queries in terms of block accesses will be minimized.

One modification: Assume we also know frequency of execution for each query.

(ACM Trans. On Database Systems, December 1990)

 

Secondary key based retrieval using Range Question

Display  student name where  GPA>3.00 & age<20

Grid Files:

 

Example:

                     Name        Height(inch)             Weight(lb)

                     Sleepy            36                               48

                     Happy            34                               52

                     Doc                38                               51

                     Dopey            37                               54

                     Grumpy         32                                55

                     Sneezy           35                                46

                     Bashful          33                                50

                     Ms. White       62                                98

 

Data bucket size=2

 

Insert Sleepy and Happy.

 

 

Insert Doc. 

 

 Insert Dopey.

 

Insert Grumpy ,Sneezy  and Bashful. 

 

Insert Ms.White

 

Query: Display name where h=50 and w=50

              Access the sleepy bucket and check for a person with h=50 and w=50

 

             False drop elimination

2 disc accesses

      1. access grid area

      2. access data bucket

Always 2 disc accesses (for successful and unsuccessful search)

 

           Characteristics of Grid Files:

1. A grid file is symmetric and adaptable symmetric, means that, each key is treated as a  

    primary key. Adaptable implies that it changes itself dynamically.

2. Data storage utilization ?70%

3. Concurrent application are easier than a B+ tree.

     Two record additions may try to change the root node in B+ tree.

4. Involves elimination of false drops: after accessing a data bucket make sure that

    records stored in the bucket matches the query condition.

 

 

K-d Trees(K dimensional trees):  

A K-d tree is a K-dimensional tree index for range queries based on secondary key.

 

         Name        Height(inch)             Weight(lb)

                     Sleepy            36                               48

                     Happy            34                               52

                     Doc                38                               51

                     Dopey            37                               54

                     Grumpy         32                                55

                     Sneezy           35                                46

                     Bashful          33                                50

                     Ms. White       62                                98

 

During the construction, alternate between the attributes of height and weight when deciding how to branch on successive levels of the tree.

 

1.Sleepy           Sleepy(36,48)

2.Happy           Happy(34,52)

                           

                                

3.Doc                 Doc(38,51)

                

                           

4.Dopey                Dopey(37,54)

 

                                  

5.Grumpy                 Grumpy(32,55)

 

                                           

6.Sneezy                      Sneezy(35,46)

 

                                      

7.Bashful                               Bashful(33,50)

 

                                              

8.Ms.White                                Ms. White(62,98)

 

 

 

 

n nodes (n+1) null pointers

 

 

                    

Person                               Region

Sleepy                                      3

Happy                                      2

Doc                                          6 

Dopey                                      7

Grumpy                                    4

Sneezy                                     1

Bashful                                     1

Ms. White                                8

                               

 

Example: K- tree structure where K=3.

 

         

         People      Tolerant(%)           Arrogant(%)         Stubborn(%)        Tender(%)        Happiness       

 

P1                80                             10                           3                            7                      80

P2                20                             60                           15                          5                      40

P3                60                             5                             25                         10                     65

P4                60                             20                           5                           15                     60

P5                10                             20                           10                         60                     55

P6                45                             5                             5                           45                     95

P7                60                             25                           5                           10                     50             

P8                10                             80                           5                           5                       25

P9                25                             5                             5                           65                    50

P10              10                             60                           25                          5                      35

P11              70                             10                           15                          5                      75

P12              55                             5                             30                         10                     60

 

 

        Key Precedence:  1. Tolerant  2. Tender  3. Stubborn then cycle. Insert in an order of happy to unhappy.

 

 

Our tree has 12 nodes so it has 12+1 = 13 null pointers shown with dash lines above.