sql - How to eliminate duplicate rows joining on tables across databases?


I have been working on a script for a while and have reached a dead end. The script works, but unfortunately it produces duplicates. It joins 2 different tables across databases on the state_issue_teacher_id key and produces the output. I have checked both tables and the row counts are the same, so the join should match the records, but evidently there is a problem with the key or with the way I'm joining the tables, and the output is coming out partially incorrect. I've tried concatenating attributes to make a unique key and joining the tables on that, but it still produces incorrect results.

Here is the script:

SELECT  LTRIM(RTRIM(rt.year_time)) AS year_time,
        LTRIM(RTRIM(rt.state_issue_teacher_id)) AS state_issue_teacher_id,
        LTRIM(RTRIM(rt.district_code)) AS district_code,
        rt.district_name,
        rt.school_name,
        LTRIM(RTRIM(rt.assignment_code)) AS assignment_code,
        rt.assignment_desc,
        LTRIM(RTRIM(rt.position_code)) AS position_code,
        rt.position_desc,
        LTRIM(RTRIM(rt.last_name)) AS last_name,
        LTRIM(RTRIM(rt.first_name)) AS first_name,
        LTRIM(RTRIM(rt.total_salary)) AS total_salary,
        rt.assign_fte,
        LTRIM(RTRIM(rt.school_code)) AS school_code,
        rt.fte
FROM    staging.dbo.rt rt
        LEFT JOIN ( SELECT  LTRIM(RTRIM(dti.year)) AS year,
                            LTRIM(RTRIM(dt.teacher_id)) AS teacher_id,
                            LTRIM(RTRIM(db.district_code)) AS district_code,
                            db.district_name,
                            LTRIM(RTRIM(dt.last_name)) AS last_name,
                            LTRIM(RTRIM(dt.first_name)) AS first_name,
                            LTRIM(RTRIM(da.assignment_code)) AS assignment_code,
                            LTRIM(RTRIM(dp.position_code)) AS position_code,
                            dre.race_ethnicity_code,
                            LTRIM(RTRIM(SUBSTRING(db.school_code, 10, 4))) AS school_code,
                            da.assignment_desc,
                            dp.position_desc,
                            fs.total_fte
                    FROM    mart.dbo.fact_s fs
                            LEFT OUTER JOIN mart.dbo.fact_s.dbo.dim_building db
                                ON fs.building_key = db.building_key
                            LEFT OUTER JOIN mart.dbo.fact_s.dbo.dim_teacher dt
                                ON fs.teacher_key = dt.teacher_key
                            LEFT OUTER JOIN mart.dbo.fact_s.dbo.dim_assignment da
                                ON fs.assignment_key = da.assignment_key
                            LEFT OUTER JOIN mart.dbo.fact_s.dbo.dim_race_ethnicity dre
                                ON dt.race_ethnicity_key = dre.race_ethnicity_key
                            LEFT OUTER JOIN mart.dbo.fact_s.dbo.dim_gender dg
                                ON dt.gender_key = dg.gender_key
                            LEFT OUTER JOIN mart.dbo.fact_s.dbo.dim_time dti
                                ON fs.time_key = dti.time_key
                            LEFT OUTER JOIN mart.dbo.fact_s.dbo.dim_position dp
                                ON fs.position_key = dp.position_key
                    WHERE   dti.year = '2012'
                  ) raw
            ON  rt.state_issue_teacher_id = raw.teacher_id
            AND rt.year_time = raw.year
            AND rt.last_name = raw.last_name
            AND rt.first_name = raw.first_name
            AND rt.district_code = raw.district_code
            AND rt.position_code = raw.position_code
            AND rt.school_code = raw.school_code
            AND rt.assignment_code = raw.assignment_code
WHERE   rt.year_time = '2012'
ORDER BY rt.last_name, rt.first_name

The output I'm getting is: (image not available)

The FTE for a teacher's combined assignments should add up to 1. Teachers who have the same assignment_code/desc for multiple partial assignments are producing duplicates. For example, Jane Doe appears 4 times with a total FTE of 2.0 instead of 2 times with the correct total of 1.0. The output should read as follows: (image not available)

You appear to be getting duplicates for part-time teachers who have multiple assignments, with the descriptions of the assignments being the same. This is quite clear from the first 4 rows of the actual output versus the first 2 of the desired output.

I wonder why you have duplicates to begin with. However, in the fact table they must be important (I suppose that 2 part-time guidance counselors are funded rather than 1 full-time one). Does the fact table have exact duplicate records in this case? If not, the fields that are not duplicated may suggest an additional join key to fix the problem.
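One way to check for exact duplicates is to group on every column and keep the groups that occur more than once. This is only a sketch: the fact_s CTE below is made-up sample data standing in for mart.dbo.fact_s, the column list is limited to the keys that appear in the question, and the key values are hypothetical.

```sql
-- Hypothetical stand-in for mart.dbo.fact_s; all values are invented for illustration.
WITH fact_s AS (
    SELECT * FROM (VALUES
        (1, 10, 100, 2012, 7, 0.5),
        (1, 10, 100, 2012, 7, 0.5),   -- exact copy of the row above
        (2, 11, 101, 2012, 8, 1.0)
    ) AS v (building_key, teacher_key, assignment_key, time_key, position_key, total_fte)
)
-- Group on every column: any group with more than one row is an exact duplicate.
SELECT  building_key, teacher_key, assignment_key, time_key, position_key, total_fte,
        COUNT(*) AS copies
FROM    fact_s
GROUP BY building_key, teacher_key, assignment_key, time_key, position_key, total_fte
HAVING  COUNT(*) > 1;
```

If this returns nothing on the real table, dropping columns from the GROUP BY one at a time shows which field distinguishes the two part-time rows, and that field is a candidate for the extra join key.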

You need to get rid of the Cartesian product produced by the join condition: rt.assignment_code = raw.assignment_code.
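To see where the extra rows come from, here is a small reproduction. It is a sketch with invented data: the teacher id 'T001', the assignment code 'A10', and the 0.5 FTE values are assumptions, and only the two join columns that matter here are kept.

```sql
-- Hypothetical sample: one teacher with two half-time assignments sharing one code.
WITH rt AS (
    SELECT * FROM (VALUES
        ('T001', 'A10', 0.5),
        ('T001', 'A10', 0.5)
    ) AS v (state_issue_teacher_id, assignment_code, assign_fte)
),
raw AS (
    SELECT * FROM (VALUES
        ('T001', 'A10', 0.5),
        ('T001', 'A10', 0.5)
    ) AS v (teacher_id, assignment_code, total_fte)
)
SELECT  rt.state_issue_teacher_id,
        rt.assignment_code,
        rt.assign_fte
FROM    rt
        LEFT JOIN raw
            ON  rt.state_issue_teacher_id = raw.teacher_id
            AND rt.assignment_code = raw.assignment_code;
-- Each of the 2 rt rows matches both raw rows, so 4 rows come back
-- and assign_fte adds up to 2.0 instead of 1.0.
```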

Apart from finding a better join key, I can think of 2 ways to fix this. The first is to create a unique ID for positions. Perhaps there is one in the data structure that you know of. Or, you could use row_number() to add a sequence number for people who have multiple positions.
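A minimal sketch of the row_number() idea, again on invented data ('T001', 'A10', and the 0.6/0.4 split are assumptions, and the rt_seq/raw_seq names are mine). Each side numbers its rows within a teacher and assignment code, and the sequence number becomes part of the join key.

```sql
-- Hypothetical sample: one teacher, two partial assignments with the same code.
WITH rt AS (
    SELECT * FROM (VALUES
        ('T001', 'A10', 0.6),
        ('T001', 'A10', 0.4)
    ) AS v (state_issue_teacher_id, assignment_code, assign_fte)
),
raw AS (
    SELECT * FROM (VALUES
        ('T001', 'A10', 0.6),
        ('T001', 'A10', 0.4)
    ) AS v (teacher_id, assignment_code, total_fte)
),
-- Number the rows on each side within the same teacher and assignment code.
rt_seq AS (
    SELECT  rt.*,
            ROW_NUMBER() OVER (PARTITION BY state_issue_teacher_id, assignment_code
                               ORDER BY assign_fte) AS seq
    FROM    rt
),
raw_seq AS (
    SELECT  raw.*,
            ROW_NUMBER() OVER (PARTITION BY teacher_id, assignment_code
                               ORDER BY total_fte) AS seq
    FROM    raw
)
-- Joining on seq as well pairs each rt row with exactly one raw row.
SELECT  r.state_issue_teacher_id,
        r.assignment_code,
        r.assign_fte,
        w.total_fte
FROM    rt_seq r
        LEFT JOIN raw_seq w
            ON  r.state_issue_teacher_id = w.teacher_id
            AND r.assignment_code = w.assignment_code
            AND r.seq = w.seq;
-- 2 rows come back, one per assignment.
```

Ordering by the FTE is just one choice. When two rows tie on the ORDER BY column the numbering between them is arbitrary, but each rt row is still paired with only one raw row.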

The other way is to eliminate the duplicates on one side or the other. For instance, you might aggregate rt to eliminate such duplicates.
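As a rough sketch of the aggregation route, assuming the same invented sample rows ('T001', 'A10', two 0.5 FTE rows; rt_agg is a name I made up): rt is collapsed to one row per teacher and assignment code before the join, and the per-assignment FTE is then read from raw.

```sql
-- Hypothetical sample: one teacher, two half-time assignments with the same code.
WITH rt AS (
    SELECT * FROM (VALUES
        ('T001', 'A10', 0.5),
        ('T001', 'A10', 0.5)
    ) AS v (state_issue_teacher_id, assignment_code, assign_fte)
),
raw AS (
    SELECT * FROM (VALUES
        ('T001', 'A10', 0.5),
        ('T001', 'A10', 0.5)
    ) AS v (teacher_id, assignment_code, total_fte)
),
-- Collapse rt to one row per teacher and assignment code.
rt_agg AS (
    SELECT  state_issue_teacher_id,
            assignment_code,
            SUM(assign_fte) AS assign_fte_total
    FROM    rt
    GROUP BY state_issue_teacher_id, assignment_code
)
SELECT  a.state_issue_teacher_id,
        a.assignment_code,
        a.assign_fte_total,
        w.total_fte
FROM    rt_agg a
        LEFT JOIN raw w
            ON  a.state_issue_teacher_id = w.teacher_id
            AND a.assignment_code = w.assignment_code;
-- The single rt_agg row matches the 2 raw rows, so 2 rows come back
-- and total_fte adds up to 1.0.
```

Whether to aggregate rt, raw, or both depends on which side carries the per-assignment detail you want in the output.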

