Secondly the information has to be persistent, that means even after the application is closed the information should be persisted.
Finally it should provide an independent way of accessing data and should not be dependent on the application to access the information.
Main difference between a simple file and database that database has independent way (SQL) of accessing information while simple files do not File meets the storing, managing and retrieving part of a database but not the independent way of accessing data. Many experienced programmers think that the main difference is that file can not provide multi-user capabilities which a DBMS provides. But if we look at some old COBOL and C programs where file where the only means of storing data, we can see functionalities like locking, multi-user etc provided very efficiently. So it’s a matter of debate if some interviewers think this as a main difference between files and database accept it… going in to debate is probably loosing a job.
RDBMS = DBMS + REFERENTIAL INTEGRITY
These relations are defined by using “Foreign Keys” in any RDBMS.Many DBMS companies claimed there DBMS product was a RDBMS compliant, but according to industry rules and regulations if the DBMS fulfills the twelve CODD rules it’s truly a RDBMS. Almost all DBMS (SQL SERVER, ORACLE etc) fulfills all the twelve CODD rules and are considered as truly RDBMS.
Rule 1: Information Rule.
"All information in a relational data base is represented explicitly at the logical level and in exactly one way - by values in tables."
Rule 2: Guaranteed access Rule.
"Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name."
In flat files we have to parse and know exact location of field values. But if a DBMS is truly RDBMS you can access the value by specifying the table name, field name, for instance Customers.Fields [‘Customer Name’].
Rule 3: Systematic treatment of null values.
"Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.".
Rule 4: Dynamic on-line catalog based on the relational model.
"The data base description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data. "The Data Dictionary is held within the RDBMS, thus there is no-need for off-line volumes to tell you the structure of the database.
Rule 5: Comprehensive data sub-language Rule.
"A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all the following items
Data Manipulation (Interactive and by program).
Transaction boundaries (Begin, commit and rollback)
Rule 6: .View updating Rule
"All views that are theoretically updatable are also updatable by the system."
Rule 7: High-level insert, update and delete.
"The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update and deletion of data."
Rule 8: Physical data independence.
"Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods."
Rule 9: Logical data independence.
"Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit un-impairment are made to the base tables."
Rule 10: Integrity independence.
"Integrity constraints specific to a particular relational data base must be definable in the relational data sub-language and storable in the catalog, not in the application programs." Rule 11: Distribution independence.
"A relational DBMS has distribution independence."
Rule 12: Non-subversion Rule.
"If a relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or bypass the integrity Rules and constraints expressed in the higher level relational language (multiple-records-at-a-time)."
Benefits of Normalizing your database include:
=>Avoiding repetitive entries
=>Reducing required storage space
=>Preventing the need to restructure existing tables to accommodate new data.
=>Increased speed and flexibility of queries, sorts, and summaries.
Following are the three normal forms :-
First Normal Form
For a table to be in first normal form, data must be broken up into the smallest un possible.In addition to breaking data up into the smallest meaningful values, tables first normal form should not contain repetitions groups of fields.
Second Normal form
The second normal form states that each field in a multiple field primary key table must be directly related to the entire primary key. Or in other words, each non-key field should be a fact about all the fields in the primary key.
Third normal form
A non-key field should not depend on other Non-key field.
Fifth normal form deals with reconstructing information from smaller pieces of information. These smaller pieces of information can be maintained with less redundancy.
=>SELECT - gets data from a database table
=> UPDATE - updates data in a table
=> DELETE - deletes data from a database table
=> INSERT INTO - inserts new data into a database table
DDL stands for Data definition Language. They change structure of the database objects like table, index etc. Most important DDL statements are as shown below:-
=>CREATE TABLE - creates a new table in the database.
=>ALTER TABLE – changes table structure in database.
=>DROP TABLE - deletes a table from database
=> CREATE INDEX - creates an index
=> DROP INDEX - deletes an index
SELECT DISTINCT age FROM pcdsEmp
Below SQL statement will return all words with letter "S"
SELECT * FROM pcdsEmployee WHERE EmpName LIKE 'S%'
Below SQL statement will return all words which end with letter "S"
SELECT * FROM pcdsEmployee WHERE EmpName LIKE '%S'
Below SQL statement will return all words having letter "S" in between
SELECT * FROM pcdsEmployee WHERE EmpName LIKE '%S%'
"_" operator (we can read as “Underscore Operator”). “_” operator is the character defined at that point. In the below sample fired a query Select name from pcdsEmployee where name like '_s%' So all name where second letter is “s” is returned.
UPDATE pcdsEmployee SET age='25' where name='rohit';
DELETE FROM pcdsEmployee WHERE name = 'sonia';
Ascending order sort query
SELECT name,age FROM pcdsEmployee ORDER BY age ASC
Descending order sort query
SELECT name FROM pcdsEmployee ORDER BY age DESC
SELECT * FROM pcdsEmployeeSalary a WHERE (2=(SELECT COUNT(DISTINCT(b.salary)) FROM pcdsEmployeeSalary b WHERE b.salary>=a.salary))
Inner join shows matches only when they exist in both tables. Example in the below SQL there are two tables Customers and Orders and the inner join in made on Customers.Customerid and Orders.Customerid. So this SQL will only give you result with customers who have orders. If the customer does not have order it will not display that record.
SELECT Customers.*, Orders.* FROM Customers INNER JOIN Orders ON Customers.CustomerID =Orders.CustomerID
LEFT OUTER JOIN
Left join will display all records in left table of the SQL statement. In SQL below customers with or without orders will be displayed. Order data for customers without orders appears as NULL values. For example, you want to determine the amount ordered by each customer and you need to see who has not ordered anything as well. You can also see the LEFT OUTER JOIN as a mirror image of the RIGHT OUTER JOIN (Is covered in the next section) if you switch the side of each table.
SELECT Customers.*, Orders.* FROM Customers LEFT OUTER JOIN Orders ON Customers.CustomerID =Orders.CustomerID
RIGHT OUTER JOIN
Right join will display all records in right table of the SQL statement. In SQL below all orders with or without matching customer records will be displayed. Customer data for orders without customers appears as NULL values. For example, you want to determine if there are any orders in the data with undefined CustomerID values (say, after a conversion or something like it). You can also see the RIGHT OUTER JOIN as a mirror image of the LEFT OUTER JOIN if you switch the side of each table.
SELECT Customers.*, Orders.* FROM Customers RIGHT OUTER JOIN Orders ON Customers.CustomerID =Orders.CustomerID
=>>DELETE TABLE syntax logs the deletes thus making the delete operations low. TRUNCATE table does not log any information but it logs information about deallocation of data page of the table. So TRUNCATE table is faster as compared to delete table.
=>>DELETE table can have criteria while TRUNCATE can not.
=>> TRUNCATE table can not have triggers.
For example, to retrieve all EmployeeID and CustomerID records from the ORDERS table that have the EmployeeID greater than the average of the EmployeeID field, you can create a nested query, as shown:
SELECT DISTINCT EmployeeID, CustomerID FROM ORDERS WHERE EmployeeID > (SELECT AVG(EmployeeID) FROM ORDERS)
Aggregate functions operate against a group of values but returns only one value.
AVG(column) :- Returns the average value of a column
COUNT(column) :- Returns the number of rows (without a NULL value) of a column
COUNT(*) :- Returns the number of selected rows
MAX(column) :- Returns the highest value of a column
MIN(column) :- Returns the lowest value of a column
Scalar functions operate against a single value and return value on basis of the single value.
UCASE(c) :- Converts a field to upper case
LCASE(c) :- Converts a field to lower case
MID(c,start[,end]) :- Extract characters from a text field
LEN(c) :- Returns the length of a text
CREATE VIEW [MyView] AS SELECT * from pcdsEmployee where LastName = 'singh'
In order to query the view
SELECT * FROM [MyView]
SQL injection attacks typically are easy to avoid by ensuring that a system has strong input validation.
As name suggest we inject SQL which can be relatively dangerous for the database. Example this is a simple SQL
SELECT email, passwd, login_id, full_name
FROM members WHERE email = 'x'
Now somebody does not put “x” as the input but puts “x ; DROP TABLE members;”.
So the actual SQL which will execute is :-
SELECT email, passwd, login_id, full_name FROM members WHERE email = 'x' ; DROP TABLE members;
Think what will happen to your database.
Data warehouse is a collection of integrated, subject-oriented databases designed to support the decision-support functions (DSF), where each unit of data is relevant to some moment in time.
There are two basic tables in dimensional modeling:-
Fact tables are central tables in data warehousing. Fact tables have the actual aggregate values which will be needed in a business process. While dimension tables revolve around fact tables. They describe the attributes of the fact tables.
In this process we extract data from the source. In actual scenarios data source can be in many forms EXCEL, ACCESS, Delimited text, CSV (Comma Separated Files) etc. So extraction process handle’s the complexity of understanding the data source and loading it in a structure of data warehouse.
This process can also be called as cleaning up process. It’s not necessary that after the extraction process data is clean and valid. For instance all the financial figures have NULL values but you want it to be ZERO for better analysis. So you can have some kind of stored procedure which runs through all extracted records and sets the value to zero.
After transformation you are ready to load the information in to your final data warehouse database.
There are two basic aims of Data mining:-
From the given data we can focus on how the customer or market will perform. For instance we are having a sale of 40000 $ per month in India, if the same product is to be sold with a discount how much sales can the company expect.
To derive important information to analyze the current business scenario. For example a weekly sales report will give a picture to the top management how we are performing on a weekly basis?
Insert’s are slower on tables which have indexes, justify it?or Why do page splitting happen?
Creates a new page to balance the tree.
Shuffle and move the data to pages.
So if your table is having heavy inserts that means it’s transactional, then you can visualize the amount of splits it will be doing. This will not only increase insert time but will also upset the end-user who is sitting on the screen. So when you forecast that a table has lot of inserts it’s not a good idea to create indexes.
In clustered index the non-leaf level actually points to the actual data.In Non-Clustered index the leaf nodes point to pointers (they are rowid’s) which then point to actual data.