
Planet DB2 is an aggregator of blogs about the IBM DB2 database server. We combine and republish posts by bloggers around the world. Email us to have your blog included.

 

April 27, 2017

Jack Vamvas

How to use db2diag command line option for time

Question: A vendor has requested a snippet from the db2diag log. The file needs to include all db2diag log entries between 00:00:00 and 01:10 from one week ago. I checked the file and the timeframe is available, but there are thousands of entries around that time, so it is time-consuming to extract the details manually.

Is there a way with a db2diag command line option to extract entries between a start and finish time?

Answer: The db2diag command line options offer a wide range of methods to interrogate the log and extract details.

There is a command line option, -time, which will satisfy your requirements. Let's look at an example.

db2diag -time 2016-04-16-00.00:2016-04-16-01.10

This example will extract all entries between midnight and 01:10 for the date 2016-04-16.
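To hand the vendor a file, the command output can be redirected, and -time can be combined with other db2diag filter options. A rough sketch (the -level filter and the output file name are illustrative assumptions, not part of the vendor's request; -level restricts output to entries of the given severity):

db2diag -time 2016-04-16-00.00:2016-04-16-01.10 -level Severe > vendor_snippet.txt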

The IBM DB2 documentation has comprehensive details for db2diag, the db2diag logs analysis tool command.

 

 

April 26, 2017


Adam Gartenberg

Now Available - IBM and Cisco Integration

Yesterday marked the general availability of the integrated solutions combining IBM's collaboration offerings with Cisco's communication capabilities. These solutions, which were first announced...

(Read more)
 

April 25, 2017

Big Data University

This Week in Data Science (April 25, 2017)

Here’s this week’s news in Data Science and Big Data.

Don’t forget to subscribe if you find this useful!

Interesting Data Science Articles and News

 

Featured Courses From BDU

 

Upcoming Data Science Events

 

Cool Data Science Videos

The post This Week in Data Science (April 25, 2017) appeared first on BDU.


DB2utor

DB2 12 Catalog Availability Improvements

During the early releases of DB2, enterprises (specifically DBAs) would restrict dynamic SQL from running in their production OLTP environments. This choice stemmed primarily from the likelihood of ad hoc OLAP queries adversely impacting the performance of OLTP applications. Of course, over the next 20-plus years, this issue has been largely mitigated by the widespread adoption of dynamic SQL. Dynamic SQL continues to be a focus of DB2 development, and DB2 12 features an important enhancement in the area of catalog availability.
 

April 24, 2017


ChannelDB2 Videos

DB2 Tips n Tricks Part 113 - Roll Forward Phases - Is BACKWARD Phase aware of FORWARD?



Demystify Roll Forward Phases - Is BACKWARD Phase aware of FORWARD Phase? Happy Learning & Sharing http://db2luwacademy.blogspot.in
 

April 22, 2017


DB2Night Replays

The DB2Night Show #Z76: DB2 SQL Tuning Tips

Presented by: Tony Andrews, Instructor and Consultant, Themis Inc. "The DB2Night Show Z76: DB2 SQL Tuning Tips" Replays available in WMV and M4V formats! 97% of our studio audience learned something! Tony described many SQL coding issues that cause bad performance and then discussed the solutions. Watch the replay...

(Read more)
 

April 21, 2017


Craig Mullins

Hope to See You in Anaheim for the IDUG DB2 Tech Conference!

This year the IDUG North American DB2 Tech Conference is being held in Anaheim, CA from April 30th through May 4th. That means we'll be near Disneyland, so you can plan for a great week of DB2 and analytics education... and then follow it up with a visit to Mickey.  Whether or not you care about Disneyland is irrelevant, though, because a week at an IDUG event is always an educational...

(Read more)
 

April 20, 2017


ChannelDB2 Videos

DB2 Tips n Tricks Part 112 - Demystify Rollforward Phases - FORWARD BACKWARD Phases



Demystify Roll forward Phases - FORWARD BACKWARD Phases REDO and UNDO CLOSED and OPEN Transactions Happy Learning & Sharing http://db2luwacademy.blogspot.in
 

April 19, 2017


Data and Technology

A Look at Data Professional’s Salaries

The annual ComputerWorld IT Salary Survey for 2017 was recently published, and it contains a wealth of interesting data. So, as I’ve done in the past, this post will summarize its findings...

(Read more)

Vincent McBurney

Test driving the IBM InfoSphere Information Governance Catalog user interface changes in 11.5

A new look InfoSphere Information Governance Catalog includes a better looking homepage and some easier data entry, search and navigation.
 

April 18, 2017


Kim May

Calling All IDAA Users!

The next IBM-sponsored DB2 Analytics Accelerator Users’ Group meeting was announced this week.  The Fillmore Group’s Frank Fillmore will be presenting one of the sessions, along with BNY/Mellon...

(Read more)

Triton Consulting

IDUG EMEA 2017 – Call for presentations deadline approaching

Are you attending this year’s IDUG EMEA Technical Conference? There’s still time to submit a presentation abstract. The deadline for abstract entries is 24th April 2017, so just under a week away. Share your DB2 experiences with fellow members … Continue reading →

(Read more)
Big Data University

This Week in Data Science (April 18, 2017)

Here’s this week’s news in Data Science and Big Data.

Don’t forget to subscribe if you find this useful!

Interesting Data Science Articles and News

 

Featured Courses From BDU

 

Upcoming Data Science Events

 

Cool Data Science Videos

The post This Week in Data Science (April 18, 2017) appeared first on BDU.


ChannelDB2 Videos

DB2 HADR - 1 - How Standby requires Primary in spite of locally available Archive Log files



How Standby requires Primary in spite of locally available Archive Log files Change in behavior from 9.7 to 10.1 Happy Learning & Sharing http://db2luwacadem...

DB2utor

Improved EDM Pool Management in DB2 12

Environmental descriptor manager (EDM) pools contain skeleton application plans and packages, database descriptors, and cached dynamic SQL statements.

DB2 Guys

IBM and HCL Strategic Partnership to jointly develop and market IBM Informix

by Dan Hernandez, VP Offering Management Analytics Friends, I have some very exciting news to share with you today. I am very pleased to announce that IBM and HCL are entering a strategic partnership to jointly develop and market the IBM Informix family of products. We are building on a history of collaboration between the […]
 

April 17, 2017


Craig Mullins

The DB2 12 for z/OS Blog Series - Part 9: Piece-wise Deletion

Adding the FETCH FIRST clause to the DELETE statement at first appears to be a very simple enhancement, but upon closer examination it is really quite powerful. Prior to DB2 12 for z/OS, the FETCH FIRST n ROWS ONLY clause could be specified only on a SELECT statement. The clause had two impacts: the number specified for n is used by the optimizer to formulate an access path, and the result set of...

(Read more)
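Based on the excerpt above, a minimal sketch of what piece-wise deletion looks like in DB2 12 (the table name and predicate are hypothetical; the statement would be repeated, with intermittent commits, until no more rows qualify):

DELETE FROM archive_table
 WHERE created_date < '2016-01-01'
 FETCH FIRST 5000 ROWS ONLY;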
 

April 16, 2017


Matthias Nicola

XML… in the Cloud !

Over the past couple of years I have focused a lot on Cloud Data Services and less on XML as a specialty topic.  But sure enough, all of the great XML processing capabilities that you know from DB2 pureXML are also available in cloud database services such as dashDB and DB2 on Cloud. DB2 on […]

(Read more)
 

April 14, 2017


Data and Technology

Why isn’t e-mail more easily queryable?

Today’s blog post is just a short rumination on finding stuff in my vast archive of email… Do you ever wonder why e-mail systems don’t use database management technology? Don’t you store...

(Read more)
 

April 12, 2017


Henrik Loeser

DB2 Easter Egg Quiz - Kind of

Are you ready for a new DB2 quiz? Here is a tricky question for you. It is about DB2 security, it is (kind of) hardware-related and deals with a relatively new DB2 feature. Curious...? The following...

(Read more)
 

April 11, 2017


ChannelDB2 Videos

DB2 Tips n Tricks Part 111 - How Quiesce Table spaces for Table Intent To Update is not Working



How Quiesce Table spaces for Table Intent To Update is not Working Welcome Inputs/Suggestions. Happy Learning & Sharing http://db2luwacademy.blogspot.in
Big Data University

This Week in Data Science (April 11, 2017)

Here’s this week’s news in Data Science and Big Data.

Don’t forget to subscribe if you find this useful!

Interesting Data Science Articles and News

Featured Courses From BDU

Cool Data Science Videos

The post This Week in Data Science (April 11, 2017) appeared first on BDU.


DB2utor

Machine Learning Applications

The recent announcement of IBM Machine Learning on z/OS is significant for a couple of reasons. First, it's another example of IBM's ongoing investment in the mainframe. Beyond that, it's important for anyone who supports software installation, database administration and analytics on z/OS.
 

April 07, 2017


DB2Night Replays

The DB2Night Show #192: ALL ABOUT IDUG Anaheim 2017 plus Badge Winner!

Follow @IDUGDB2. Special Guests: Terry Johnson, IDUG Conference Chair, and Paul Turpin, President, IDUG Board. ALL ABOUT IDUG Anaheim! 100% of our audience learned something! Terry told us all about the upcoming IDUG Conference in Anaheim, California, April 30-May 4. This year the conference format is "new and improved", so be sure to listen to the changes to make the most of your IDUG DB2 Tech Conference experience! Paul Turpin also shared...

(Read more)
 

April 06, 2017


ChannelDB2 Videos

Oracle Renewal Looming? Consider DB2! 2017 04 06



The Fillmore Group is inviting organizations frustrated with rising Oracle costs to join us for a webinar focusing on the two primary motivations behind DB2 ...

Frank Fillmore

Oracle Renewal Looming? Consider DB2! @IBMAnalytics

Thanks to Kim May for a fabulous job stepping customers and IBMers through the rewards and challenges of converting Oracle databases to DB2.  The recording of the webinar is found here.  The...

(Read more)
Big Data University

How to Become a Data Scientist

“You can have data without information, but you cannot have information without data” – Daniel Keys Moran

 

The question of how to become a data scientist comes up often and many seem to have the answer. However, before you take their advice make sure that you’re speaking the same language, because there’s Paris, France and there’s Paris, Texas. Both lovely, of course, but what a surprise if you meant one and got directions to the other. Likewise, there are multiple interpretations of what a data scientist is and does.

So, let’s begin with some definitions and context on how we arrived here.

What exactly is data science?

Data science, also known as data-driven science, is defined as an interdisciplinary field about scientific methods, processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured.

“Data scientist” is a relatively new job title, coined in 2008 by the likes of LinkedIn and Facebook. A quick Google Trends search will show that, after brief fits and starts, the term began a sustained increase in popularity after August 2012, as compared to the established job title of data engineer.

A data engineer has historically referred to someone trained as a software engineer and working on database systems or scaling production machines. As technology changes, new analytic techniques are required. Thus, a new and distinct title: data scientist.

Enter Drew Conway, who attempted to differentiate and define the skills needed by a data scientist with his now-famous Venn diagram. Although developed in 2010, it was not widely distributed until 2013.

This diagram began as a discussion of disciplines needed in a university-level data science curriculum. While there have been many revisions of this diagram and debates on topics omitted or emphasized, the three circles remain a relevant lay of the land.

 

The skillset of the data scientist is a comprehensive one. It requires hacking skills that facilitate data analysis and visualization, statistics and mathematics, and knowledge of business operations.

Briefly, let’s examine each of the circles.

Substantive Expertise

Substantive expertise, or more directly, domain knowledge, is the underpinning of what makes data science an exciting career choice. The ability to work with subject matter experts to understand the business strategy and process, in order to convert a business problem into an analytics solution, is what separates a data scientist from, for example, a business analyst who is simply analyzing the results of A/B testing.

Executive communication is a key business skill of the data scientist as well. Domain knowledge, added to the proficient delivery of actionable quantitative insights to a non-technical audience, leveraging effective visualization techniques, elevates the analytics solution provided to the client.

Math & Statistics

Statistics and probability are a must for a data scientist, which should surprise no one. Different classes of business problems require different statistical techniques. Furthermore, the rigor offered by formal mathematical analysis ensures that the results are statistically sound.

Hacking Skills

Josh Wills seems to have the perfect definition with his quote:

“Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”

Also, a software engineer is often writing production code, whereas typically only the data scientist sees their own code. Because of this, the rise of open-source software like R and Python, which allows the sharing of algorithms, has been a huge benefit to data scientists.

Python or R?

On that note, a widely-debated question is whether you should learn R or Python. This question is difficult because it sets up a false choice and pits one language against another unnecessarily.

A programming language is a tool. As a data scientist, you should choose the tool that’s best for the problem at hand.

Python is known for its readability and is effective when your solution requires integration with web applications or if a production database is involved. R is great for exploratory analysis and efficient implementation of statistical models and tests.

R also has a very active community providing support to both novices and experts, not to mention many recognizable evangelists like Hadley Wickham and Hilary Parker.

Python’s community is huge but less focused on data science specifically, so it’s a bit less organized. Nevertheless, the number of Python packages that are relevant to data science is growing steadily.

The bottom line is, for beginners, it’s advantageous to learn both languages to round out your tool kit. You’ll then be able to specialize in one, while still being able to read and work with the other if the need arises.

Why the emphasis on big data?

Big data and data science are often linked. But why the emphasis on big data? As recently as 2011, industry experts were sounding the alarm to gear up the analytics talent that would be required to handle the coming deluge of data being generated every day.

In the report “Big data: The next frontier for innovation, competition, and productivity”, McKinsey predicted: “There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.”

Consider further why this is so. The number of people with a “smartphone subscription” reached 2.6 billion in 2015, according to the Ericsson Mobility Report. This figure is expected to grow to 6.1 billion by 2020. Smartphones allow convenient consumption of various multimedia. According to Pew Research Center, 77% of all US adults now own a smartphone, versus 35% in 2011.

Moore’s Law is the engine behind past growth that both decreased the physical size of cell phones and increased computing power. To put this in context, the Apollo Guidance Computer that reached the moon had only 2K of memory, a clock speed of 1.024 MHz, and 32K of storage. Today’s typical cell phone has a dual-core, 64-bit processor with maximum speeds of around 1.3 GHz, paired with 1GB of RAM and a minimum of 16GB of storage.

Every day social media produces even more data. See for yourself how much data is produced every minute of the day in this infographic by DOMO.com.

The talent to handle, analyze, and derive insights from such large volumes of data has never been more critical.

Resources for data science education

The October 2012 edition of the Harvard Business Review featured an article that declared “Data Scientist: The Sexiest Job of the 21st Century.”

The article also noted that demand already exceeded supply, as no universities were offering degrees in data science at the time. However, that soon began to change, and the landscape for data science education exploded.

So, how do you become a data scientist? The resources available to educate the next generation of data scientists depend mainly on the amount of time and money available to pursue your goal. Use these resources to close the gap between your current skills and the fundamental skills of a data scientist.

MOOCs

No article about how to become a data scientist would be complete without a discussion of MOOCs. A MOOC, or massive open online course, typically offers free access to an online course aimed at unlimited participation via the internet. Coursera, possibly the largest and best-known MOOC platform, was founded in 2012 in the wake of Stanford professor Andrew Ng's successful online Machine Learning course. Other MOOCs soon followed.

By far the greatest benefit of this approach is that most MOOCs are free and learning is on your own time, at your own pace.

Some drawbacks to MOOCs include the lack of community that naturally occurs in a physical classroom setting and the fact that self-study requires quite a bit of discipline and motivation.

For an introductory overview of Data Science, try BigDataU’s Data Science Fundamentals Learning Path followed by Big Data Fundamentals Learning Path. Continue to explore other big data aspects in learning paths and courses, per your interests and desired specialization. All courses are free of charge.

Consider also the following, which charge fees for a verified certificate of participation: Coursera’s 10-course Data Science Specialization, edX’s 4-course Data Science specialization, and Udacity’s Data Science nanodegrees.

Bootcamps

Bootcamps came on the scene initially focused on front-end and back-end web designers/coders. The format soon expanded to include data science education. A bootcamp is an immersive 12-26 week full-time experience that promises to move participants from novice to work-ready. Bootcamps can be pricey given the tuition cost added to the 12-26 week period of unemployment.

There are many to choose from, and you should investigate each bootcamp's history of placement and alumni satisfaction. Here are a few of the more well known:

University Programs

If you asked people how to become a data scientist, their go-to answer would probably be to enroll in a relevant program at your local university.

The first university data science programs were simply the retooling of existing data analytics programs to include exercises working with big data. Today, it seems most major universities have centers dedicated specifically to data science.

When choosing a program, look for a well-respected institution with a program combining the three pillars of business skills, math/statistics, and data analysis.

But do you need an advanced degree? Nate Silver of FiveThirtyEight, Moneyball’s Paul DePodesta, and Cloudera’s Jeff Hammerbacher only have bachelor’s degrees. The necessary skills for data science can be obtained at the bachelor’s or master’s level. Supplement your education with the excellent resources available, mostly for free, add some experience, and becoming a data scientist is achievable without a PhD.

A partial list of universities offering data science master’s degrees:

  • The NYU Center for Data Science
  • Data Science Institute at Columbia University
  • University of Washington Master of Science in Data Science (MSDS)
  • Stanford University Master of Science in Statistics: Data Science

Having discussed data science and some of the options to become a data scientist, what’s your next move?

How to become a Data Scientist? Start where you are!

We often speak of the evolution of a data-driven company. The stages range from awareness and analysis to insight and strategically data-driven, as depicted below:

 

Similarly, becoming a data scientist is a process. Start the journey by focusing on the fundamentals of data science:

 

Fundamentals of Data Science

Develop your data awareness by learning to manipulate data and building basic data science skills. Using the free resources available, learn to code in both Python and R, and move on from there. From the BDU catalog, we recommend the following:

Overview of Data Science

Coding Skills

Statistics and Probability

Machine Learning Algorithms

Communicating Results

 

Another great resource for projects to practice your skills is Kaggle, which provides real-world data sets and a community of aspiring data scientists.

Next steps

Because data science is about the discovery of insights, a healthy dose of curiosity is also essential in becoming a data scientist. Incorporate the fundamentals of data science wherever you are as you evolve your career. Imagine data science as a journey, not a destination, which provides the opportunity for life-long learning.

Take the journey and enjoy the detours along the way.

 

The post How to Become a Data Scientist appeared first on BDU.

 

April 05, 2017


Henrik Loeser

Aero Expo, Drones and the IBM Cloud

The Aero Expo, the Global Show for General Aviation, is running in my hometown Friedrichshafen from today until the weekend. One of the expo and conference topics is drones of the future (AERODrones...

(Read more)
Modern SQL

listagg — Rows to Delimited Strings

The listagg function transforms values from a group of rows into a list of values that are delimited by a configurable separator. Listagg is typically used to denormalize rows into a string of comma-separated values (CSV) or other comparable formats suitable for human reading.

Listagg does not apply any escaping: it is not generally possible to tell whether an occurrence of the separator in the result is an actual separator, or just part of a value. The safe use of listagg for electronic data interfaces is therefore limited to cases in which an unambiguous separator can be selected, e.g. when aggregating numbers, dates, or strings that are known to not contain the separator.

When implementing electronic data interfaces, arrays and document types (JSON, XML) are advantageous as they offer type safety, or at least proper escaping.

Syntax

Listagg is an ordered set function, which requires the within group clause to specify an order. The minimal syntax is:

LISTAGG(<expression>, <separator>) WITHIN GROUP(ORDER BY …)

The <expression> must not contain window functions, aggregate functions or subqueries. The standard only allows character literals in <separator>, i.e. no expressions and no bind parameters. Bind parameters are nevertheless well supported in practice.

Listagg removes null values before aggregation, like most other aggregate functions. If no non-null value remains, the result of listagg is null. If needed, coalesce can be used to replace null values before aggregation.
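To make this concrete, here is a minimal sketch using the same demo table listagg_demo (grouping column g, text column val) that the with recursive example further below uses; coalesce substitutes a placeholder so null values survive aggregation:

SELECT g
     , LISTAGG(COALESCE(val, 'n/a'), ', ') WITHIN GROUP (ORDER BY val) list
  FROM listagg_demo
 GROUP BY g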

The on overflow clause

The return type of listagg is either varchar or clob with an implementation-defined length limit. In practice, it is a varchar type.

Listagg accepts the optional on overflow clause to define the behavior if the result exceeds the length limit of the return type:

LISTAGG(<expression>, <separator> ON OVERFLOW …)

The default is on overflow error. In this case, the standard requires an exception with SQLSTATE 22001 to be raised. In practice, this requirement is not fulfilled.

The on overflow truncate clause prevents the overflow by only concatenating as many values as the result type can accommodate. Furthermore, the on overflow truncate clause allows one to specify how the result is terminated:

ON OVERFLOW TRUNCATE [<filler>] WITH[OUT] COUNT

The optional <filler> defaults to three periods (...) and will be added as the last element if truncation happens.

If with count is specified and truncation happens, the number of omitted values is put in brackets and appended to the result.

The SQL standard does not require a warning to be issued on truncation. To know whether the result is complete or not, users can parse the result or compare the actual length of the result to the calculated length for a result containing all values.
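Putting the pieces together, a hedged sketch of the truncating variant over the same listagg_demo table (keep in mind that, as noted above, support for the on overflow clause is limited in practice):

SELECT g
     , LISTAGG(val, ', ' ON OVERFLOW TRUNCATE '...' WITH COUNT)
       WITHIN GROUP (ORDER BY val) list
  FROM listagg_demo
 GROUP BY g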

distinct

The listagg function accepts the optional set quantifiers all and distinct:

LISTAGG( [ALL|DISTINCT] <expression>, <separator> …) …

If neither is specified, all is the default. If distinct is specified, duplicate values are removed before aggregation. Note that the elimination of duplicates is subject to the collation in effect.

Warning

The standard does not specify which of the duplicate elements is removed.

If the order by clause places one occurrence at the beginning and the other at the end, it is unspecified at which place the value appears in the result.

Distinct can be implemented manually by removing duplicates before aggregation, e.g. in a subquery. This works for databases not supporting distinct in listagg, and also allows keeping a particular occurrence if duplicates exist.

The following example demonstrates this approach. The columns g and o represent the group by and order by keys respectively. The example uses min(o) to keep the first occurrence in case one value appears multiple times.

SELECT g
     , LISTAGG(value, ',') WITHIN GROUP (ORDER BY o) list
  FROM (SELECT g, min(o) o, value
          FROM dist_listagg
         GROUP BY g, value
       ) dt
 GROUP BY g

Combining listagg with filter and over

Listagg can be combined with the filter and over clauses:

LISTAGG(…) WITHIN GROUP(…) [FILTER(WHERE …)] [OVER(…)]

The effect of the filter clause is to remove rows before aggregation. Case can be used for the same effect.

The over clause must not contain an order by clause because the mandatory within group clause must contain an order by clause anyway. It is not possible to narrow the window frame: the set of aggregated rows is always the full partition.
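For illustration, a sketch combining all three clauses over the listagg_demo table, following the syntax pattern above (actual support for listagg as a window function varies between databases):

SELECT g
     , val
     , LISTAGG(val, ', ') WITHIN GROUP (ORDER BY val)
       FILTER (WHERE val IS NOT NULL)
       OVER (PARTITION BY g) list
  FROM listagg_demo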

Compatibility

Listagg was introduced with SQL:2016 as optional feature T625. Even though listagg is not yet widely supported, most databases offer similar functionality using a proprietary syntax.

Availability of LISTAGG

Standard-Conforming Alternatives

Availability of conforming LISTAGG alternatives

Arrays

If the query does not strictly require the return of a delimited string, arrays can be used to return an array of values. An array can be constructed using the array_agg aggregate function or via a subquery.

ARRAY_AGG(<expression> ORDER BY …)
ARRAY(<query>)

In the second form, <query> can contain distinct and fetch first to remove duplicates and limit the array length.

Neither of the two approaches performs an implicit cast: the array elements have the same type as <expression>. That means that the retrieving application can fetch the values in a type-safe manner and apply formatting if required.

The type-safe nature of arrays allows them to also carry null values in an unambiguous way. Array_agg therefore does not remove null values like other aggregate functions (including listagg) do.

The filter clause can be used to remove null values before aggregation with array_agg. If the filter clause removes all rows, array_agg returns null, not an empty array.

The subquery syntax allows removing null values in the where clause of the <query> and returns an empty array if the subquery doesn't return any rows.
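A brief sketch of the aggregate-function form, again over the listagg_demo table; the filter clause drops null values before aggregation as described above:

SELECT g
     , ARRAY_AGG(val ORDER BY val) FILTER (WHERE val IS NOT NULL) vals
  FROM listagg_demo
 GROUP BY g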

If the order of elements is irrelevant, multisets and collect can also be used to pass a type-safe list to an application.

Document Types

Similar to array_agg, the SQL standard defines aggregate functions that return JSON or XML fragments: json_arrayagg and xmlagg. The main benefit compared to listagg is that they apply the respective escape rules.

JSON_ARRAYAGG(<expression> ORDER BY … [NULL ON NULL])
XMLAGG(XMLELEMENT(NAME <element-name>, <expression>) ORDER BY …)
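A minimal sketch of the JSON variant over the listagg_demo table; the result is a properly escaped JSON array per group:

SELECT g
     , JSON_ARRAYAGG(val ORDER BY val) vals
  FROM listagg_demo
 GROUP BY g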

Warning

Some articles show how to use SQL string manipulation functions to transform such documents into a delimited string. These examples often neglect the fact that the serialized document might contain escape sequences that need to be unescaped (e.g., &lt; in XML or \" in JSON).

Using with recursive

Note

Although the listagg functionality can be implemented using with recursive, it is often the better choice to use arrays, documents or the proprietary alternatives to listagg as shown below.

The following special case can be implemented using only with recursive and intermediate SQL-92:

LISTAGG(DISTINCT <expr1>, <sep> …) WITHIN GROUP(ORDER BY <expr1>)

Note the distinct and that <expr1> has to be the exact same expression in both cases.

The following example uses g as group by key, val as <expr1> and ', ' as <sep>:

WITH RECURSIVE
list_agg(g, val, list)
AS (
    SELECT g, min(val), CAST(null AS VARCHAR)
      FROM listagg_demo
     GROUP BY g
 UNION ALL
    SELECT prev.g
         , (SELECT min(val)
              FROM listagg_demo this
             WHERE this.g   = prev.g
               AND this.val > prev.val
           ) val
         , COALESCE(list || ', ', '') || val
      FROM list_agg prev
     WHERE prev.val IS NOT NULL
)
SELECT g, list
  FROM list_agg
 WHERE val IS NULL
 ORDER BY g

This particular implementation uses the “loose index scan” technique as explained on the PostgreSQL Wiki. The performance will remain at a rather low level even with an index on (g, val). The distinct behavior is a side effect of this technique.

The correct handling of null in val is an important special case: although null is generally ignored in aggregations, a group that consists of null values only must still be present in the result. This means that null must not be removed if there is no non-null value in the group. The implementation above uses min(val) in the non-recursive expression to get this behavior.

A more generic implementation that supports all semantics and arbitrary order by clauses is possible using with recursive and window functions. Aaron Bertrand's post “Grouped Concatenation in SQL Server” presents an example of this approach.

In both cases, arbitrary on overflow behavior can be implemented.

Proprietary Extensions

The only useful extension that is commonly available is the support of bind parameters and constant expressions in <separator>.

The standard neither allows omitting the <separator> nor omitting the within group clause. Yet some databases treat them as optional and apply implementation-defined defaults, or expose undefined behavior if within group is omitted.

Proprietary Extensions of LISTAGG

Proprietary Alternatives

There are two widely available proprietary alternatives to listagg: group_concat and string_agg. Even though some databases use the same proprietary function name, they still use a different syntax.

The good news is that the proprietary functions have the same default semantic as listagg: they filter null values before aggregation but don't remove duplicates (all semantics).

string_agg — PostgreSQL Syntax

PostgreSQL's string_agg supports distinct and follows the array_agg syntax to specify an order:

STRING_AGG([ALL|DISTINCT] <expression>, <separator> [ORDER BY …])

PostgreSQL also offers a proprietary function to turn arrays into delimited strings: array_to_string.

string_agg — SQL Server Syntax (vNext/14.0)

SQL Server's string_agg doesn't support distinct and uses the within group clause to specify an order:

STRING_AGG(<expression>, <separator>) [WITHIN GROUP (ORDER BY …)]

Aaron Bertrand's post “Grouped Concatenation in SQL Server” demonstrates many other ways to implement listagg in older SQL Server releases—including the FOR XML PATH approach.

group_concat — MySQL Syntax

MySQL's group_concat supports distinct and expects the order by clause before the optional separator (default: comma):

GROUP_CONCAT([DISTINCT] <expression>
             ORDER BY …
             [SEPARATOR <separator>]
            )

MySQL's group_concat truncates the result if it exceeds a configurable maximum length and doesn't honor element boundaries when doing so. In other words, it might truncate in the middle of an element.

group_concat — SQLite Syntax

SQLite's group_concat neither supports distinct nor order by:

GROUP_CONCAT(<expression>, <separator>)

User-Defined Aggregates

Some databases support user-defined aggregate functions: MySQL (only in C/C++), Oracle Database (also WM_CONCAT), PostgreSQL, SQL Server (using CLR).

“listagg — Rows to Delimited Strings” by Markus Winand was originally published at modern SQL.

 

April 04, 2017


Craig Mullins

The DB2 12 for z/OS Blog Series – Part 8: Index FTBs

IBM has delivered many new in-memory processing capabilities in DB2 12 for z/OS, so much so that Gartner has dubbed DB2 12 for z/OS an in-memory DBMS. This is good news for those of us in the mainframe world looking to utilize memory to improve the performance of our database applications. Perhaps the most interesting of the new in-memory features is the new Fast Traversal Blocks, or FTBs. An FTB...

(Read more)
Big Data University

This Week in Data Science (April 4, 2017)

Here’s this week’s news in Data Science and Big Data.

Don’t forget to subscribe if you find this useful!

Interesting Data Science Articles and News

Featured Courses From BDU

Cool Data Science Videos

The post This Week in Data Science (April 4, 2017) appeared first on BDU.


DB2utor

Adding a Partition Between Logical Partitions

Designing and sizing table spaces requires DBAs to make numerous determinations. How many rows per day will be inserted/updated/deleted? How will the data be processed? What are the criteria for determining active versus historical data? How long will the data (active and historical) need to be stored before it is deleted?
